people create web APIs to make some or all of their data accessible to others. to talk about web APIs, we first need a shared understanding of how information travels on the web. here's a very simplified version of how a website gets to your phone or computer:
to summarize, clients make requests; servers serve responses.
responses and requests both include addresses so they know where to go. when you type an address into a browser's address bar, you’re typing the address of the file you want on the server where it's stored.
the response the server sends back can be a web page, an image, or a video. in this workshop we’ll be talking about responses that are JSON data.
APIs use different addresses to serve different parts of a data set.
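to make the request/response cycle concrete, here's a minimal sketch in python that runs a tiny server on your own computer and then acts as the client. everything in it is made up for illustration (the paths, the placeholder names, the `fetch` and `demo` helpers); it is not how any real API is built, just the same idea at toy scale: different addresses, different parts of the data set, JSON coming back.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener

# a made-up data set: each address (path) serves a different part of it
DATA = {
    "/members/senate": ["senator a", "senator b"],
    "/members/house": ["representative c"],
}

# talk to the local server directly, ignoring any proxy settings
OPENER = build_opener(ProxyHandler({}))


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # the server looks at the address the client asked for
        if self.path in DATA:
            status, payload = 200, DATA[self.path]
        else:
            status, payload = 404, {"error": "not found"}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch(base, path):
    # the client makes a request and reads the server's JSON response
    with OPENER.open(base + path) as response:
        return json.loads(response.read().decode("utf-8"))


def demo():
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    try:
        return fetch(base, "/members/senate"), fetch(base, "/members/house")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

the client never sees the `DATA` dictionary itself. it only sees whatever the server chooses to send back for the address it requested.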
let's take a close look at an API. the propublica congress API is well-documented and not evil, so we’ll use it to build as much knowledge as we can, from scratch, about APIs.
information about what an API does is referred to as its documentation. it’s the best place to get your bearings when you’re exploring a new API. this one says:
"Using the Congress API, you can retrieve legislative data from the House of Representatives, the Senate and the Library of Congress. The API, which originated at The New York Times in 2009, includes details about members, votes, bills, nominations and other aspects of congressional activity. This document describes the requests that users can make of the API and the responses that it returns."
from the description, we learn some things about APIs in general:
with this API in particular, we'd make requests to get information about:
the propublica congress API provides access to many data sets. we'll focus on the members API to keep things simple. the members API documentation explains how to build the address to access the data we need.
URI stands for "uniform resource identifier" and it's another name for the data set's address on propublica's server.
looks like a regular web address, right? the only differences are:
for example, these three links will return different data sets because they’re (slightly) different addresses:
see the differences? the places where you have to decide the values are called parameters. programmers plug values for the parameters into the address to get the response (that is, the data set) they need.
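plugging parameters into an address is just string building. here's a hedged sketch in python. the address pattern and the `X-API-Key` header name are assumptions based on how the propublica congress API has been documented, and `members_url`, `build_request` and the key value are hypothetical names for this example, so check the documentation before relying on any of it. nothing here actually sends a request.

```python
from urllib.request import Request

# assumed base address for the congress API; confirm it in the documentation
BASE = "https://api.propublica.org/congress/v1"


def members_url(congress, chamber):
    """build a members address from two parameters (hypothetical helper)."""
    if chamber not in ("house", "senate"):
        raise ValueError("chamber must be 'house' or 'senate'")
    return f"{BASE}/{congress}/{chamber}/members.json"


def build_request(url, api_key):
    """attach an API key to the request (header name is an assumption)."""
    return Request(url, headers={"X-API-Key": api_key})


if __name__ == "__main__":
    # same pattern, different parameter values, different data sets
    for congress, chamber in [(115, "senate"), (115, "house"), (116, "senate")]:
        print(members_url(congress, chamber))

    request = build_request(members_url(116, "senate"), "YOUR-KEY-HERE")
    print(request.full_url)
```

changing one parameter changes the address, and a different address means a different data set comes back.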
once you build your request address and enter it in the command line,
the response looks like this:
let's look at another API: a covert drone strike API created by data artist josh begley.
there is no documentation! sometimes, this happens with very simple APIs.
unlike the propublica congress API, which has lots of different endpoints (addresses), the dronestream API has just one endpoint. this means you don’t have to deal with any parameters to build the address for the data set you want; there's only one data set. the dronestream API also doesn’t require a key, so anyone can access it straight from the browser. give it a try:
this endpoint returns raw JSON data, just like the propublica congress API. begley created this data set by going through articles from the bureau of investigative journalism and making a JSON object for each covert drone strike launched by the u.s. before begley made the dronestream API, this data set existed in articles, but not in a format that could be accessed and used in an application or data visualization.
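once data is in JSON, a few lines of code can work with it, which is the point of getting it out of articles. here's a small sketch in python. the sample is placeholder data, and the field names (`strike`, `number`, `country`, `date`) are assumptions for illustration, not the real dronestream schema or real records.

```python
import json
from collections import Counter

# placeholder JSON in the general shape of an API response;
# the field names and values are invented for this example
SAMPLE = """
{
  "strike": [
    {"number": 1, "country": "country a", "date": "2000-01-01"},
    {"number": 2, "country": "country b", "date": "2000-01-02"},
    {"number": 3, "country": "country b", "date": "2000-01-03"}
  ]
}
"""


def count_by_country(raw_json):
    # turn the JSON text into python dictionaries and lists
    data = json.loads(raw_json)
    # each item in the list is one JSON object, one per record
    return Counter(record["country"] for record in data["strike"])


if __name__ == "__main__":
    for country, total in count_by_country(SAMPLE).items():
        print(country, total)
```

the same counting would be slow and error-prone to do by hand across a pile of articles; with one JSON object per record it becomes a loop.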
we've looked at two examples of APIs that journalists and activists might make and use. but many corporations also release APIs. google, facebook, amazon, uber, twitter—they may not all be profitable, but they all have APIs. why?
to encourage outside software developers to create new products and services from data these companies collect. each company decides what to make accessible to developers via its APIs, and companies can remove features or revoke access at any time, ending the viability of products and services built on those deprecated or inaccessible APIs.
companies have complete control over the data they collect and release in the form of an API. this power asymmetry can present a problem for developers.
as a case study, let's look at the uber API. here's how the company talks about its API in the API mission:
examples of prohibited uses of the uber API include aggregating uber with competitors and storing uber's data, except as expressly permitted by uber:
urbanhail was an app that aggregated the prices of rideshare options so users could choose the cheapest one. to do this, it relied on APIs, including uber’s.
uber revoked urbanhail’s API access. urbanhail’s now-defunct website informed visitors that
"Uber terminated urbanhail's API access of May 31st. We had previously been using this API access to display Uber ride options on our app's results page."
a few months after urbanhail folded, uber pricing was integrated into google maps, along with lyft and other ride-hailing platforms. uber permits price comparisons within google maps, but not in other companies' apps.
now that we've read some examples of API documentation and have a sense of what APIs do, who makes them, why we should care about who has access to them, and what data they're made from, it's only appropriate to share the official and mostly useless definition of what an API is:
the u.s. government uses this metaphor, among others, to describe them:
"APIs are like the engine of a car. You don’t have to know how it works but rather just turn the key in the ignition and it handles everything underneath."—API Resources for Federal Agencies
many platforms that offer APIs also have useful tutorials and documentation: