Justice Will Take Us Millions Of Intricate Moves

Act Three: The Maturity Heuristic

I'd say I've looked at about a hundred different web service designs. I've tried to design good web services. I want to distinguish between good web services and bad ones, and figure out the relationship between the constraints of REST and subjectively determined web service quality. I figured out about two-thirds of the answer while I was writing RESTful Web Services. What I'm about to show you is a more complete way of looking at things.

The technologies of the World Wide Web form a technology stack for web services. When people design a web service they tend to pick some technologies from the bottom of the stack. You can judge them crudely by seeing whether they pick zero, one, two, or three technologies.

When I say you pick from the stack I don't mean that you'll ever find a web service that doesn't use HTTP at all or that has no URIs. I mean there's a class of web services that doesn't really get URIs or doesn't really get HTTP. If your REST radar is finely tuned you sense there's something wrong with these services, and you can talk about violation of the RESTful constraints, but it's kind of a bloodless way to talk. It's not clear why anyone should care.

The reason this model is useful is that these three technologies are the real-world implementation of the RESTful constraints. It's difficult to talk about hypermedia as the engine of application state but it's not difficult to talk about HTML and URIs, which embody that constraint on the World Wide Web.

If you look at an XML-RPC service, or a typical SOAP service like Google's now-deprecated search service, you'll see something that looks a lot like a C library. There are a bunch of functions, sometimes namespaced with periods. All of these functions are accessed by sending a POST request to one single URI.
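Here's a minimal sketch of that shape, using Python's standard `xmlrpc.client` module only to build the request bodies. The endpoint URI and the function names `search.query` and `photos.delete` are made up for illustration; they don't belong to any real service.

```python
import xmlrpc.client

# Hypothetical endpoint: the single URI that fronts the whole service.
ENDPOINT = "http://api.example.com/rpc"

def build_call(method_name, *params):
    """Return (http_method, uri, body) for one XML-RPC call."""
    # The function name and its arguments travel inside the POST body.
    body = xmlrpc.client.dumps(params, methodname=method_name)
    return ("POST", ENDPOINT, body)

# One call reads data, the other destroys it (both names are invented).
search = build_call("search.query", "jellyfish")
delete = build_call("photos.delete", 42)

# On the HTTP level the two calls are indistinguishable:
# same method, same URI. Only the opaque body differs.
print(search[:2] == delete[:2])
```

Anything sitting between the client and the server sees a POST to one URI and nothing else. It can't tell the read from the delete.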

Does this look like the web? Well, it looks like a part of the web most people hate: Flash-based websites for restaurants or artists or whatever. These sites are annoying because they're not really on the web. There's a little web-based peephole into some other universe, and you can only communicate with the other universe by passing messages through the peephole.

But it's pretty subjective to say that something feels like the web or isn't on the web. That's why we love Fielding's thesis that dissects the web and that's why we've developed a formal vocabulary for talking about what you get when you put HTTP, URIs, and HTML together. This is where the URI, the most revolutionary of the web technologies, shows off.

There are a lot of web services that claim to be RESTful but aren't. Flickr's web service, del.icio.us's, Amazon's ECS, etc. Rails used to be like this, but they fixed it. Amazon in particular puts out a lot of these, even though their S3 service is great.

When I first started getting into REST I just complained about these services, oh, they're not doing it right. But actually these web services are incredibly popular and useful, even though they've got serious architectural problems that make it possible to do things like destroy data by accident. So I stopped complaining and started looking at what made people, including me, like these services so much. Well, it's because they go to level one. They give each individual thing in their universe a URI. People love URIs! They make it really really easy to do mashups and cool hacks, because you pick the piece of information you want and then you do something to its URI.

If we go back to the XML-RPC example with a level one mindset we can say what's wrong with it. There's only one resource, this black box. Same deal with the Flash website. There's one resource that's too damn complicated. There's a whole restaurant in there! And we see the solution: take that complexity out of the black box and make more resources. Split the web service down to its molecular components and give each molecule its own URI.
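As a rough sketch of what "give each molecule its own URI" might look like for the restaurant, here's a tiny URI-minting function. The host name and the path layout are assumptions chosen for the example, not a recommended scheme.

```python
from urllib.parse import quote

# Hypothetical host for the restaurant's site.
BASE = "http://restaurant.example.com"

def dish_uri(section, dish):
    """Give one dish on the menu its own URI."""
    # quote() percent-encodes characters that aren't safe in a path segment.
    return f"{BASE}/menu/{quote(section)}/{quote(dish)}"

# Each thing in the restaurant's universe becomes separately addressable.
uris = [
    dish_uri("starters", "miso soup"),
    dish_uri("mains", "ramen"),
]
for uri in uris:
    print(uri)
```

Once a dish has a URI, a client can bookmark it, link to it, or pass it to some other program without knowing anything else about the site.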

Level Two: HTTP

When we're designing a class in a programming language we give it a large variety of methods with different names. We expect people who interact with our class to put some effort into mastering it.

We are basically creating a custom protocol for every program we write. We used to do this on the Internet too. WAIS used the Z39.50 protocol, which is this hardcore database searching protocol for libraries, managed by the Library of Congress.

Search engines are a lot better now than they were in 1997, but their interfaces are simpler than they used to be, let alone simpler than Z39.50. But there will always be inherent complexity that needs to be expressed, or else you're probably wasting your time.

The web is powerful because it gives you tools for splitting the inherent complexity of a task into small chunks. The URI lets you give a name to every object in the system. With URIs, every object can be a little bit complex. That's the URI level. On the HTTP level, the major advance of the web is that although it can handle any kind of operation, it splits out read operations, operations that want to fetch data, and treats them specially.

Here's how we do a search today. This is the entire HTTP request.

If you want to do something incredibly complicated like Z39.50 you still can! You can load up the URI with as much complexity as you want! But HTTP has factored out the fact that getting search results from Google is in some sense the same operation as getting them from WAIS. It's GET. The differences between the Google protocol and some hypothetical Z39.50-over-HTTP-like protocol are encapsulated in the URI. The similarities go into the HTTP method. GET.
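To make the split concrete, here's a small sketch that builds the text of a GET request from a base URI and some query parameters. The host `search.example.com` and the parameter names are hypothetical, and this is an illustration of the idea, not the request from the talk.

```python
from urllib.parse import urlencode, urlsplit

def search_request(base_uri, **params):
    """Build the text of a minimal HTTP/1.1 GET request for a search."""
    uri = f"{base_uri}?{urlencode(params)}"
    parts = urlsplit(uri)
    target = f"{parts.path}?{parts.query}"
    # The method is always GET; everything service-specific is in the target.
    return f"GET {target} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"

# A simple search and a much fussier one (all parameter names are invented).
simple = search_request("http://search.example.com/search", q="jellyfish")
fussy = search_request(
    "http://search.example.com/search",
    q="jellyfish", field="title", year_from="1990", sort="date",
)
print(simple)
print(fussy)
```

The fussy request is longer, but only in the URI. The method is the same four characters in both.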

It seems obvious now. But back in the 80s and 90s, "get" meant vastly different things to FTP, Gopher, WAIS, Archie, and Comp-U-Store. Dereferencing an FTP URI is this huge musical number. When all those protocols moved onto the Web, and all those pieces of information got URIs, suddenly "get" meant the same thing for all of them. The URI means "That" and the HTTP method means "Gimme."

GET is like toString(). It's this fundamental method that every object in your system supports. All it does is dump the state of the object. But for a wide variety of purposes, that's enough. Think about those old Internet protocols. Gopher, WAIS, and Archie are all about getting access to information. They were all replaced by HTTP GET. There are probably hundreds of useful one-off web services that consist entirely of HTTP resources that respond to GET.

I got tired just thinking of all the examples, but my current favorite is, the New York Times has several web services that give you access to campaign finance data, their archive of movie reviews going back to 1924, other interesting stuff.

There are also billions of standardized web services that respond to GET. Every RSS or Atom feed on every weblog is a tiny web service, an HTTP resource that responds to GET. In fact, every page on the entire web is an HTTP resource that responds to GET.

Here's where the URI part of the stack meets the HTTP part. RFC 2616 gives GET a specific meaning. GET requests have to be safe and idempotent, et cetera. The exact meaning of GET is not important. What's important is that GET has constraints on its meaning, and when there are constraints you can optimize around them: conditional GET, partial GET, reliability, cachability, and so on. This is the value of the uniform interface. Level one services don't respect these constraints.
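Conditional GET is one of those optimizations, and a simplified sketch shows why it depends on GET being safe. The helper names below are made up, and the `If-None-Match` handling is deliberately naive: it compares one exact entity tag and ignores tag lists and weak validators.

```python
import hashlib

def make_etag(body: bytes) -> str:
    """Derive an entity tag from the representation (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def handle_get(body: bytes, request_headers: dict):
    """Answer a GET, skipping the body if the client's copy is current.

    Returns (status, response_headers, response_body).
    """
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        # Nothing changed, so the client can reuse what it already has.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body

# First fetch: full response. Second fetch: the client sends the ETag back.
status, headers, payload = handle_get(b"<feed>...</feed>", {})
again = handle_get(b"<feed>...</feed>", {"If-None-Match": headers["ETag"]})
print(status, again[0])
```

The server can skip the work only because GET promises not to change anything. There's no equivalent shortcut for a request that might be a purchase.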

Compare this to our whipping boy, XML-RPC. Every XML-RPC request you make uses POST. The HTTP standard places no restrictions on POST whatsoever. You might be getting data, you might be modifying data, there's no way to know. HTTP POST means "whatever!" You can't apply the optimizations on safe or idempotent requests because there's no way to know which requests fulfil those requirements.

The web we use uses only GET and POST, because those are the only methods supported by HTML 4. But most RESTful web services go further and split PUT and DELETE out from POST. I think you're probably familiar with this controversy.

There are debates over the value of these methods but this is a debate on level two of the maturity heuristic, not a debate about who's more pure or more practical. The argument for these methods, or for any methods, is that if we split them out of POST they start meaning something besides "whatever!" and we can optimize around them.
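Here's a minimal sketch of what "optimize around them" can mean for a generic client or intermediary. The two sets follow the RFC 2616 definitions for GET, HEAD, PUT, and DELETE (OPTIONS and TRACE are left out to keep it short), and the function names are invented for the example.

```python
# Per RFC 2616: GET and HEAD are safe; GET, HEAD, PUT, DELETE are idempotent.
SAFE = frozenset({"GET", "HEAD"})
IDEMPOTENT = SAFE | {"PUT", "DELETE"}

def may_prefetch(method: str) -> bool:
    """A safe request can be made speculatively without asking the user."""
    return method.upper() in SAFE

def may_retry(method: str) -> bool:
    """An idempotent request can be resent after a dropped connection."""
    return method.upper() in IDEMPOTENT

for method in ("GET", "PUT", "DELETE", "POST"):
    print(method, may_prefetch(method), may_retry(method))
```

POST comes out false on both counts, which is the "whatever!" problem in table form. Every operation you leave inside POST is one a generic client has to treat as dangerous.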

The downside is that when you add HTTP methods you limit the universe of clients that can understand the semantics of your service. Beyond a certain point it's better to describe the specifics of an operation with hypermedia. Which brings us to...

Level Three: Hypermedia

A lot of people settle for level two because hypermedia is difficult to understand and its value in the web service domain isn't as clear. But I press on. I think the Launchpad example shows the value of hypermedia in web services. It wasn't clear a priori that this would work, but it works great.

Here's a document served by Amazon S3, a file hosting service. It talks about a bucket, which is like a directory, and the files in that bucket. Unfortunately, it's not clear how you get from the directory to the files, because the document contains no hyperlinks.

S3 is a pretty good web service. They learned the lesson of URIs and the lesson of HTTP. But they neglected the lesson of HTML, which is that it's easier for your clients to move from one resource to another if you embed the actual URIs.

Instead, S3 gives you these keys, and some rules you need to know to turn those keys into URIs. Where are these rules? You have to read the S3 documentation. You'd never see this kind of document on the web, unless it was part of a scavenger hunt.

Putting aside the fact that it's annoying to not just have those URIs where you can use them, this design creates a coupling between the client and this particular web service. You've got to write custom client code that you can't reuse.

And Amazon doesn't get anything out of not providing URIs. In fact, they've kind of painted themselves into a corner, because their clients are coupled not to the contents of XML documents which Amazon can change easily, but to the contents of the human-readable documents that describe how to make these URIs.
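A rough sketch of that coupling, from the client's side. The listing below is a simplified stand-in loosely modelled on a bucket listing, and the host name and URI rule in `key_to_uri` are assumptions for the example, not S3's actual rules.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical listing: names and keys, but no links.
LISTING = """<ListBucketResult>
  <Name>photos</Name>
  <Contents><Key>2008/beach.jpg</Key></Contents>
  <Contents><Key>2008/city.jpg</Key></Contents>
</ListBucketResult>"""

def key_to_uri(bucket, key):
    """URI rule the client author copied out of the human-readable docs."""
    # Assumed rule for illustration; if the service changes it, this breaks.
    return f"https://{bucket}.storage.example.com/{key}"

def file_uris(listing_xml):
    """Turn the keys in a listing into URIs using the hard-coded rule."""
    root = ET.fromstring(listing_xml)
    bucket = root.findtext("Name")
    return [key_to_uri(bucket, key.text) for key in root.iter("Key")]

print(file_uris(LISTING))
```

Notice where the knowledge lives. Nothing in the document says how to reach a file; it's all in `key_to_uri`, which works for this one service and nowhere else.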

By contrast here's a document served by the Netflix web service.

Here you have real hyperlinks. Now, Netflix is using a custom XML vocabulary, not HTML, not Atom or any other standard format. So clients will still be writing custom code to extract the links from a document, but they won't need special code to figure out what the links mean.

So less custom code, and also more flexibility for Netflix. If they need to change the URL to Steve Carell, they can change it and it doesn't break every client.
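Here's a minimal sketch of a link-following client. The XML vocabulary, the `rel` value, and both URIs are invented for the example; the point is only that the client looks links up by what they mean instead of building URIs itself.

```python
import xml.etree.ElementTree as ET

def links_by_rel(document_xml):
    """Map each link relation in a document to the URI it points at."""
    root = ET.fromstring(document_xml)
    return {link.get("rel"): link.get("href") for link in root.iter("link")}

# The same resource as served before and after a server-side URI change.
BEFORE = """<title>
  <name>Some Film</name>
  <link rel="cast" href="http://api.example.com/titles/1/cast"/>
</title>"""
AFTER = """<title>
  <name>Some Film</name>
  <link rel="cast" href="http://api.example.com/v2/people?title=1"/>
</title>"""

# The client code is identical in both cases; only the document changed.
for document in (BEFORE, AFTER):
    print(links_by_rel(document)["cast"])
```

The custom part is knowing that links live in `link` elements with `rel` and `href` attributes. The client never learns how the URIs are put together, so the server is free to rearrange them.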

Hypermedia is what makes the launchpadlib client flexible in the face of server-side changes. Client behavior is programmed by the documents the web service is serving right now, not by something that was decided in the past.

So here's the lesson of HTML. Connections between resources are a form of data, and they should be described in the documents with the rest of the data. Let your clients focus on looking at that document and making decisions about what to do next. Not on internalizing your particular rules about where on the web you put your data.

It's always going to be more difficult to deal with changes when the client is an automated program rather than a human being controlling a web browser, but you can reduce the amount of custom code and cushion the shock of changes by serving real actual URIs in your representations.

There's another lesson of HTML. It's actually the same lesson, but it's not about hypertext links. It's about forms.

A link is just a URI. It says "there's a resource over here". But how do you know what to do with that resource?

One answer is media types. vCard files are for putting in your contact manager. Atom files are for subscribing to in a feed reader. HTML files are for rendering in a web browser. The media type tells you how to process the representation. The media type also tells you which parts of the representation are links. For some purposes, this is all you need to know. But when the resource has complex server-side behavior, you need hypermedia forms.

Consider two stereotypical web pages. On one hand you've got an HTML page for buying a stock. That's a fairly complicated action with multiple inputs. And on the other hand you've got an HTML page for editing a wiki. That's got different inputs. They're both HTML files. What's the difference between them?

The answer is that inside the stock purchase HTML page is an HTML FORM tag that spells out the inputs. And in the wiki editing page is a FORM tag that spells out the inputs for that. You fill out the form and you've bought stock or edited the wiki. You can tell the difference between the two by looking at semantic cues like the field labels. On the HTTP level they're both POST requests. It's the forms that give them semantics.

There are two kinds of hypermedia: links and forms. Links tell you where the resources are, and forms tell you what you can do to the resources, and the media type tells you, among other things, how to extract the links and forms from the representation.
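To illustrate the forms half, here's a small sketch that reads the forms out of an HTML page with Python's standard `html.parser`. The two pages, their field names, and their action URIs are hypothetical stand-ins for the stock purchase and wiki editing examples.

```python
from html.parser import HTMLParser

class FormReader(HTMLParser):
    """Collect each form's method, action, and named input fields."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({
                "method": (attrs.get("method") or "get").upper(),
                "action": attrs.get("action") or "",
                "fields": [],
            })
        elif tag in ("input", "textarea", "select") and self.forms:
            # Unnamed controls (like a bare submit button) send no data.
            if attrs.get("name"):
                self.forms[-1]["fields"].append(attrs["name"])

def read_forms(html):
    reader = FormReader()
    reader.feed(html)
    return reader.forms

STOCK_PAGE = ('<form method="post" action="/orders">'
              '<input name="symbol"><input name="shares">'
              '<input type="submit" value="Buy"></form>')
WIKI_PAGE = ('<form method="post" action="/wiki/HomePage">'
             '<textarea name="text"></textarea><input name="summary"></form>')

print(read_forms(STOCK_PAGE))
print(read_forms(WIKI_PAGE))
```

Both forms come out as POST. What tells them apart is the action and the list of fields, and the client got those from the page, not from documentation.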

In both cases the alternative to hypermedia is regular media: human-readable documentation that needs to be understood by a human being and hard-coded into a client. When information about the capabilities of resources is described with hypermedia, the developer doesn't have to do that work. They just have to simulate a person sitting at their web browser, looking at the web page and deciding what link to click and how to fill out the forms. This is still a difficult task, in fact it's the most difficult part of the task, but it's less difficult than simulating the person and also simulating part of the web browser.

There's nothing wrong with using regular media as a supplement. The Launchpad web service has a reference doc and a set of tutorial pages. But the reference doc was generated from the hypermedia, the same way a site map for a website is generated from the actual website. The regular media is optional. You can learn everything there is to know about the Launchpad web service by clicking around with a client that understands the media types.

URIs are the only link technology. The last competitor was Gopher descriptors, and if you've ever seen a Gopher descriptor, good riddance. You see links in HTML, XML, JSON, email messages, and plain text, but it's always embedded URIs.

But there are a whole lot of formats for hypermedia forms. Here are some of them. I think there will probably be some convergence here.

There are a lot of technologies, but the nice thing about the Web technology stack is that you can mix and match them and extend them with others. All of these form technologies can be embedded in XML.

And, like XML, the core technologies of the web are extensible. You can extend the behavior of HTTP resources by picking a uniform interface and a set of media types, the way AtomPub does. XML was designed to be extensible, and you can extend HTML with microformats. You can even extend URI to begin the assimilation process for some other protocol, though that's not relevant to web services.