Act Two: You Can't Design Web Services In Here, This Is The War Room!
Over the course of the last year, while building a large web service and arguing with colleagues and managers about the best way to do it (an argument that started during my job interview), I found myself coming back to the mid-90s and the way the web ate other protocols. It's a data point for the question we're trying to answer: what makes the web better than something that's not the web? And it suggests a way of thinking about the next question: if we used the web a little differently, would it be better or worse?
Before I start, I want to say that I'm going to focus on the pain points of the Launchpad web service, because one of the things I want to show is that our pain points show up exactly where we violated the RESTful constraints. But overall I think it's a very good web service, and I have plans for getting rid of the pain points.
Here's the problem space. Launchpad is written in Python using the Zope framework. It tracks a huge number of objects and workflows covering all aspects of open source development: bugs, blueprints, code branches and code reviews, translations, sprints, questions and FAQs, all of this needing to be published through the web. Not all of it immediately, but we need a Launchpad developer who knows a certain subsystem, say translations, to be able to publish their objects and workflows through the web without me holding their hands and explaining REST to them.
One final requirement came down from on high: we also had to write a Python client for the web service, and no matter what the web service looked like, the Python client had to look like a regular ORM-style Python library, so that the end user wouldn't have to learn anything about web services except where the network boundary is. You need to know that, or you'll write an inefficient program.
I was unhappy with this requirement because it was more stuff I had to do in limited time, but I figured that it wouldn't be that hard to do if we obeyed the RESTful constraint of hypermedia as the engine of application state. Yes, that was my argument. It turns out it's easy to get people to accept highly theoretical arguments if the argument says something will be done faster.
The design of the web service is simple enough. Launchpad has an internal data model backed by a relational database and a file server. We mapped this internal data model onto an outward-facing data model that exposes resources analogous to the collections, entries, and binary files you see in AtomPub. I'll take you from the left to the right.
Zope has this interface system where you describe what a class might hypothetically look like, and then, oh, what a surprise, you've written one class that looks just like the interface. I find it annoying but it does create a separation between the implementation of a class and the way it's exposed to the rest of the application.
We built on top of this. We created some Python decorators that go into the interface class and describe how that interface should be published through a RESTful web service. They describe the delta between the internal and outward-facing models.
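To make the idea concrete, here is a minimal sketch of a decorator that records which attributes get published, so the outward-facing representation is built from the annotation rather than from the whole object. It decorates a plain class instead of a Zope interface for brevity, and the names `exported` and `to_representation` are hypothetical, not Launchpad's or Zope's API.

```python
def exported(*field_names):
    """Hypothetical class decorator: record which attributes are published."""
    def decorate(cls):
        cls.__exported_fields__ = tuple(field_names)
        return cls
    return decorate


def to_representation(obj):
    """Build the outward-facing dict from the published fields only."""
    fields = getattr(type(obj), "__exported_fields__", None)
    if fields is None:
        raise TypeError(f"{type(obj).__name__} is not published")
    return {name: getattr(obj, name) for name in fields}


@exported("title", "owner")
class Bug:
    def __init__(self, title, owner, internal_cache_key):
        self.title = title
        self.owner = owner
        self.internal_cache_key = internal_cache_key  # internal only, never published
```

The internal class keeps whatever attributes it needs; only the annotated delta decides what the web service shows.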
This is a pretty good system. I teach another developer about our annotations and the standards we've developed for applying the annotations to Zope interfaces. They go into the Zope interfaces for code branches or sprints or translations and all these things I have no clue about, and publish them through the web service.
We publish three kinds of resources: collections, entries, and binary files. These are the same kinds of resources that AtomPub publishes, and it's a pretty good bet that any kind of web service you need to publish can be modelled with these three resource types.
We serve JSON representations of collections and entries, with embedded hyperlinks that let clients explore the object graph.
Here's the JSON representation of the service's front page. It's just a bunch of links, kind of a site map, a guide to the web service. We could do this a little better by using the JSON Referencing standard to convey the fact that there are links here; right now we use a naming convention. And needless to say the client can't manipulate this resource. It's read-only.
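With a naming convention instead of an explicit link marker, a client has to know the rule to find the links. This sketch assumes the convention is a `_link` suffix on the key; the document and its `api.example.net` URLs are made up for illustration.

```python
import json

# Illustrative front-page document. The "_link" suffix convention and the
# URLs are assumptions for this sketch.
FRONT_PAGE = json.loads("""
{
  "resource_type_link": "https://api.example.net/beta/#service-root",
  "bugs_collection_link": "https://api.example.net/beta/bugs",
  "people_collection_link": "https://api.example.net/beta/people",
  "projects_collection_link": "https://api.example.net/beta/projects"
}
""")


def find_links(document, suffix="_link"):
    """Return {name: url} for every key that follows the link naming convention."""
    return {
        key[: -len(suffix)]: value
        for key, value in document.items()
        if key.endswith(suffix) and isinstance(value, str)
    }
```

A marker such as JSON Referencing would move this knowledge out of the client and into the document itself.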
Here's another document, one that can actually be modified. This is part of the JSON representation of me, or at least the part of me that you see on Launchpad. It's got its fair share of links but there are also data primitives that can be modified.
I can change this document and send it back with a PUT request. Pretty much what you'd expect.
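Here is a minimal sketch of that PUT, built with the standard library but not sent. The URL and field names are hypothetical; the point is that the whole modified representation goes back to the same URL it came from.

```python
import json
import urllib.request


def build_put(url, representation, changes):
    """Return a PUT request carrying the full, modified representation."""
    updated = dict(representation)  # copy, so the caller's document is untouched
    updated.update(changes)
    body = json.dumps(updated).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

A real client would pass the request to `urllib.request.urlopen` along with whatever authentication the service requires.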
One thing you don't see very often: we publish hypermedia forms in WADL format. We use these forms to describe the capabilities of the resources, analogous to the way HTML forms tell you how to manipulate a website.
This is a little bit of our WADL representation of the list of people. It tells you what HTTP request you need to make to do a people search.
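A client can treat such a WADL method like a form: read the parameters, fill in the blanks, and produce the request. This sketch uses a made-up WADL fragment (the method id `people-find` and the `ws.op`/`text` parameters are assumptions, and the namespace is the 2006/10 WADL one); it is not Launchpad's actual document.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

WADL_NS = "http://research.sun.com/wadl/2006/10"

# Illustrative WADL fragment; method id and parameter names are assumptions.
PEOPLE_WADL = f"""
<application xmlns="{WADL_NS}">
  <resource_type id="people">
    <method name="GET" id="people-find">
      <request>
        <param style="query" name="ws.op" required="true" fixed="find"/>
        <param style="query" name="text" required="true"/>
      </request>
    </method>
  </resource_type>
</application>
"""


def build_request(wadl_text, method_id, base_url, **values):
    """Fill in a WADL 'form' and return (http_method, url)."""
    ns = {"wadl": WADL_NS}
    root = ET.fromstring(wadl_text)
    method = root.find(f".//wadl:method[@id='{method_id}']", ns)
    if method is None:
        raise KeyError(method_id)
    query = {}
    for param in method.findall("wadl:request/wadl:param", ns):
        name = param.get("name")
        if param.get("fixed") is not None:
            query[name] = param.get("fixed")  # like a hidden form field
        elif name in values:
            query[name] = values[name]
        elif param.get("required") == "true":
            raise ValueError(f"missing required parameter {name!r}")
    return method.get("name"), f"{base_url}?{urlencode(query)}"
```

Usage: `build_request(PEOPLE_WADL, "people-find", "https://api.example.net/beta/people", text="richardson")` yields a GET to `.../people?ws.op=find&text=richardson`.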
It might be easier if I show you an HTML form that means pretty much the same thing.
Here's the main downside with our annotation-based approach. The data model in these Zope interfaces is not the same as the data model we want to publish through the web service. We make up the difference with the annotations. Sometimes it's a simple thing like a field name that violates our naming standards. We've got an annotation that lets us specify the client-facing field name. The idea is that eventually we'll go through and rationalize the internal field name to be the same as the client-facing field name. Then we can get rid of the annotation.
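A rename annotation of this kind can be sketched with dataclass field metadata. The `published` helper and the `displayname`/`display_name` pair are hypothetical illustrations of an internal name that breaks a naming standard; Launchpad's real annotations live on Zope interfaces.

```python
from dataclasses import dataclass, field, fields


def published(as_name=None):
    """Hypothetical field annotation: publish this field, optionally renamed."""
    return {"published_as": as_name}


@dataclass
class Person:
    # Internal name breaks the naming standard; publish it as "display_name".
    displayname: str = field(metadata=published(as_name="display_name"))
    name: str = field(metadata=published())
    password_hash: str = ""  # no annotation, so never published


def to_representation(obj):
    """Build the client-facing dict, applying any rename annotations."""
    rep = {}
    for f in fields(obj):
        if "published_as" not in f.metadata:
            continue
        client_name = f.metadata["published_as"] or f.name
        rep[client_name] = getattr(obj, f.name)
    return rep
```

Once the internal field is renamed to `display_name`, the `as_name` argument becomes redundant and can be dropped, which is the cleanup described above.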
That's not a big deal. But not everything is as simple as a violation of naming standards. Here's a field on a bug task in the bug tracker: 'status'. When you get the representation of a bug, it shows up as 'status' in the JSON document, alongside the bug's title and owner. You can change the bug's title and owner by modifying the JSON document and sending it back with PUT. But you can't do this for status. Why not?
Well, it's not something you should have to worry about, but in Launchpad there's a method called transitionToStatus, and that's how you set the status, by calling that method. That's fine, but we didn't have time to create an annotation that says "when someone tries to modify status through PUT, call transitionToStatus instead of just changing the attribute."
So we published transitionToStatus as a custom named operation on the bugtask. You need to invoke this operation with POST. If you want to change a bug's title, owner, and status, you make a PUT or PATCH request to change title and owner, and then a POST request to change the status. It's an ugly hack.
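The hack means one logical update becomes two requests with two different styles. This sketch builds both without sending them; the URLs are hypothetical, and the form-encoded `ws.op` parameter is an assumption about how the named operation is selected.

```python
import json
import urllib.request
from urllib.parse import urlencode


def requests_for_update(url, title, owner_link, status):
    """Return the two requests the hack requires, in order."""
    # Uniform part: title and owner change through the entry's representation.
    patch = urllib.request.Request(
        url,
        data=json.dumps({"title": title, "owner_link": owner_link}).encode("utf-8"),
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )
    # Non-uniform part: status changes through a named operation invoked by POST.
    post = urllib.request.Request(
        url,
        data=urlencode({"ws.op": "transitionToStatus", "status": status}).encode("ascii"),
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    return [patch, post]
```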
There are technical terms for these ugly hacks. Specifically, this one is a violation of the uniform interface: we said we would accept state modifications through PUT, but this one particular state modification has to happen through POST. There are other places in the web service where we use POST this way, and most of them are also uniform interface violations.
It's easy to fold this custom operation into PUT, but in other cases getting rid of the POST would require exposing new resources. And since our resources come directly from our internal data model, that's a lot of work in this system.
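Folding the operation into PUT amounts to the annotation described earlier: when an incoming representation changes `status`, call `transitionToStatus` instead of assigning the attribute. This is a hypothetical server-side sketch; `BugTask`, `MUTATORS`, and `apply_put` are illustrative names, not Launchpad code.

```python
class BugTask:
    """Toy internal object whose status may only change via a method."""

    VALID = ("New", "Confirmed", "Fix Released")

    def __init__(self, title, status="New"):
        self.title = title
        self._status = status

    @property
    def status(self):
        return self._status

    def transitionToStatus(self, status):
        if status not in self.VALID:
            raise ValueError(f"invalid status {status!r}")
        self._status = status


# Hypothetical annotation: published field name -> method that sets it.
MUTATORS = {"status": "transitionToStatus"}


def apply_put(obj, representation):
    """Apply a PUT/PATCH body, routing annotated fields through their mutators."""
    for name, value in representation.items():
        if getattr(obj, name) == value:
            continue  # unchanged field, nothing to do
        mutator = MUTATORS.get(name)
        if mutator is not None:
            getattr(obj, mutator)(value)
        else:
            setattr(obj, name, value)
```

With this in place the client sends one PUT for title and status together, and the internal method stays internal.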
Another instance of the same problem. You'd think that to create a new bug you'd POST a representation of the bug to the bug list. Or to some collection, anyway.
Well, you do use POST on the bug list, but you can't send a representation of the bug. It's done as a custom named operation, the same as transitionToStatus. Again this is something we can fix with more annotations but it's work we didn't have time to do.
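For contrast, the uniform approach treats creation as POSTing a representation to a collection and getting back 201 Created with a Location header. This is a toy in-memory sketch of that behavior; `BugCollection` and its required fields are hypothetical.

```python
import itertools


class BugCollection:
    """Toy in-memory collection that accepts POSTed representations."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self.members = {}
        self._ids = itertools.count(1)

    def post(self, representation):
        """Create a member from a representation; return (status, headers)."""
        missing = {"title", "description"} - set(representation)
        if missing:
            return 400, {}
        bug_id = next(self._ids)
        url = f"{self.base_url}/{bug_id}"
        self.members[url] = dict(representation, self_link=url)
        return 201, {"Location": url}
```

The client only needs to know the collection's URL and the shape of a bug representation; no operation name is involved.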
Well, who cares? As it happens, we have a way of measuring the effect of these hacks in real life. Remember that I also wrote a web service client, launchpadlib. Most web service client libraries are hand-hacked clients like the various interfaces to Flickr or Amazon's ECS, or they're autogenerated from descriptor files. They break whenever the web service changes and you have to rewrite them or regenerate them.
I wanted a library that works more like a web browser. You don't grab an HTML page and compile a web browser for it. You read the HTML page in at runtime and the HTML programs the web browser to display you certain data and provide you with certain options. AtomPub clients work the same way: they slurp up the descriptor file to find out what collections are available, and present you with your data and your options. I wanted to write a client based on hypermedia.
Well, launchpadlib slurps up hypermedia in the form of JSON and WADL representations of resources. It translates the hypermedia links and forms and the JSON data structures into Python idioms. It's like an ORM. It's nothing special, except that if we change the web service, the client doesn't break.
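The core trick can be sketched in a few lines: wrap a JSON document, and when an attribute isn't a plain field, look for a matching link and follow it. This is not launchpadlib's code; the `Resource` class, the `_link` convention, and the in-memory `SERVER` standing in for HTTP are assumptions for illustration.

```python
class Resource:
    """Wrap a JSON document; follow '<name>_link' fields as Python attributes."""

    def __init__(self, fetch, document):
        self._fetch = fetch        # callable: url -> JSON dict
        self._document = document

    def __getattr__(self, name):
        doc = self.__dict__.get("_document", {})
        if name in doc:
            return doc[name]
        link = doc.get(name + "_link")
        if link is not None:
            return Resource(self._fetch, self._fetch(link))  # follow the hyperlink
        raise AttributeError(name)


ROOT_URL = "https://api.example.net/beta/"

# Stand-in for the server: URL -> JSON representation.
SERVER = {
    ROOT_URL: {"me_link": "https://api.example.net/beta/~someone"},
    "https://api.example.net/beta/~someone": {
        "display_name": "Some One",
        "self_link": "https://api.example.net/beta/~someone",
    },
}
```

Because the client only knows the convention, a new link added on the server becomes a new attribute with no client release.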
And here's one way to evaluate our web service design decisions. Warts on the web service show up as warts on the hypermedia, which show up as warts on the client code. There's nowhere to hide. So the transitionToStatus problem has a direct effect on ease of use. You'd think you'd be able to write code like this.
But that assumes the interface is uniform. If you try that code you get Bad Request.
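This toy reproduces the failure mode: an ORM-style object records attribute changes locally and sends them on save, and a server that won't accept `status` through PATCH answers 400. Everything here (`Entry`, `save`, `fake_patch`, `WRITABLE`) is a hypothetical stand-in, not launchpadlib's API.

```python
class HTTPError(Exception):
    """Stand-in for an HTTP error response."""

    def __init__(self, status, reason):
        super().__init__(f"{status} {reason}")
        self.status = status


# Assumption: the only fields the server accepts through PATCH.
WRITABLE = {"title", "owner_link"}


def fake_patch(changes):
    """Toy server: reject any change to a field it won't accept via PATCH."""
    if set(changes) - WRITABLE:
        raise HTTPError(400, "Bad Request")


class Entry:
    """ORM-style object: assignments are recorded, then sent by save()."""

    def __init__(self, document, send):
        object.__setattr__(self, "_document", dict(document))
        object.__setattr__(self, "_dirty", {})
        object.__setattr__(self, "_send", send)

    def __getattr__(self, name):
        if name in self._dirty:
            return self._dirty[name]
        try:
            return self._document[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self._dirty[name] = value  # record the change locally; nothing sent yet

    def save(self):
        self._send(dict(self._dirty))
        self._document.update(self._dirty)
        self._dirty.clear()
```

The code a Python programmer would naturally write, `task.status = "Fix Released"` followed by `task.save()`, is exactly the code that fails.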
In actual fact you need to learn some special code for changing the bug status. transitionToStatus bleeds through from our internal data model to the external data model, through the hypermedia, and into our client. You can see this method with dir() on the object, but it's just not acceptable for a Python library to behave this way.
Every time we violated one of the principles of REST for the sake of expediency, we paid a price on the client side. We either had to put in a Launchpad-specific hack, or we got some weird misfeature that will trip up end-users. The flip side of this is that by adhering to the principles of REST we got a client that's easy to use and about as flexible in the face of change as a web browser.
When we get around to adding a new kind of annotation and we stop exposing transitionToStatus through the web service, you won't have to update the client, because the client takes its marching orders from hypermedia documents served by the server. You just need to change the way you use the client.
So we've got an idiomatic native-language client and autogenerated reference documentation, all the nice tools I've been envious of in SOAP services. But because it's based on the uniform interface and hypermedia-as-the-engine-of-application-state, it's very flexible and loosely coupled.
It's not that we made no compromises. We made lots of compromises, but we did them for the sake of releasing early, and they cause exactly the kind of pain I expect to feel when I violate the RESTful constraints. We can now put in some work and get rid of the compromises and the pain without destroying these nice things we have. The constraints of REST are not bondage gear; they're a safety harness.