Author's note: I gave this talk at RESTFest in September 2013. It was inspired by my experience writing Appendix C of RESTful Web APIs, "An API Designer's Guide to the Fielding Dissertation." The historical-fiction element came out of my desire to strip away the baggage that's accumulated around the dissertation over the past 13 years. I came up with this independently of Bret Victor's "The Future of Programming", but he definitely had the idea first.
This is a cleaned-up version of the talk as prepared for delivery. I don't think the talk I gave is significantly different.
For dramatic purposes I take some whiggish liberties with the presentation of history. In particular, I kind of made it sound like Roy Fielding single-handedly turned the 1993 Web into the 2000 Web, which is untrue. I hope you'll forgive these cheap tricks of the genre.
Stock images are taken from this mysterious, unattributed collection. The first Doctor Fun comic (September 24, 1993) appears courtesy of Doctor Fun.
“If you think the Y2K fizzle and the bursting of the dot-com bubble spelled the end of excitement on the World Wide Web, you won't believe what the future has in store! Leonard Richardson is the outspoken CTO of Outerweb, one of Fast Company's "20 Startups That Might Survive The Year 2000." In this whirlwind talk, he will use the humble origin of the World Wide Web as a window into the future, mixing historical fact with the latest research out of the University of California at Irvine. You'll see what the Web might look like in 2010—and the pitfalls that stand in our way.”
My name is Leonard Richardson. I'm the CTO of Outerweb, and I'm responsible for executing on our motto, "Beyond the Web." This is a special day for me, because one year ago, on September 21, 1999, Outerweb had its IPO. That means that my stock options mature today. As soon as I finish this talk, I'm cashing in and retiring to Bermuda. But for now, I'm keeping it professional.
2000 has been a great year for the World Wide Web. At the end of 1999, there were about 1.6 million public web servers on the Web. We are on track to nearly double that number.
But I don't want to talk to you about 2000. I want to go back in time seven years and talk about 1993.
1993 was the year the web really took off. At the end of 1992 there were about 50 web servers. Now there are over six hundred. The Web grew ten times bigger in 1993.
The people who designed the Web started getting very concerned, because they didn't really plan for 620 individual deployments. The Web was designed so particle physicists could share physics papers with their particle physicist pals. Now it's being used for all kinds of purposes it wasn't designed for.
People are sharing phone directories, newspaper articles. There's this new site called the Internet Movie Database that serves information about movies. Instead of coming up with a Movie Database Query Protocol, like you're supposed to, these Usenet people decided to put their database into HTML and serve it over HTTP as though it were a physics paper.
And it seems to work! But the physicists don't know if it will keep working. They don't know whether the Internet Movie Database is an unsustainable hack, or whether it represents the future.
And the situation is about to get a lot worse. Over the course of the next year the web is going to grow about sixteen times. By the end of 1994, there will be ten thousand web servers, doing things that are increasingly far removed from the core mission of the web. Ten thousand machines serving a protocol that was designed with no attention paid to performance or error handling.
So the question is, will there be a 1995 for the Web? Is the Internet Movie Database a glimpse of our wonderful future, or is this whole thing going to crash and burn?
There is cause for pessimism. In 1993, some joker unilaterally invented an HTML tag called <IMG> which lets you embed an image in an HTML document. That's a binary image that might be one hundred kilobytes in size, downloaded automatically, without user confirmation. As a reminder, one hundred kilobytes takes seven minutes to download over a 2400-baud modem.
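The seven-minute figure can be checked with a rough calculation. This is a sketch under assumptions that are not in the talk: a kilobyte is 1024 bytes, the modem moves 2400 bits per second, and every byte costs 10 bits on the wire (8 data bits plus a start bit and a stop bit). The function name is made up for illustration.

```python
# Rough transfer-time estimate for a 100-kilobyte image over a 2400-baud modem.
# Assumptions (not from the talk): 1 kilobyte = 1024 bytes, 2400 bits per
# second, and 10 bits on the wire per byte (8 data bits + start + stop bit).

def download_seconds(size_bytes, bits_per_second, bits_per_byte=10):
    """Return the time in seconds needed to move size_bytes over the line."""
    return size_bytes * bits_per_byte / bits_per_second

image_bytes = 100 * 1024
seconds = download_seconds(image_bytes, 2400)
minutes = seconds / 60
print(f"{minutes:.1f} minutes")  # prints "7.1 minutes"
```

Dropping the start and stop bits brings the estimate down to a little under six minutes, so "seven minutes" is the right order of magnitude either way.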
Pretty soon someone's going to start putting a daily comic strip on the Web. If we don't fix HTTP, the Web is going to bring down the Internet. Every day at 9 AM local time, thousands of people are going to hit the Doctor Fun web page to read the new comic and you're going to have rolling blackouts.
To fix HTTP we must decide what job HTTP should be doing. The spec says HTTP is a protocol for serving HTML documents. But thanks to the <IMG> tag, HTTP is also for serving binary images.
If we don't standardize HTML, the <IMG> tag fiasco is going to happen over and over again. Instead of inventing a new Internet protocol to do something new, like you're supposed to, people will unilaterally stick a new tag into HTML, and put their new thing on the Web.
But standardizing HTML requires deciding what the Web is for. We've got news articles, comics, information about movies... did we accidentally invent a newspaper protocol? That might be really useful in this era of the information superhighway.
We are particle physicists, not newspaper people. So we approach the problem the way particle physicists do. We build an experimental particle accelerator. We put the World Wide Web in the particle accelerator and we rip it apart so we can look at its component parts and see what makes it tick.
And to do this the physicists need to step back and let the computer scientists take over. Because we're not just curious about what's inside the web. The Web is broken. There's something missing from its design. We need to tear the Web apart in a particle accelerator, and then we need to add some extra stuff and put it back together. That's a job for a computer scientist.
One of the people who designed the particle accelerator was a CS graduate student named Roy Fielding. He spent most of the '90s working on the particle accelerator, and you can see that he got a bunch of good publications out of his work.
My talk today is a popularized summary of his Ph.D dissertation, which will be out soon if it isn't already. His dissertation explains how they broke down the Web into its component parts. It explains what they added to make the Web capable of scaling from 1993's 620 server installations to 1994's ten thousand, today's four million, and next year's eight million.
Roy Fielding is concerned with network-based hypermedia architectures. This is an infinite family of Web-like systems. It includes the crappy little Web we had in 1993. It includes the Web we have today, in 2000. It includes the Web we will have in 2010, unless civilization collapses or something. And it includes an infinite number of hypothetical Webs that might be good designs or might be terrible designs.
Fielding's job in 1993 is to map out and navigate this infinite space of Webs, find the best Web that can exist subject to real-world constraints, and to make sure it's the Web we actually have now, in 2000.
Well, what's so great about the Web in the first place? Remember, it's 1993. We have alternatives. Bill Gates wants to sell us an encyclopedia on a CD-ROM. Ted Nelson is offering us Xanadu. BBSes and Fidonet are still pretty big. The telecom companies are trying to sell interactive television. All of these might conceivably be "network-based hypermedia architectures". Why do we want the Web instead?
Roy Fielding identifies four features of the Web that are non-negotiable. This means he will not be considering every possible network-based hypermedia architecture. He will only be considering architectures with these four features.
The first is low entry-barrier. No one is making you use the Web, so it has to be an easy call to make. And even in 1993 the Web has a very low entry-barrier. It's got a browsing GUI. It's got inline images now, thanks to that damn <IMG> tag.
It's easy to write a web page, you use a text editor. You can run simple queries to search online databases. It's not very functional, but it's easy to get started. This is why we'd rather have the Web than Xanadu.
The second is extensibility. This is why we'd rather have the Web than interactive television. The web is a system that can be extended, and as early as 1993 people were extending it. The problem is that some of them were doing this with unilateral changes to the protocols such as the <IMG> tag.
The challenge during the '90s is to provide a baseline set of functionality that allows people to innovate without unilaterally adding stuff to the standards. And don't forget, the new Web needs to be backwards compatible with the 1993 Web that already supports six hundred applications. That's extensibility in action, as well.
The third great thing about the web is distributed hypermedia. This is why we'd rather have the Web than an encyclopedia on a CD-ROM. All the data on the Web is kept on a server and downloaded as needed. The Doctor Fun comic changes every day, so we download it again every day.
Distributed hypermedia is how the Web does extensibility. The server is in charge of the data, the links between bits of data (<A> tags), the instructions on how to present the data (<TABLE> tags), and the instructions on how to manipulate the data (<FORM> tags). This is how we get extensibility. The client doesn't make any assumptions about what it's allowed to do to the data. If the server's capabilities change, the server changes the HTML documents it serves, and the client automatically adapts.
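To make "the client automatically adapts" concrete, here is a minimal sketch of a client that learns what it can do from the HTML it is served. The page contents, the URLs, and the names AffordanceParser and affordances are all hypothetical. It uses Python's standard html.parser module, which lowercases tag and attribute names.

```python
from html.parser import HTMLParser

class AffordanceParser(HTMLParser):
    """Collect the links and forms a server has chosen to offer in a page."""

    def __init__(self):
        super().__init__()
        self.links = []   # href values of <A> tags
        self.forms = []   # (method, action) pairs of <FORM> tags

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # HTMLParser hands us lowercased tag and attribute names.
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "form":
            method = (attrs.get("method") or "get").upper()
            self.forms.append((method, attrs.get("action")))

def affordances(html):
    """Return (links, forms) found in an HTML document."""
    parser = AffordanceParser()
    parser.feed(html)
    parser.close()
    return parser.links, parser.forms

# Two versions of a hypothetical page; the second adds a search form.
PAGE_V1 = '<A HREF="/comics/today">Today\'s comic</A>'
PAGE_V2 = PAGE_V1 + '<FORM METHOD="GET" ACTION="/search"><INPUT NAME="q"></FORM>'

print(affordances(PAGE_V1))  # (['/comics/today'], [])
print(affordances(PAGE_V2))  # (['/comics/today'], [('GET', '/search')])
```

The client code is the same for both versions. The new search capability shows up only because the server started serving a form.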
The final great thing about the 1993 Web is Internet scale. This sounds like a buzzword meaning "really big", but it actually means that the Web scales in the same anarchic way the Internet does.
This is why we'd rather have the Web than a Bulletin Board System. You don't have to get permission to put up a web server, or link your web server to someone else's.
And unlike typical client-server software of the '90s, no one is forcing you to upgrade your software all the time. You upgrade your web server or your web browser when you need to. You can usually use an old browser against a new server, and vice versa.
So, Roy Fielding has put the Web into the particle accelerator and discovered the four non-negotiable features. But there are still an infinite number of possible architectures that have all those features. How do we decide between them?
To answer this question Fielding also lays out a bunch of properties that would be nice, on top of the non-negotiable features. He's going to judge candidates based on how much of this nice stuff they would add to 1993's Web.
I'll take a brief look at these properties, just so you can see the difference between the 1993 Web and the 2000 Web.
First, we want good performance. We want the Web to be fast, and not take down the whole Internet.
We also want the Web to feel fast. If, God forbid, a web page includes thirty 100-kilobyte inline images, you shouldn't have to download the entire three-megabyte package before you get to see anything.
And we want to use the network efficiently. If you hit the Doctor Fun webpage looking for your fix, and the comic hasn't updated yet, there's no need to download a 100-kilobyte file you already have. That's pure waste. But that's what the Web is doing now, in 1993.
Simplicity. We want the Web to have as few moving parts as possible. We're going to need to add some stuff to HTTP and HTML, but we would like to keep it as simple as possible.
Visibility. We like that HTTP and HTML are text-based protocols. Even though there's no RFC for them yet, it's really easy to figure out how the Web works and write your own server or client.
Portability. You can have a web browser on Solaris talking to a web server running on a NeXT cube. The NeXT cube serves an HTML link pointing to an Amiga. And nobody ever has to say "Hey, I'm a Solaris machine" or "I'm a Macintosh" or "Parlez-vous SGI?" Everyone talks HTTP and HTML.
Reliability. If Tim Berners-Lee's web server goes down permanently, it doesn't break the Web. It just breaks some individual web pages and some links from some other web pages.
Then there are a few I'm going to skim over because to my mind they're related to the non-negotiable properties I covered earlier. Modifiability has a lot of sub-properties but they're all related to extensibility. We like that people are using the Web way outside of its intended purpose. We want to make it possible for someone to put some crazy new thing online, like a bookstore, without having to add a bunch of nonstandard crap to HTTP and HTML.
And we want the Web to be scalable. Even in 1993 we are doing amazingly well on this score. Anyone can run a web server. You don't have to email all your web pages to Tim Berners-Lee and have him put them up on the Web. You don't have to petition your university for a "Web feed" the way you might have to petition for a full Usenet feed. The problem in 1993 is performance, not scalability.
Fielding's dissertation looks at a whole bunch of architectures, judges them by these criteria, and picks a winner. Before I show you the ultimate winner, let's look at how things stood in 1993.
Fielding calls the architecture of the 1993 Web Client-Stateless-Server, or CSS. We're now in a position to judge this architecture, and see what its problems are.
The good news is, CSS is scalable, it's simple, and it's portable. The bad news is its performance sucks. Its user-perceived performance sucks. Its network efficiency sucks. And it's too inflexible. You can't modify the system without hacking in something like the <IMG> tag.
Now let's look at the winning architecture: LCODC$SS. It's actually based on the old architecture, CSS—you can still see CSS in the name, near the end, with the dollar sign in the middle. That's good: it means that the new system can be made backwards compatible with the old system.
Almost all of the problems with CSS have been fixed. LCODC$SS has good user-perceived performance. Good network efficiency. Good modifiability.
Its network performance still sucks, but it turns out all distributed-hypermedia systems have bad network performance. If you really need high performance, you need to get rid of the network and put an encyclopedia on a CD-ROM.
The cool thing about Fielding's particle-accelerator approach is that you can build an architecture by combining smaller architectures. That's what he does here. He makes LCODC$SS by adding three atomic architectures to CSS: L, COD, and $.
L is the layered system. This is the system of HTTP proxies and gateways which we don't really notice as consumers, but which are becoming very important in scaling web applications.
And $ is Cache. This is the cache in your browser which stops you from making a new HTTP request for Doctor Fun every ten minutes.
Note that these atomic architectures aren't 100% positive. The layered system really helps scalability, but it hurts user-perceived performance, because there are more machines between the client and the server.
So, CSS plus Cache plus Layered System plus Code-On-Demand equals LCODC$SS.
And it's important to realize that I've glossed over a lot of work here. You can't just think about Code on Demand and say "Oh, that will improve modifiability at the expense of visibility." That's not scientific. You need to run an experiment and see what happens.
And that's what Roy Fielding and a lot of other people spent the 1990s doing. Setting up experimental networks that used experimental protocols and seeing how they behaved.
And we're not quite done. The Web we have in 2000 is a LCODC$SS system, but it's not the only possible LCODC$SS system. To get to the actual Web we have today, we need to add one more architectural element.
See, it turns out you can't just stick a cache and a layered system on top of the 1993 web. You have to do some clean-up work first, because cache and layered system give you thousands of extra cases to consider. You might have a cache in the middle between client and server. You might have a client that talks to a caching proxy that goes through a load balancer before finally reaching a server. All these pieces of software are written by different teams with different interpretations of the protocols.
So if you want to implement caching and layered system on top of CSS, you have two choices. One, you can come up with a huge number of special cases to cover all possible contingencies. Or two, you can make the system as simple as possible, so that all possible contingencies look the same.
We took the second approach. Fielding describes this as a new element called U, which stands for "Uniform Interface". It's made up of four sub-elements. These four sub-elements allow us to turn the 1993 web, which uses CSS, into a Web-like system that uses LCODC$SS. And the resulting architecture is of course called LCODC$SSU, although Fielding calls it "REST" for some reason.
I'll go into some detail about the four sub-elements in a bit, but first, I thought I'd show you what it means for the Web to have a uniform interface.
From the client's point of view, a CGI script is the same as a static document. They're both identified by a URL and obtained through an HTTP GET request.
A cached HTTP response is treated the same as a response fresh from the server.
Buying a book online is the same as publishing a blog post. You get an HTML form, you fill it out, and you click a button to make an HTTP POST request. There's no special domain knowledge necessary.
And proxies, gateways, clients and servers are all interchangeable.
So, where did U come from? It's not very scientific to just say "add these four things to the system". What were the experiments?
Well, some of U came about by accident, like the discovery of penicillin. It just so happens that the 1993 web had two really useful features that were making it popular.
The first is "Identification of resources". This comes from the concept of the URL, the Uniform Resource Locator. URLs didn't exist in earlier systems. URLs are amazing because they let you talk about things inside the Web with very fine granularity.
The second is "Hypermedia as the engine of application state". This comes from the fact that the server serves HTML documents that contain both data and hyperlinks. The client chooses which link to follow, and the process repeats. This is great because it means when a website changes, the site is automatically redeployed to all the clients.
So we want to keep those two really great features of the 1993 Web. But the 1993 Web also has some enormous problems. This is where we need to do the clean-up work.
The other half of U, "Self-descriptive messages" and "Manipulation of resources through representations," came out of that cleanup work.
URLs are an amazing invention, but in 1993 they have a huge problem: it's not clear what a URL is. The general consensus is that a URL identifies a "document", something like a physics paper.
But there are all these confusing edge cases. Is the preprint of a physics paper a different "document" than the final version? Is a web page that's just a list of links to physics papers a "document"? What if instead of a curated list of links, the list of links is the output of a search engine algorithm? Is the output of an algorithm a "document"? If there was a URL that printed out random numbers, would that be one "document" or infinitely many "documents"?
Roy Fielding says, shut up about documents. Stop trying to treat a URL like a citation for a paper in the Journal of Physical Review. It's a lot more complicated than that. It's closer to the way names work in the real world. A physics paper has a name, sure. But we also have names like "The L.A. Dodgers" that identify a set of people whose membership changes over time. Individual people change drastically over time, but in the year 2010, the name "Leonard Richardson" will probably still identify me.
So the computer scientists invent a brand new concept, the resource, and the particle physicists, who are used to seeing equations like this, agree to talk about resources instead of documents. The formal definition of a resource is very abstract, but I've italicized the important part.
A resource is a set of URLs and data documents. We call the data documents "representations", because "documents" sounds too much like "physics papers".
The membership of a resource can change over time. That covers the case where a representation is updated or moves to a different URL. A resource might contain infinitely many representations but only one URL, like the random number generator.
Resources are such an abstract concept that you can only say three things about resources in general.
This gives us the second part of the uniform interface, "manipulation of resources through representations."
So that's URL's one big problem solved. We had to bring in all the complexity of real-world naming, but we solved the problem of what a URL is in the first place.
HTTP has a ton of problems, but they're practical engineering problems. For instance, you can only host one web site per IP address, because an HTTP client connects to an IP address, not a hostname. If you want another web site, you need to buy a second computer or network card. That's an expensive proposition.
Now that people are putting things other than physics papers on the Web, some URLs turn out to have side effects when you send an HTTP request to them. But which ones? There's no way to tell.
I won't go into details on the rest of the problems on this slide, because the point I want to make is that all these problems were solved the same way.
All of these problems were solved by adding new pieces to the HTTP messages sent back and forth between client and server.
We added HTTP headers. We added the Host header, which lets the client say which hostname they're trying to access on this IP address. This lets you host an infinite number of hostnames on a single IP address. We added HTTP methods. Now we can distinguish between GET methods that have no side effects, and POST requests that might have side effects.
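Here is a minimal sketch of how a server can use the Host header to serve several sites from one IP address. The hostnames, the SITES table, and the function names are invented for illustration, and the parsing is deliberately naive compared to a real HTTP server.

```python
# Hypothetical sites sharing one IP address.
SITES = {
    "physics.example": "Physics papers",
    "movies.example": "Movie database",
}

def parse_request(raw):
    """Split a raw HTTP request into method, path, and a dict of headers."""
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    method, path, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return method, path, headers

def choose_site(raw):
    """Pick a site using only information carried in the request itself."""
    _method, _path, headers = parse_request(raw)
    # Drop an optional ":port" suffix before looking up the hostname.
    host = headers.get("host", "").split(":")[0].lower()
    return SITES.get(host, "unknown site")

request = "GET / HTTP/1.1\r\nHost: movies.example\r\n\r\n"
print(choose_site(request))  # prints "Movie database"
```

A request with no Host header falls through to "unknown site", which is the 1993 situation: the server has nothing in the message to go on.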
This is the third part of the uniform interface, "self-descriptive messages." All information necessary to understand an HTTP request or response should be contained in the request or response itself. There is no out-of-band information in HTTP.
Self-descriptive messages make it possible to add caching to HTTP. A request that has side effects shouldn't be cached. How do we tell if a request has side effects? Look at the method! What if the server needs to provide special caching instructions with a response, like "cache this for an hour" or "don't cache this"? The server provides that information in the response itself, using HTTP headers.
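The caching decision can be sketched as a function that looks at nothing but the messages. This is a simplified illustration, not the real HTTP caching rules. It assumes the server uses the Cache-Control response header with the max-age and no-store directives, treats only GET as cacheable, and refuses to cache anything without explicit instructions. The name cache_lifetime is made up.

```python
def cache_lifetime(method, response_headers):
    """Return how many seconds a response may be reused; 0 means don't cache."""
    # Only GET is treated as free of side effects in this sketch.
    if method.upper() != "GET":
        return 0
    # Parse "max-age=3600, public" into {"max-age": "3600", "public": ""}.
    directives = {}
    for part in response_headers.get("Cache-Control", "").split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value
    if "no-store" in directives:
        return 0
    max_age = directives.get("max-age", "")
    # With no usable instructions, be conservative and don't cache.
    return int(max_age) if max_age.isdigit() else 0

print(cache_lifetime("GET", {"Cache-Control": "max-age=3600"}))   # 3600
print(cache_lifetime("GET", {"Cache-Control": "no-store"}))       # 0
print(cache_lifetime("POST", {"Cache-Control": "max-age=3600"}))  # 0
```

Everything the function needs arrives in the request method and the response headers, which is the point of self-descriptive messages.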
Simply chanting "self-descriptive messages!" didn't solve these problems. But it did point out the problems, and it gave a place to put the solutions. The solution is always to make the HTTP messages more explicit. There is no out-of-band communication in HTTP.
Thanks to the Fielding dissertation, instead of just saying "cookies suck," we can point out which constraint of LCODC$SSU they violate, and what the consequences are.
Here's another problem that wasn't fixed. If you want to make two HTTP requests to the same web server, you either need to open up two TCP connections, or you need to wait for the first request to finish. Otherwise you won't know which response goes with which request. This also means you can't tunnel HTTP over an asynchronous protocol like UDP or email.
The problem is that nothing in an HTTP response connects it to the request that spawned it. Thanks to Fielding, we can classify the problem: it hurts user-perceived performance. And we can explain the problem: it's a failure of self-descriptive messages. Just like it was a failure of self-descriptive messages when an HTTP request didn't mention the hostname the client wanted to connect to.
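To see what a self-descriptive fix would buy, imagine that each request carried an identifier and each response echoed it. The sketch below is purely hypothetical: "Example-Request-Id" is a made-up header, and nothing here describes how HTTP actually works. It only shows that responses arriving out of order could then be matched to their requests.

```python
# "Example-Request-Id" is an invented header, used only for this illustration.
ID_HEADER = "Example-Request-Id"

def match_responses(sent, received):
    """Pair each response with the path of the request carrying the same ID."""
    by_id = {req["headers"][ID_HEADER]: req for req in sent}
    return [
        (by_id[resp["headers"][ID_HEADER]]["path"], resp["body"])
        for resp in received
    ]

sent = [
    {"path": "/comic.gif", "headers": {ID_HEADER: "1"}},
    {"path": "/index.html", "headers": {ID_HEADER: "2"}},
]
# Responses arrive out of order, as they might over an asynchronous transport.
received = [
    {"headers": {ID_HEADER: "2"}, "body": "<HTML>...</HTML>"},
    {"headers": {ID_HEADER: "1"}, "body": "GIF89a..."},
]

print(match_responses(sent, received))
# [('/index.html', '<HTML>...</HTML>'), ('/comic.gif', 'GIF89a...')]
```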
Why have I devoted so much time to the history of the Web? Because history's not over! At Outerweb we have seen the future. By 2007 the Web is going to undergo an enormous change.
We can make sense of this change by seeing it in terms of a change to LCODC$SSU. Hopefully this will help us avoid some really big pitfalls and focus on the real challenges.
The change is this: right now, in 2000, the Web is used for human-machine interaction. A web page is designed to be rendered by a computer, but a human being is supposed to look at the rendering and make their decision about which link to click.
By 2007, the Web will be primarily used for machine-to-machine interaction. A server computer will still send a web page to a client computer, but the client computer will be making the decision about which link to click. This computer will still be carrying out the wishes of a human being, but the human being will not be involved on a request-by-request basis.
This will introduce a new non-negotiable feature, to go along with low entry-barrier and extensibility and the rest: machine legibility.
"But Leonard!" you say. "HTML pages are already machine-legible! That's how web browsers work!" Well, sort of.
HTML tags are machine-legible. The <UL> tag means "everything inside me is a list". The "<IMG>" tag means "Send a GET request to this URL and incorporate the result into the already-rendered document as an inline image." You can program a computer to understand these tags, and then you have a web browser. No problem there.
Here's the problem. The data inside the HTML tags is not machine-legible. We use prose for that. "Bob lives in Minneapolis." Computers are terrible at understanding prose. But the data contained in the prose is the information you need to make an informed decision about which link to click.
Unless we have some way of moving that data into machine-legible form, we will be sitting at our computers forever, clicking one link at a time whenever we want to do something. That's why I say machine legibility is a non-negotiable feature.
At this point we get into a lot of different proposals, a lot of coordination problems, and I don't want to go into detail here, but it's probably going to involve some kind of custom XML language like the one I sketch out here.
Anything inside the angle brackets, we can program a computer to understand the semantics. We can program a computer to understand the concepts person, name, address, and city.
In this example, this computer doesn't know what "lives in" means, because that's prose. But it knows that the string "Bob" identifies a person and the string "Minneapolis" identifies the city where that person lives.
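As a rough sketch of what that looks like to a program, here is one possible markup for the sentence and a client that reads it. The element names come from the concepts listed above, but the exact document structure is an assumption made for illustration. It uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# A hypothetical markup of the sentence "Bob lives in Minneapolis."
DOCUMENT = """
<person>
  <name>Bob</name> lives in
  <address><city>Minneapolis</city></address>.
</person>
""".strip()

root = ET.fromstring(DOCUMENT)

# The tagged data is easy to pull out.
name = root.findtext("name")
city = root.findtext("address/city")

# The prose between the tags is just an opaque string to the program.
prose = root.find("name").tail.strip()

print(name, city)  # prints "Bob Minneapolis"
print(prose)       # prints "lives in"
```

The program can retrieve the words "lives in", but nothing tells it what they mean. Only the tagged parts carry semantics it was programmed to understand.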
I'm brushing all that aside because I want to focus on a larger problem that no one is working on. No one has noticed it yet because the problem is only visible through the lens of the Fielding dissertation.
Look at the architecture of the Web the way I say it will be in 2007. There are now five non-negotiable features, and two of them conflict with each other!
Machine legibility drastically increases the entry-barrier, because you have to take all the work that a human being was doing implicitly by reading text, and spell it out explicitly to teach a computer to understand it.
So even if we solve the machine-legibility problem, and it's a huge problem, we've introduced another problem. We've made the Web difficult to use. That's going to hurt adoption.
Well, maybe there are other ways to lower the entry barrier. Like, what if we got rid of the "distributed hypermedia" requirement? What if we just wrote down everything there was to know about a website ahead of time? We wouldn't need to send links back and forth all the time. A client could be programmed against the website documentation, and at any given point it would know which HTTP request to make to get what it wanted. You would have something like the API to the C standard library. A "Web API", if you will.
The problem is, distributed hypermedia is how the Web does extensibility. Get rid of hypermedia, you lose extensibility. Everyone must hard-code a client based on your documentation. And then you have to make absolutely sure to get everything right the first time, because there are no second chances. You can't re-deploy your clients. Your system is not extensible.
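The difference can be sketched with two toy clients run against two versions of a site. Everything here is hypothetical: the site is a dictionary standing in for a web server, the URLs and function names are invented, and the regular expression is a crude stand-in for real HTML parsing.

```python
import re

# Two versions of a hypothetical site. Version 2 moves the comic to a new URL.
SITE_V1 = {
    "/": '<A HREF="/comic">Today\'s comic</A>',
    "/comic": "comic #1",
}
SITE_V2 = {
    "/": '<A HREF="/comics/today">Today\'s comic</A>',
    "/comics/today": "comic #2",
}

def get(site, path):
    """Stand-in for an HTTP GET: return the body, or None for a 404."""
    return site.get(path)

def hardcoded_client(site):
    # Programmed against documentation that said the comic lives at /comic.
    return get(site, "/comic")

def hypermedia_client(site):
    # Knows only the home page and the text of the link it is looking for.
    home = get(site, "/")
    match = re.search(r'<A HREF="([^"]+)">Today\'s comic</A>', home)
    return get(site, match.group(1)) if match else None

for site in (SITE_V1, SITE_V2):
    print(hardcoded_client(site), hypermedia_client(site))
# comic #1 comic #1
# None comic #2
```

The hard-coded client breaks as soon as the server changes its URLs. The hypermedia client keeps working because the server told it where to go.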
Okay, what if we could force everyone to upgrade their clients whenever the server side changes? Then we could get extensibility back.
Problem is, on the Web, distributed hypermedia is also how we force everyone to upgrade at the same time. When you redesign your website, it's automatically redeployed to all your users. Without distributed hypermedia, you're talking about strong-arm tactics. Upgrading everyone's web browsers whenever the web site changes.
This strategy might work in an organization with an IT department that can coordinate a big change like this. But it won't work on the public Internet. So if you give up distributed hypermedia you lose Internet scale.
So it looks like you need to give up hypermedia to keep the entry-barrier low. But if you give up hypermedia, you also lose either extensibility or Internet-scale. So it looks like you can only have three of these five essential features, right?
Well, no, you can have four. It's just that the one you can't have is "Low entry-barrier." But here's the good news: "Low entry-barrier" is subjective! The 1993 Web had low entry-barrier relative to other, similar systems. In some respects, the 2000 Web has a lower entry-barrier than the 1993 Web; in other respects, it's higher.
Machine legibility is a really hard project. But if we do it right, the entry-barrier will be lower than if we do it wrong. By keeping distributed hypermedia in mind when designing machine legibility, we can keep extensibility and Internet scale, and we can bring down the entry-barrier relative to what it would be otherwise.
Or, we could just give up! Give up on hypermedia and extensibility as well. But if we do that, we'll have a system where changes are incredibly slow and painful. A system that will never be nearly as popular as the Web. A system where the entry-barrier is low because there's not much there.
This is my challenge to you over the next seven to ten years: let's work on achieving machine legibility, in a way that keeps the entry-barrier as low as possible, without sacrificing distributed hypermedia.
This document (source) is part of Crummy, the webspace of Leonard Richardson (contact information). It was last modified on Tuesday, September 24 2013, 19:29:39 Nowhere Standard Time and last built on Friday, March 24 2023, 03:00:19 Nowhere Standard Time.