[Author's note: This is the first chapter I ever wrote for RESTful Web Services, renamed and rewritten slightly to work as a standalone essay. I cut it from the book early on because it's very short, I couldn't work it into the Introduction or the first chapter, and it deals with a relatively obscure topic: why the World Wide Web beat out the other Internet protocols people were using in the mid-1990s.

The other essay in this vein is Rohit Khare's 1998 essay "Who Killed Gopher?: An Extensible Murder Mystery", name-checked below. It's a great essay, but I don't think Khare appreciated how deadly URIs were, or how HTML's hypermedia made the Web more interesting than Gopherspace. I put this essay online with an eye towards supplementing Khare's analysis and broadening it to include FTP.

My goal in this chapter/essay was to challenge an assumption implicit in much of the WS-* work: that the success of the web (HTTP+URI+HTML) doesn't teach us anything about distributed programming. In this analysis, HTTP is just a protocol that happened to become very popular. Its popularity makes it a good transport protocol for the protocols and formats that do the real work of web services, though any other protocol will work as well. But the web didn't become popular because it was a great transport protocol: we already had TCP/IP. It became popular because something about the web application made people want to stop writing specialized application protocols, and start turning their ideas into websites. RESTful Web Services argues that the same advantages are present when the application is distributed programming.]

The System of the World Wide Web

I make some pretty extravagant claims in RESTful Web Services. I claim that the basic web technologies—HTTP, HTML/XML, and URI—are, despite their simplicity, extremely powerful. "How powerful are they?", you ask, like a stand-up comedian's audience. I say they're so powerful that if you want to build a service that's run over the Internet and consumed by computer programs, you can almost always do it directly on the web, with nothing more than the technologies used to build everyday web sites for humans.

How can I back up these claims? First, how do I know that the web technologies are powerful at all? Maybe they're actually pretty lousy. Maybe people love to communicate so much that when computers got cheap enough, they took whatever they could find and built a World Wide Web out of it.

As it happened, the web faced two main Internet rivals: a predecessor (FTP), and a competitor (Gopher). Did it beat them because it was technically better, or because it got lucky or had better marketing? In this essay I compare the underlying technologies to find out what makes the web special.

Second, how do I know that web technologies are powerful enough? Obviously they're good enough for humans, since we all use the web, but maybe something else would be better for use by computer programs. Are the technologies really a good match for the problem?

I can answer this question by applying knowledge we already have. After fifteen years of the web, we know a lot about putting it to work for human beings. What are the values of the web? Which of its users' needs does it fulfill? Do computer programs, and their programmers, have similar needs? Most of the early chapters in RESTful Web Services are devoted to answering this question.

How the Web Won

The World Wide Web became the public face of the Internet around the time Internet hype started catching on among the general public: let's say mid-1995, in the months before the Netscape IPO. Up until then, both the Gopher and FTP protocols were more popular than HTTP.[0] Since then, HTTP has reigned unchallenged.[1]

This dominance was not preordained. We can imagine something like the web, except with FTP or Gopher running under the covers. Why did HTTP win?

If you've used FTP or Gopher, the question might seem silly. FTP and Gopher clients achieve the astounding task of making the Internet seem boring. FTP clients look like file managers, and they're oriented towards downloading files to disk, not displaying hypertext. Gopher clients turn hypertext into an endless "Choose Your Own Adventure" game. Why would anyone use these interfaces a second longer than necessary?[2]

But it's unfair to compare today's technology to 1991's. The first version of HTTP was brutally minimalist, and the first web pages wouldn't have won any beauty contests either. There were no multimedia web browsers. An HTML page could link to an image file, but the IMG tag didn't exist until 1993,[3] so there was no way to display an image inline. Users had to download images separately and display them in separate windows, just as they would with an FTP client. If Gopher, or a hypertext system running atop FTP, had caught the Internet hype, it would have eventually gotten a flashy multimedia browser, and every interesting feature of HTTP.

Of course, that's getting it backwards. The Web caught the hype because it already had a flashy multimedia browser: Mosaic. And though glitz, politics, hard work, and competitors' mistakes all played a role in the success of the web, there are also aspects of the architecture that ensured the web would catch on. I think the web won because of the URI.

http://www.w3.org/

Go ahead, laugh. That's what people did when these cryptic addresses started showing up in magazine ads. And on billboards. And business cards. And television. Today, two posters hang above my work desk, and though neither has anything to do with computers or the Internet, both have URIs in small print at the bottom.[4]

URIs are everywhere, and what's vaguely funny now is the idea that they're something special. But they're very special: URI management is the fundamental consideration behind the design of web sites, web applications, and web services. Tim Berners-Lee originally intended URIs to be invisible, but they're too useful for that.

Everything Has an Address

What did people do before there were URIs? How do you designate a file on a server without one? Well, URIs are made of a few common parts: a protocol name, a server name, and some extra data that the server interprets as a file path or something similar. If you can convey those three pieces of information to someone, they can work out a set of software invocations and manual actions that gets them the file.

By the early 1990s, people were writing documents that referenced specific files on FTP sites, the way we reference pages on web sites today. But there was no concept of a file having an address. URIs were invented in 1991, but for many years they were not well known outside of web circles. To "address" a file, you'd tell your human reader that the access method was anonymous FTP, mention the name of the FTP site, and then give the path on the server to the file in question. Here are some examples, gleaned at random from old documents and FAQs:

ftp.igpm.rwth-aachen.de (134.130.161.30) in:
/arc/pub/unix/motif/RenderXmString.tar.gz

the host ftp.cc.utexas.edu, in the directory pub/minerva

ftp.warwick.ac.uk in pub/cud

FTP: x2ftp.oulu.fi (130.231.48.141) Directory: /pub/cbm

This split between host and file reflects the lifetime of an FTP session. The client opens an FTP session to a particular site, logs in as "anonymous", issues a get command to grab the appropriate file, and closes the session. Each of these steps is automatable, but automating it requires a single machine-readable string describing the location of the file—an address. When there are two pieces of information to convey (host and path), nobody agrees on how to combine them.

Nowadays when people refer to a file on an FTP site, they don't mention these details. They use a URI, like ftp://ftp.warwick.ac.uk/pub/cud/. The entire address is machine-readable. You can use a URI to automatically grab a file through anonymous FTP, even though "grabbing" it requires setting up a stateful FTP session and performing a multi-step operation. So, why did no one ever write a hypertext engine atop FTP? Because the technology to refer to a specific document on an FTP site was invented along with the web.
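To make the "multi-step operation" concrete, here is a minimal Python sketch of what a client does with an ftp:// URI: it pulls the host and path out of the single string and expands them into the stateful FTP dialogue. It is a simplification under stated assumptions: it follows the general RFC 1738 pattern of one CWD command per directory segment, it uses a placeholder anonymous password (guest@example.com, hypothetical), and it ignores ;type= suffixes, credentials embedded in the URI, and the PASV/PORT data-connection setup a real transfer needs. The name ftp_commands is illustrative.

```python
from urllib.parse import unquote, urlsplit


def ftp_commands(uri: str) -> tuple[str, int, list[str]]:
    """Expand an ftp:// URI into (host, port, FTP commands) for an anonymous fetch.

    Simplified sketch: ignores ;type= suffixes, user:password in the URI,
    and the PASV/PORT data-connection setup that LIST and RETR require.
    """
    parts = urlsplit(uri)
    if parts.scheme != "ftp":
        raise ValueError(f"not an ftp:// URI: {uri}")
    # "/pub/cud/" -> ["pub", "cud", ""]: the last segment is the file name,
    # which is empty when the URI names a directory.
    segments = [unquote(s) for s in parts.path.split("/")[1:]] or [""]
    *directories, name = segments
    commands = ["USER anonymous", "PASS guest@example.com"]  # placeholder password
    commands += [f"CWD {d}" for d in directories]
    commands.append(f"RETR {name}" if name else "LIST")  # a file, or a directory listing
    commands.append("QUIT")
    return parts.hostname, parts.port or 21, commands
```

Fed ftp://ftp.warwick.ac.uk/pub/cud/, this yields the host ftp.warwick.ac.uk, port 21, and the sequence USER, PASS, CWD pub, CWD cud, LIST, QUIT: the same steps a human once had to perform by hand from "ftp.warwick.ac.uk in pub/cud".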

Gopher fares better. Here's a Gopher hyperlink: one line of a Gopher "menu" document.

1All the gopher servers (that we know of) /world gopher.floodgap.com 70

It's roughly equivalent to the following HTML:[5]

<a href="gopher://gopher.floodgap.com:70/1/world" type="text/plain">
 All the gopher servers (that we know of)
</a>

Gopher imposes a standard format on its hyperlinks. It must do this to be a hypertext engine. Unfortunately, that format is confusing and not concise. It packs four pieces of information (an item type, a selector, a hostname, and a port number) into fields separated by tab characters, plus a textual description that's not part of the address.
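A small Python sketch makes that structure visible. It assumes the RFC 1436 layout, in which the item type character is immediately followed by the description and the remaining fields are tab-separated (the menu line above shows the tabs as spaces). GopherLink and parse_menu_line are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class GopherLink:
    item_type: str    # one character: "0" = text file, "1" = menu, ...
    description: str  # shown to the user; not part of the address
    selector: str     # string the server interprets, e.g. "/world"
    host: str
    port: int


def parse_menu_line(line: str) -> GopherLink:
    """Split one menu line: <type><description> TAB selector TAB host TAB port."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 tab-separated fields: {line!r}")
    first, selector, host, port = fields[:4]  # Gopher+ servers may append a 5th field
    return GopherLink(first[:1], first[1:], selector, host, int(port))
```

Even after parsing, the address is still four separate values; nothing in the line is a single string you could hand to someone else.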

If you think it's bad that pre-URI humankind couldn't agree on a way to describe a file on an FTP server, consider this: Gopher does define a standard format for hyperlinks, but my Gopher client doesn't store them that way. If I bookmark that list of gopher servers, here's how my client stores the address:

Type=1
Name=All the gopher servers (that we know of)
Path=/world
Host=gopher.floodgap.com
Port=70

That's easier to read than the one-line hyperlink, but you can't put either format on a billboard. Even if you could, the people who read that billboard would have no way of feeding that information back into their fancy multimedia Gopher clients. Gopher is not addressable: the hostname, port, and selector are separate pieces of information. To combine them, you need a URI.[6]
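Combining the pieces is exactly what a gopher:// URI does. This Python sketch assumes a bookmark with one Key=Value pair per line (real clients' formats vary) and the URI layout from RFC 4266: gopher://host:port/ followed by the item type and then the selector. parse_bookmark and gopher_uri are hypothetical helpers.

```python
from urllib.parse import quote


def parse_bookmark(text: str) -> dict[str, str]:
    """Read Key=Value lines (one pair per line) into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without "="
            fields[key.strip()] = value.strip()
    return fields


def gopher_uri(fields: dict[str, str]) -> str:
    """Combine separate Gopher address fields into one gopher:// URI."""
    selector = quote(fields["Path"])  # percent-escape; "/" is left alone
    return f"gopher://{fields['Host']}:{fields['Port']}/{fields['Type']}{selector}"
```

Note that the Name field drops out entirely: the description isn't part of the address, and what remains fits on one line of a billboard.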

Even as the web reduced Gopher to irrelevance and FTP to a bit player, it solved one of those protocols' biggest problems: a lack of addressability. URIs can address objects provided by any protocol, not just HTTP. Modern FTP and Gopher clients understand ftp:// and gopher:// URIs, automatically transforming them into a series of lower-level commands for their respective protocols.
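The uniformity shows up even in a small Python sketch: the standard library's urllib.parse.urlsplit pulls the same parts (protocol, server, port, server-specific data) out of an http://, ftp://, or gopher:// URI. Turning that last part into protocol commands is still each client's job; the DEFAULT_PORTS table and the locate() name are illustrative assumptions.

```python
from urllib.parse import urlsplit

# Well-known default ports, used when the URI doesn't name one.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "gopher": 70}


def locate(uri: str) -> tuple[str, str, int, str]:
    """Return (protocol, host, port, server-specific part) for any of these schemes."""
    parts = urlsplit(uri)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.scheme, parts.hostname, port, parts.path
```

One parser handles all three protocols, because the URI syntax, not the protocol, defines where each piece of the address lives.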

The URI is the fundamental technical reason for the success of the web. As I show in the book, it's also the fundamental driver behind RESTful, resource-oriented web services. A URI is just the name of a resource.

One Address Leads to Many Others

The second big technical reason behind the success of the web is the "marking-up" of content using hyperlinks. The hyperlinks themselves are important (FTP doesn't have them), but the "marking-up" aspect is what distinguishes the web from Gopher.

FTP serves two kinds of things: directory listings (links to files) and the files themselves. Gopher lets you link to other servers and customize the directory listings, but it maintains the distinction between listings and files. A Gopher "menu" (listing) contains hyperlinks, but it never contains much text. Text is supposed to go in the static documents at the other end of the hyperlinks.

A Gopher menu is a sign pointing to interesting things, not an interesting thing in itself. Modern-day Gopher users write "phlogs": documents structured like weblogs, except they're text files. No hyperlinks.

The web would have worked the same way, but for HTML. Note the expansion: Hypertext Markup Language. HTML combines ("marks up") normal text with hyperlinks. Every HTML page is simultaneously a textual document and a directory of other web pages, both a destination and a stop on the way to some other URI. Today, HTML gurus try to separate content from presentation. HTML succeeded by combining content with navigation.
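To see the "document plus directory" idea in code, here is a sketch using Python's standard html.parser module: one pass over a page yields both its readable text and its list of outgoing links. The sample page, its example.com URIs, and the TextAndLinks class are made up for illustration.

```python
from html.parser import HTMLParser


class TextAndLinks(HTMLParser):
    """Collect a page's running text and its outgoing links in a single pass."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.chunks.append(data)

    def text(self) -> str:
        # Join the text chunks and collapse runs of whitespace.
        return " ".join("".join(self.chunks).split())


page = ('<p>A page can <a href="http://example.com/a">link a single word</a> '
        'without breaking the <a href="http://example.com/b">flow</a> of the sentence.</p>')
parser = TextAndLinks()
parser.feed(page)
parser.close()
```

The same markup reads as an ordinary sentence in parser.text() and as a two-entry directory in parser.links; a Gopher menu line can only ever be the second of these.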

HTML turns the hyperlink into a rhetorical device. You can link a single word to a related page, adding another layer of meaning without breaking the flow of the sentence. FTP is a hierarchical tree, and Gopher is a graph with well-defined leaf nodes, but the web need have no leaf nodes. Any page can link to any other page, yet also be a worthy target for incoming links.[7]

You Can Make New Addresses

Make an HTTP request to a certain URI, and you'll get an HTML form. Fill out the form, make another HTTP request, and the data you submitted will be incorporated into a brand new web page. The URI to that page would have given you a "file not found" error a minute earlier. Now it's a full-fledged part of the web and you can use its URI like any other.

You're using a wiki, a weblog, a CMS, or one of the other human-oriented technologies designed to manipulate the web over the web. This is the third of the web's great advantages: the same client that fetches data from the web can also add to the store of data.
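A few lines of Python are enough to sketch that cycle. This is a hypothetical, in-memory, wiki-style WSGI application, not any particular product: a GET to an unused URI returns 404 Not Found along with a form, a POST of that form to the same URI creates the page, and after that a GET returns it. There is no persistence, authentication, or validation, and request() is a test helper that calls the app directly instead of going over the network.

```python
import html
import io
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults

pages: dict[str, str] = {}  # URI path -> page text; in memory only

FORM = b'<form method="post"><textarea name="text"></textarea><input type="submit"></form>'


def wiki_app(environ, start_response):
    path = environ["PATH_INFO"]
    if environ["REQUEST_METHOD"] == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        form = parse_qs(environ["wsgi.input"].read(length).decode("utf-8"))
        pages[path] = form.get("text", [""])[0]
        # The URI that was 404 a moment ago now names a real page.
        start_response("201 Created", [("Location", path), ("Content-Type", "text/plain")])
        return [b"created"]
    if path in pages:
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
        return [f"<p>{html.escape(pages[path])}</p>".encode("utf-8")]
    # Unknown URI: say so, and offer a form for creating a page there.
    start_response("404 Not Found", [("Content-Type", "text/html; charset=utf-8")])
    return [FORM]


def request(method: str, path: str, body: bytes = b"") -> tuple[str, bytes]:
    """Call wiki_app directly with a minimal WSGI environ (no network)."""
    environ: dict = {}
    setup_testing_defaults(environ)
    environ.update({"REQUEST_METHOD": method, "PATH_INFO": path,
                    "CONTENT_LENGTH": str(len(body)), "wsgi.input": io.BytesIO(body)})
    status = []
    response = b"".join(wiki_app(environ, lambda s, headers: status.append(s)))
    return status[0], response
```

The design point is that reading and writing use the same client and the same protocol: creating a page is just another HTTP request, aimed at the URI the page will have.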

This is, in principle, possible with Gopher. The Gopher+ extension, released in July 1993, has a feature similar to the forms added to HTML a few months earlier. But Gopher forms were never used for much; certainly they were never used to create new Gopher documents. After all, a Gopher document is just a menu of hyperlinks: it's not supposed to be interesting by itself. But a web page is intrinsically interesting, and HTML (along with appropriate software on the back end) gives us the tools necessary to create new ones to order.

I'm not gushing that "setting up a website is so easy, anyone can do it!", like some 1996 book on Information at your Fingertips. Most early websites were, and some still are, created by uploading HTML files to an FTP site. For most people in the early days of the web boom, putting up a web site was strictly more complex than putting up an equivalent FTP site. HTTP has no advantage here over Gopher or FTP.

No, what I think is important was the capability—fitfully explored in the early days but now omnipresent—to modify what's on the web using the technologies of the web. I'm gushing that setting up a website is so easy, a web client can do it.[8]

What about HTTP?

Hopefully the three reasons given above have convinced you that the web technologies are worthy of respect: that they had technical advantages over the rival Internet services of their day, and that they're not a random set of standards forced onto the public by the net-crazed media of the 1990s.

Note one thing, though. In this analysis of why the web took over, the HTTP protocol itself is peripheral. HTTP does have a good set of features, as I show in the book, but they don't represent big conceptual breakthroughs. The Gopher protocol looks a lot like HTTP 0.9, and Gopher+ has the same ideas as HTTP 1.0. If Gopher spoke URIs and served HTML documents, it would be the web.
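The resemblance is easy to see at the byte level. In this Python sketch, an HTTP 0.9 request is a single GET line and a Gopher request (RFC 1436) is a bare selector line; in both protocols the server sends back the document and closes the connection, so one fetch function serves both. The helper names are hypothetical, and fetch() needs a live server, so treat it as illustrative.

```python
import socket


def http09_request(path: str) -> bytes:
    # HTTP 0.9: one line, no version number, no headers.
    return f"GET {path}\r\n".encode("ascii")


def gopher_request(selector: str) -> bytes:
    # Gopher: just the selector string, then CRLF.
    return f"{selector}\r\n".encode("ascii")


def fetch(host: str, port: int, request: bytes, timeout: float = 10.0) -> bytes:
    """Send one request line and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)
```

For example, fetch("gopher.floodgap.com", 70, gopher_request("/world")) would ask for the menu from the hyperlink earlier in this essay, assuming that server is still running. Whether a given HTTP server still answers a 0.9-style request is not guaranteed.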

From the start, the web's major technical advantage was the uniform user interface of the URI. The web connected FTP, Gopher, and other early services to each other by solving one of their biggest design problems: a lack of addressability. Then it took over the functions of those services and killed them off. All HTTP had to do was stay simple.

In retrospect, HTTP resembles the mild-mannered TCP/IP protocol. TCP/IP simply connected networks like Usenet, Bitnet, and CompuServe into a single Internet. Then the Internet swallowed those networks and killed them off.[9]

FTP, Gopher, and the other pre-HTTP services were all distributed services designed for human use. For the most part these services have been replaced by the web. I believe that programmable distributed services are also due to be swallowed by the web, and that one day they will be regarded as just another kind of web site.

Much of the WS-* stack and current WS-* practice aims to co-opt the web, implementing programmable services on top of HTTP. I think this practice has it backwards. Programmable services should be implemented as special kinds of web sites, just as file stores (FTP), hypertext directories (Gopher), search engines (WAIS), and directory services (Prospero) are now implemented as web sites.


[0] RFC 1689, published in August 1994, counts 4800 Gopher servers (up from 2200 the previous year) and only 600 web servers. By November 1995, there would be over 200,000 web servers. Gopher and HTTP were invented at around the same time, in 1991.

RFC 1689 is mentioned in Rohit Khare's excellent essay "Who Killed Gopher?: An Extensible Murder Mystery".

The 1995 number comes from Tim Bray's 1996 talk, "Measuring the Web".

[1] The most popular application-level protocol invented since 1990 is BitTorrent. BitTorrent has replaced HTTP and FTP for the distribution of very large files, but no one's written a BitTorrent-based hypertext system. BitTorrent's .torrent files themselves are usually distributed through HTTP.

[2] I actually like the Gopher interface, but mainly because it reminds me of my early exposure to the Internet.

[3] That's when Marc Andreessen introduced the IMG tag into HTML, more or less by fiat. The more generic OBJECT tag didn't come along until HTML 4.0.

For an early history of HTML, see Chapter 2 of Raggett on HTML 4, available online.

[4] 2007 update: I've now supplemented those posters with one printed in 1990, which of course has no URIs on it.

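[5] The "1" at the start of the Gopher line is the item type, and strictly speaking it means "this link points to another Gopher menu"; a plain text file would be type "0". The type attribute in the HTML above is only a loose stand-in for it: on HTML's A tag, type is just an optional, advisory hint about the content type at the other end of the link.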

[6] Khare almost makes this leap: "[N]o one ever plastered Gopher selectors on shirts and lunch trucks and golf tees..." Yeah, no kidding! What server is that selector for? What's the port? What's the file type? How do you label them? How does all that information fit on a golf tee? (2008 update: In personal communication, Khare says: "Come on!" And re-reading "Who Killed Gopher", he has a point. There's a section early on that calls URIs "the innovative essence of the Web". But he doesn't say what was wrong on a technical level with Gopher selectors.)

[7] The first really popular website was Yahoo!, which was originally a Gopher-style hierarchical directory. In fact, the H in Yahoo! stands for "Hierarchical". There's no contradiction here. Yahoo! was a website structured like a Gopher server, created at a time when Gopher was still popular. This is one way the web co-opts other protocols. It's happening today, with email.

[8] It's true that you can populate an FTP site with an FTP client, but again, no one ever wrote a hypertext system on top of FTP.

[9] The web/Internet analogy goes a little deeper, too: the IP address is like the URI of the Internet.


This document is part of Crummy, the webspace of Leonard Richardson. It was last modified on Wednesday, July 22 2009, 21:15:12 Nowhere Standard Time and last built on Saturday, October 25 2014, 11:00:04 Nowhere Standard Time.

Crummy is © 1996-2014 Leonard Richardson. Unless otherwise noted, all text licensed under a Creative Commons License.
