How's My Driving? #1: Microsoft Astoria

Sumana wants me to post some analyses of web services and talk about how they are or could be made more RESTful. Alex Barnett wants me to look at Microsoft's Astoria project (so does Microsoft, apparently, since they named it after where I live). I love it when a plan comes together. Here is some free consulting for Microsoft, based entirely on my readings of white papers ("overview" and "using").

I wrote this entry by going through the two white papers and trying to discern what resources the system exposed, what parts of the uniform interface they support, what representations are served and accepted, and how resources link to each other. This is the same approach I took when analysing the Atom Publishing Protocol for RWS. I'm pretty sure I got everything right, but I'm not totally sure about the association resources. I've presented my findings in a form much like the one I'd like to see if I were trying to figure out how a RESTful web service worked. I hope this helps.

Introduction

Astoria is a framework for a certain class of web services: those that expose access to tables and rows in a database. In this regard it's similar to ActiveResource. The resources exposed might not correspond exactly to the underlying database objects, but they will look like database tables and there will be some kind of mapping to a real database.

The "overview" whitepaper starts out covering some of the ground covered in RWS chapter 11. It points out that Ajax applications have a different architecture than traditional web applications. They're GUI applications where the GUI events are handled by an invisible web service client. Astoria wants to provide the services those clients will access.

Astoria "uses URIs to point to pieces of data" (i.e., resources). Astoria's "data services" are "surfaced to the web as a REST-style resource collection that is addressable with URIs and that agents can interact with using the usual HTTP verbs such as GET, POST or DELETE." So far so good. They talk the talk.

Meet the Resources

An Astoria service defines six basic kinds of resources. A lot of them correspond directly to a resource type in ActiveResource and/or the Atom Publishing Protocol. This makes sense because all three are solving similar problems: exposing collections and the objects in the collections.

  1. The "Entity-set list" resource, located at the service root (/data.svc, for example.) This is not explicitly called out, but it has a URI and responds to GET, so it's a resource. This is a simple list of the available entity-sets (see below). It's analogous to the APP service documents.
  2. "Entity-set" resources. These correspond to a table in a database. ActiveResource has these too; they're analogous to APP's collections. An entity-set is identified like so: /data.svc/Customers. Notably, this is where pagination and "order by" are supported.
  3. There are virtual entity-sets which correspond to a query on a database table: /data.svc/Customers[Valuation gteq 5000000]. This is analogous to the GData extensions to the APP that add query capability to APP collections.
  4. "Entity" resources. These correspond to rows in a database table. ActiveResource has these too, and they're analogous to APP's members.

    An entity is identified with its collection name and then some kind of unique index: maybe /data.svc/Customers[4] or /data.svc/Customers[ALFKI]

  5. There are scoped entity-sets which are associated with some entity. /data.svc/Customers[ALFKI]/Orders shows all the "order" entities associated with a particular "customer" entity. /data.svc/Territories[99999]/Employees shows all the "employee" entities associated with a particular "territory" entity.
  6. Finally, for many-to-many relationships there are separate "association" resources. /data.svc/Territories[99999]/Employees[3] shows the relationship between territory 99999 and employee 3. Presumably it's also available as /data.svc/Employees[3]/Territories[99999].

    (Note: the white papers also talk about a non-resource "association" that's part of the state of an entity. It's a link from one entity to another, as part of a relationship that's not many-to-many. For instance, every order associated with Customers[ALFKI] will contain a link in its representation to Customers[ALFKI]. These associations are not resources, because they don't have their own URIs; they only show up in representations of entities. When I say "association" I'm talking about association resources.)
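
The six URI shapes above can be sketched as a handful of string builders. This is only an illustration of the pattern as read from the examples in this entry: the function names are made up, nothing here comes from Astoria itself, and a real client would presumably need to percent-encode spaces and brackets before putting these on the wire.

```python
# Hypothetical helpers that mirror the six resource kinds listed above.
# The URI shapes are copied from the examples in this entry; the function
# names and the SERVICE_ROOT constant are assumptions for illustration.

SERVICE_ROOT = "/data.svc"


def entity_set_list():
    """1. The list of entity-sets, served at the service root."""
    return SERVICE_ROOT


def entity_set(name):
    """2. An entity-set (a table), e.g. /data.svc/Customers."""
    return f"{SERVICE_ROOT}/{name}"


def virtual_entity_set(name, predicate):
    """3. A virtual entity-set: an entity-set filtered by a predicate."""
    return f"{entity_set(name)}[{predicate}]"


def entity(name, key):
    """4. A single entity (a row), identified by a unique key."""
    return f"{entity_set(name)}[{key}]"


def scoped_entity_set(parent, parent_key, child):
    """5. The child entities associated with one parent entity."""
    return f"{entity(parent, parent_key)}/{child}"


def association(parent, parent_key, child, child_key):
    """6. The many-to-many relationship between two specific entities."""
    return f"{scoped_entity_set(parent, parent_key, child)}[{child_key}]"


if __name__ == "__main__":
    print(entity("Customers", "ALFKI"))
    print(scoped_entity_set("Customers", "ALFKI", "Orders"))
    print(association("Territories", 99999, "Employees", 3))
```

Note how each builder is defined in terms of the previous one: a scoped entity-set is an entity URI plus a path segment, and an association is a scoped entity-set plus a key.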

The uniform interface

What do other frameworks do? The APP and Rails define GET and POST on a "collection" resource; and GET, PUT, and DELETE on a "member" resource. The APP defines GET on a service document. Where Astoria has similar resources, it exposes the same interface. The only part that's difficult to explain is the difference between POST on a regular collection and POST on a scoped collection.

The "overview" white paper has a confusing paragraph about Astoria's use of the uniform interface (the one that begins "For URIs that represent a specific entity...") which is either wrong or very poorly worded: I read it as saying that entities respond to POST, which doesn't make sense. Here's what I got from the "using" white paper, in convenient table form:
Resource                 GET  POST                                            PUT  DELETE
Entity-set list          X    -                                               -    -
Entity-set (collection)  X    X (Create a new entity)                         -    -
Virtual entity-set       X    -                                               -    -
Entity (member)          X    -                                               X    X
Scoped entity-set        X    X (Create an association between two entities)  -    -
Association              X    -                                               X    X
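
The same table can be restated as a small lookup, which is handy if you want a client to refuse a request the service is not expected to support. The resource-kind labels and the function name are invented for this sketch; the method sets are simply the rows of the table above.

```python
# The uniform-interface table above, restated as data. The keys are
# informal labels chosen for this sketch, not Astoria terminology.
UNIFORM_INTERFACE = {
    "entity-set list": {"GET"},
    "entity-set": {"GET", "POST"},          # POST creates a new entity
    "virtual entity-set": {"GET"},
    "entity": {"GET", "PUT", "DELETE"},
    "scoped entity-set": {"GET", "POST"},   # POST creates an association
    "association": {"GET", "PUT", "DELETE"},
}


def allows(resource_kind, method):
    """Return True if the table lists `method` for `resource_kind`."""
    return method.upper() in UNIFORM_INTERFACE[resource_kind]


if __name__ == "__main__":
    print(allows("entity", "post"))             # entities do not take POST
    print(allows("scoped entity-set", "POST"))  # scoped sets do
```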

I'll just explain "Create an association between two entities," shall I? Here's a scoped entity-set: /data.svc/Territories[99999]/Employees. It's a collection of employees, scoped to a particular territory. There's a many-to-many relationship between territories and employees (at least according to the whitepaper). If I want to associate an existing employee with territory 99999, I POST a representation of the employee to the scoped entity-set. (The representation just contains the employee's database ID.) A new resource is created at a URI like /data.svc/Territories[99999]/Employees[3]: it's the relationship between territory 99999 and employee 3.
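
Here is roughly what that request might look like, built as plain data rather than sent anywhere. The white papers' actual wire format is not reproduced here: the JSON body, the "EmployeeID" field name, and the Content-Type value are all assumptions made for the sake of the example. Only the two URIs follow the pattern described above.

```python
import json


def build_association_request(territory_id, employee_id):
    """Describe the POST that would associate an existing employee with a territory.

    Returns a plain dict; nothing is sent over the network.
    """
    scoped_set_uri = f"/data.svc/Territories[{territory_id}]/Employees"
    # Assumption: a JSON body carrying only the employee's database ID,
    # under a hypothetical "EmployeeID" field name.
    body = json.dumps({"EmployeeID": employee_id})
    return {
        "method": "POST",
        "uri": scoped_set_uri,
        "headers": {"Content-Type": "application/json"},
        "body": body,
        # Where the new association resource is expected to appear.
        "expected_association_uri": f"{scoped_set_uri}[{employee_id}]",
    }


if __name__ == "__main__":
    request = build_association_request(99999, 3)
    for key, value in request.items():
        print(f"{key}: {value}")
```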

Representations

"Currently Astoria can represent data in plain XML, JSON (JavaScript Object Notation) and in a subset of RDF+XML." Representations are selected through content negotiation or through a query string parameter.

Incoming representations have the same format as outgoing representations.
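
The two ways of selecting a representation can be sketched side by side. The media types below are the standard registered ones for XML, JSON, and RDF/XML; whether Astoria uses exactly these strings is an assumption, and the query parameter name "format" is purely hypothetical, since this entry does not quote the real one.

```python
from urllib.parse import urlencode

# Standard media types; assumed (not confirmed) to match what Astoria serves.
MEDIA_TYPES = {
    "xml": "application/xml",
    "json": "application/json",
    "rdf": "application/rdf+xml",
}


def select_by_header(uri, fmt):
    """Content negotiation: leave the URI alone and send an Accept header."""
    return uri, {"Accept": MEDIA_TYPES[fmt]}


def select_by_query(uri, fmt, param="format"):
    """Query-string selection. The parameter name is a placeholder."""
    separator = "&" if "?" in uri else "?"
    return f"{uri}{separator}{urlencode({param: fmt})}", {}


if __name__ == "__main__":
    print(select_by_header("/data.svc/Customers", "json"))
    print(select_by_query("/data.svc/Customers", "json"))
```

The header approach keeps one URI per resource, while the query-string approach gives each representation its own URI that can be pasted into a browser.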

Links

So far, so good. But now Astoria must truly pass through the RESTful crucible. How well does it use hypermedia? Pretty well, actually. The "site map" resource links to all the top-level entity-sets. Entity-sets link to entities and to appropriate entity-sets scoped to each entity. A sample XML representation of http://myserver/data.svc/Customers gives the URI to each customer in the list, and an "orders" link to each customer's orders. This is excellent, much better than ActiveResource.

This lame diagram shows my interpretation of how Astoria resources link to each other. There are two minor missing pieces. First, there's no way to get to a virtual entity-set without constructing the URI manually. This is understandable because those URIs can be about as complicated as SQL expressions. You can't design an HTML 4 form that can generate all of them. You could support a simplified version with an HTML 5 or WADL form. I'm also thinking of a series of resources that work like Bugzilla's advanced query builder to support every kind of query building. It might not be worth it.

Second, I get the impression that a scoped entity-set links to the entities inside the set, but not to the corresponding association resources. That would mean if you want to DELETE an association, you need to construct the URI yourself. This is a guess because I didn't see a representation of a scoped collection.

Miscellaneous

As with Yahoo!'s web services, Astoria's JSON representations can be wrapped in a callback function that calls your code. This lets you use the Javascript on Demand hack described in chapter 11, to run code from a foreign web service in your Ajax application. (The "using" white paper calls this JSONP, for JSON with Padding.)
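
For readers who haven't seen the trick, this is all the "padding" amounts to on the server side: the JSON document is wrapped in a call to a function the client names. The sketch below is generic JSONP, not Astoria's code, and the identifier check is an extra precaution added here rather than anything the white papers describe.

```python
import json
import re

# A conservative pattern for a plain JavaScript identifier.
_IDENTIFIER = re.compile(r"^[A-Za-z_$][A-Za-z0-9_$]*$")


def wrap_jsonp(data, callback):
    """Serialize `data` as JSON and wrap it in a call to `callback`.

    The callback name is validated so that a client cannot inject
    arbitrary script through it.
    """
    if not _IDENTIFIER.match(callback):
        raise ValueError(f"not a plain identifier: {callback!r}")
    return f"{callback}({json.dumps(data)});"


if __name__ == "__main__":
    print(wrap_jsonp({"CustomerID": "ALFKI"}, "showCustomer"))
```

A page that loads this response through a script tag ends up calling its own showCustomer function with the data as the argument.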

The "overview" whitepaper says the pagination variables are skip and take, but the "using" whitepaper says they're skip and top. Whitepaper bug!
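
Going with the "using" white paper's names, a paging helper might look like this. It assumes skip and top are ordinary query-string parameters on an entity-set URI, which is a guess about the syntax; the function name and the page-numbering convention are made up for the example.

```python
from urllib.parse import urlencode


def page_uri(entity_set_uri, page, page_size):
    """Build the URI for a 1-based page of an entity-set.

    Assumes `skip` and `top` are passed as query-string parameters.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    query = urlencode({"skip": (page - 1) * page_size, "top": page_size})
    return f"{entity_set_uri}?{query}"


if __name__ == "__main__":
    # Third page of 20 customers: skip the first 40, take the next 20.
    print(page_uri("/data.svc/Customers", 3, 20))
```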

You can create "service operations", which are arbitrary .NET methods exposed through GET requests. The example given is a custom query, but in practice this feature will encourage a REST-RPC style of programming. However, "[i]n the future attributes will be extended so that the specific HTTP method can be controlled." And the hooks for error checking and triggers are exposed in a way I think will promote RESTful design.

Sam's favorite section: what about caching and ETags? "Astoria services also leverage other aspects such as the well-established HTTP caching infrastructure. Data services can be configured to set various caching-related HTTP headers to cache at the web server, client agent, or intermediate agents such as proxies." It sounds like caching works automatically--with a tie-in to the database engine? No mention of ETags in either white paper.

"Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document." I sure hope not.

Verdict

Astoria services are in fact RESTful and resource-oriented. They're very similar to APP services, though not so much that the APP is obviously a better base. The main thing I'd worry about is the tendency for programmers to use the hooks for writing RPC-style code. I'd head this off at the pass with the promised support for service operations triggered by other HTTP methods, some RESTful examples, and maybe a bit of theory. But I don't know the best way to herd Microsoft programmers.


Comments:

Posted by Brendan at Fri Jun 01 2007 19:20

"They're GUI applications where the GUI events are handled by an invisible web service client."

That is the single most helpful thing I've read all week.

Posted by Alex Barnett at Sat Jun 02 2007 01:25

Leonard - great to read your analysis and verdict on Astoria. Look forward to hearing Pablo's response to your free consulting :-)

Posted by Pablo Castro at Sat Jun 02 2007 16:58

Thanks for taking the time to look at Astoria in detail...free stuff (including consulting) is always nice :)

I've passed on your analysis to the rest of the folks in the team, I think it's a great read to get a different perspective on the topic.

Your description in the introduction is right on spot; I'm glad to see that after looking at the system and after parsing and re-playing the intents and goals described in the docs you landed in a view that I see as compatible with mine.

Your analysis on the kinds of resources is interesting in that you have a different view on a couple of them. Not inaccurate or anything, just different.

The one that got my attention the most is your classification of "virtual entity-sets". Some of the criticism I got on the URI format was around the fact something like "/data.svc/Customers[City eq 'London']" feels more like a query than like a resource; I spin it a bit by saying that you could think of it as a resource representing a set that has all the customers in the city of London, but spinning aside I do understand where they are coming from. Your interpretation as virtual entity-sets gives another perspective that may make them work in their current form. The alternate option for this would be to push the filter predicate part of the URI to somewhere after the "?"...not sure if at that point it is just a cosmetic issue or other things break, I haven't made my mind yet on what is the right approach.

A small accuracy note: presentation options such as "orderby" and "top" apply to any set, not only to filtered ones.

The second one that got my attention is associations. In this case I think that the main issue is that I haven't articulated this appropriately in the documentation, and maybe even in the programming interface. I see associations as a single thing that surface as a resource and also as parts of other resources. I'll think about this some more...see if something needs to change, either on how I present the concepts or how the interface surfaces associations.

Regarding representations, ATOM/APP as format/protocol is the one key missing guy here; I'm seriously considering implementing it (after figuring out an appropriate mapping of concepts), just a matter of finding the time or handing it out to one of the folks in the team.

Whitepaper bug: yep, my bad. Skip and take are the LINQ operators for paging, and I spent quite a bit of time working on LINQ before Astoria, just got the terminology mixed up.

Finally, I wanted to point out something about service-operations and REST-RPC. I had ordered your book a few days ago and got it yesterday, and I just read the first few pages where you define REST-RPC hybrids. Astoria "data aware service operations" are in my opinion a bit different than traditional RPC; the key differentiator is that the implementation does not return the results, it returns a description of what the results should be, and it does it by comprehension (by returning a query object). That means that a client-agent can still do things such as paging and sorting on top of an Astoria service operation call; furthermore, you could imagine enabling path-style navigation on top of a service operation (e.g. CustomersByState/SalesOrders?state=wa to see the sales orders of those customers in the state of Washington). It does not make it more "RESTful", but it does remove some of the over-structured nature of RPC that oftentimes gets in the way of building generic applications and frameworks.

-pablo

Posted by Leonard at Sat Jun 02 2007 17:53

Pablo,

"Invoking a query" is not a resource, but "a list of query results" is. The difference is entirely philosophical (and the point of the philosophy is to stop you from using GET for unsafe operations). Really, you're fine, no matter what the URIs look like. You just have to be able to describe the thing the client is GETting as a noun. Every GET request runs *some* algorithm, whether it's a search or just a load from a file.

You could say that many-to-many associations are resources, and other associations are not. M2M associations have their own database tables, so they're on the same level as the objects they relate.

Re data aware operations, my main concern is that you're exposing "operations" instead of resources, functions instead of objects. The operation you describe can be seen as a resource: CustomersByState/SalesOrders?state=wa is the resource "Sales orders for customers in Washington." But users are going to use that functionality to expose "operations" like NotifyOverdueCustomers through GET. At the very least they should use overloaded POST for that.

One thing you might do here is define a "custom resource" class that programmers can subclass. It would define four methods, the way Rails controllers can define create, show, update, and destroy. If the programmer defines more methods they'd be exposed through overloaded POST. I have no idea how this would fit into WCF though.

Posted by Robert Sayre at Sun Jun 03 2007 02:44

Actually, Astoria contains a misapplication of PUT, where the meaning of the message depends on the server implementation. This is because they use a patch format as the request body for a PUT. see:

http://tech.groups.yahoo.com/group/rest-discuss/message/8474

Posted by Pablo Castro at Sun Jun 03 2007 16:20

Leonard,

Regarding operations and GET: 100% agree, we won't do GET-only operations. It's that way in the first CTP just because we didn't have time for more. The way I envision this is that when you create the operation you say which verb it is bound to; in fact, I was thinking to default to POST, because I cannot guarantee whether the user code will have side-effects, so POST is a safer choice.

Regarding representing sets of operations as customer resources or as extensions to the URI space in the data service: my take is that some applications can be modeled entirely as a set of resources, but others do need action semantics; I'd also argue that it happens quite often that at least for a subset of the app you want to write some code for validation, for taking secondary actions, etc. In some cases you can use "interceptors" in Astoria to deal with that and maintain the pure-resources feel, but in others you definitely need to deviate a bit from a resource-only world.

I elaborated a bit more on this here:
http://blogs.msdn.com/pablo/archive/2007/05/04/application-models-for-astoria.aspx

-pablo



Unless otherwise noted, all content licensed by Leonard Richardson
under a Creative Commons License.