Best-Loved Ruby Cookbook Recipes of the American People #2: "Generating graphs with Gruff": After a long wait the Ruby Cookbook is now available almost everywhere, even on BookFinder. I urge you to purchase what I hope is the strangest O'Reilly book ever published (suggestions for competitors welcome; the only one I can think of is Steven Feuerstein's Oracle PL/SQL Programming). In addition to hundreds of tired and frankly predictable geek in-jokes (Star Trek, Rogue, Cryptonomicon, the GNU Virtual Fridge, ad nauseam), it features talking frogs, coffinfish, two-faced politicians, corpses in freezers, dispute resolution through ritual combat, Orwellian doublecode, a heart-pounding Novel on Rails, and Dr. Bronner's Peppermint Soap. Special guest star: T-Rex.

It also features graphs! Recipe 12.4, "Graphing Data", introduces Geoffrey Grosenbach's Gruff library, by amazing chance also the subject of today's promotional tutorial. Gruff makes it easy to turn data structures into graphs and write them to PNG files. So the old clock on the wall says it's time for part two of the Book Sales Trilogy:

  1. Getting book information with Ruby/Amazon.
  2. Generating graphs with Gruff.
  3. Generating sparklines with the sparklines gem.

The hardest part of Gruff is installing the dang ImageMagick or RMagick libraries and their dependencies in the first place. It's easy on Debian and other systems with a good packaging system, but otherwise it can be a real pain. The second hardest part is working around Gruff when its simplifying assumptions don't apply to you. I glossed over these in the book but I'll tackle the second one a little bit in this tutorial.

Anyway, yesterday I showed you code to take periodic readings of books' Amazon sales rank. And then I showed it to you again today because the code I wrote yesterday was crap. So read that entry even if you read it earlier. The new code makes the rest of the trilogy much easier to present.

You'll recall (from like ten seconds ago when you read it) that we have a SalesReport class that encapsulates sales rank information from a book. Yesterday, though, I didn't show you anything interesting to do with this information. But the night is ours! Tonight, we graph!

Let's open up the SalesReport class again and make a sales report capable of expressing itself as a line graph:

require 'rubygems'
require 'gruff'

class SalesReport
  # Make a Gruff graph for the sales of this product.
  def make_graph(graph_path)
    g = SalesRankGraph.new(800)
    g.title = "Salesrank over time: #{name}"
    g.theme_37signals
    g.colors = ["black"]
    g.title_font_size = 20
    g.hide_legend = true

This is mostly self-explanatory setup code. In the book I claim that most of the Gruff themes are ugly, but that theme_37signals is okay. Well, that's just, like, my opinion, man, but the incontrovertible fact is -- and I should have mentioned this in the book -- that theme_37signals's idea of a good time is to graph the first dataset with a yellow line on what's basically a white background.

That's a really bad idea. I believe there's a UI maxim to the effect of: "Some other color than yellow on white, graph-reader's delight. White under yellow, dangerously confuse a fellow." So I go with theme_37signals but tell Gruff to draw the data line in black: the original high-contrast color for white backgrounds.

The other thing I do is hide the legend, because this graph is only for one product. I would like to graph sales for all of my books on a single graph, but I haven't figured out a good way to do it yet. One of Gruff's simplifying assumptions is that all your data points are spaced evenly along the X-axis starting at X=0. I'd have to insert a bunch of bogus data points for books that came out later; worse, most of my timestamps don't line up precisely, so I'd have to write code to group multiple times into a single data point. So right now I just do one book per graph.
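For what it's worth, here's a rough sketch of what that grouping-and-padding code might look like. It is not from the book, and every name in it (bucket_by_day, align_series, the filler argument) is made up for illustration. It assumes each book's readings are an array of [unix_time, rank] pairs, buckets them by day, keeps the best (lowest) rank in each bucket, and pads the days a book has no reading for. Keeping the best rank rather than, say, the average is an arbitrary choice.

```ruby
# Group [unix_time, rank] readings into fixed-size buckets (one day by
# default), keeping the best (lowest) rank seen in each bucket.
# Returns a hash of { bucket_number => rank }.
def bucket_by_day(readings, bucket_seconds = 86_400)
  buckets = {}
  readings.each do |time, rank|
    day = time / bucket_seconds
    buckets[day] = rank if buckets[day].nil? || rank < buckets[day]
  end
  buckets
end

# Turn several books' readings into equal-length arrays that share one
# X-axis. Days with no reading for a book get `filler` (the "bogus" point).
def align_series(series_by_name, filler = nil)
  bucketed = series_by_name.transform_values { |r| bucket_by_day(r) }
  days = bucketed.values.flat_map(&:keys)
  return {} if days.empty?
  range = (days.min..days.max)
  bucketed.transform_values do |by_day|
    range.map { |day| by_day.fetch(day, filler) }
  end
end
```

Each resulting array could then go to its own data call, since they'd all line up index for index. What to plot for the filler points is still an open question, which is part of why I haven't done this.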

Now we've got one line of code that's very important, because it's where I decide how the data will be represented.

    # Float division: with integer ranks, 1/rank would truncate to 0.
    g.data(@name, collect { |date, rank| 1.0 / rank })

The data method takes an array, and adds it to the graph as a data set. SalesReport is an array of dates and ranks, so I could just pass in the ranks, but that would yield a graph like this:

This is a lousy graph. Unimportant details (long stretches early on where no one bought the book) are the most obvious features, and you can't even see the release of the book. But the data isn't useless; it's just not presented well. We're accustomed to seeing charts go up when the numbers go up (see: any TV commercial featuring a chart), but a good sales rank is very small. Also, as all Web 2.0 types know, book sales follow a power law distribution. A book at 400K sells one copy and jumps to 200K, but you have to sell a mess of books to go from #100 to #90. Displaying the sales rank as though it were linear distorts the data.

I don't know the exact distribution for book sales, but simply taking the inverse of the sales rank gives the graph the right shape. In the resulting graph, the release of the book is obvious, and the time leading up to it makes sense.
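A small standalone sketch of the inversion, separate from the SalesReport class. The method name inverse_ranks and the sample readings are invented for illustration. One thing to watch for: if the ranks are stored as integers, Ruby's integer division turns 1/rank into zero for every rank above 1, so the sketch divides 1.0 by the rank instead.

```ruby
# Convert [timestamp, rank] pairs into values to plot, so that a better
# (smaller) sales rank produces a larger number.
def inverse_ranks(readings)
  readings.collect { |_date, rank| 1.0 / rank }
end

# Made-up readings: a book climbing from 400K to 1,000.
readings = [[1153000000, 400_000], [1153086400, 200_000], [1153172800, 1_000]]

# Float division keeps the shape of the data.
p inverse_ranks(readings)

# Integer division collapses every point to zero.
p readings.collect { |_date, rank| 1 / rank }
```

The jump from 400K to 200K barely moves the inverted value, while the climb to 1,000 dominates the graph, which is roughly the emphasis you want.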

What about those labels on the X-axis? Where do they come from? They come from this code:


    label_hash = {}      
    [0, (self.size/2).round, self.size-1].each do |i|
      label_hash[i] = Time.at(self[i][0]).strftime('%m/%d/%Y') if self[i]
    end
    g.labels = label_hash

The graph's labels are a hash that maps positions on the X-axis to strings. Remember, the positions on the X-axis are the indices to the array(s) you passed into the data method. You don't get to choose these values. The X axis starts at zero, and ends at the maximum index of the largest array you passed into data. I choose three labels: one at the beginning, one at the end, and one halfway between. Here's a graph with a lot more history than the Ruby Cookbook one:
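Going back to the labels for a second: the label-picking logic can be tried out on its own, without Gruff. This is a sketch under a few assumptions. The method name three_labels is made up, readings is a plain array of [unix_time, rank] pairs, and it formats dates in UTC so the output doesn't depend on your time zone (the code in SalesReport uses local time).

```ruby
# Build a labels hash ({ x_index => string }) for the first, middle, and
# last readings. `readings` is an array of [unix_time, rank] pairs.
def three_labels(readings)
  labels = {}
  return labels if readings.empty?
  # uniq avoids redundant work when the array has only one or two entries.
  [0, readings.size / 2, readings.size - 1].uniq.each do |i|
    labels[i] = Time.at(readings[i][0]).utc.strftime('%m/%d/%Y')
  end
  labels
end

# Five made-up daily readings starting at the Unix epoch.
readings = (0..4).map { |day| [day * 86_400, 1000 - day] }
p three_labels(readings)
```

The keys are array indices, not timestamps, which is exactly the constraint described above: you pick the strings, Gruff picks the positions.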

Finally, having created the graph, we write it to disk:


    g.write(File.join(graph_path, "#{asin}-salesrank.png"))    
  end
end

Well, not quite finally. I sneakily referenced a class called SalesRankGraph a while back, and never defined it. That class derives from Gruff::Line, but if you do this graph with a Gruff::Line it'll have weird numbers on the Y-axis:

Those labels are just what you'd think: they're the numbers being graphed. This mighty graph stretches from about zero to about 0.001. Of course, the graph is "really" measuring the inverses of those numbers, but there's no way to put that in the labels. It's another of Gruff's simplifying assumptions. You can choose your X-axis labels but not your X-axis points; you can choose your Y-axis points but not your Y-axis labels. I couldn't find an easy way to fix this, so I just overrode draw_line_markers with a method that draws nothing. You can shut off both the X- and Y-axis labels by setting hide_line_markers, but I like the X-axis labels.

class SalesRankGraph < Gruff::Line
  # Intentionally empty: skip the Y-axis markers and their numeric labels.
  def draw_line_markers
  end
end

So now I've got some pretty nice-looking graphs to track my sales rank. But I don't have time to look at graphs! I should have spent today working on my new project, but instead I wasted the morning fixing problems with the previous entry in this series, and then spent the afternoon making pizza sauce and writing this entry! What to do? If only there were some post-literate infographic that would convey sales rank information at a glance! Something like the graphics I laboriously put on the crummy.com homepage this afternoon! Stay tuned for tomorrow's episode, The Spark of Line!

Comments:

Posted by Susie at Wed Jul 26 2006 19:12

There's nothing on the Y axis so I can tell how many books you sold. (yes, I actually read this entry).


Unless otherwise noted, all content licensed by Leonard Richardson
under a Creative Commons License.