News You Can Bruise for 2012 February

Thu Feb 02 2012 11:59 easy_install beautifulsoup4: This is an HTMLized version of an email I sent to the Beautiful Soup discussion group, about the impending release of Beautiful Soup 4.

Introduction

When Beautiful Soup was first released in 2004, the state of HTML parsing in Python was appalling. Over the past eight years, things have improved so dramatically that Beautiful Soup's HTML parser is no longer a competitive advantage. I don't want to duplicate other people's work, so I'm getting Beautiful Soup out of the parser business. Beautiful Soup's job is now to provide a Pythonic screen-scraping API on top of a data structure created by a third-party parser.

This will be Beautiful Soup 4, and I've been planning it for years. With help from Thomas Kluyver and Ezio Melotti, I've now met the three main goals of Beautiful Soup 4:

Make a single codebase that works under Python 2 and Python 3.
Stop using SGMLParser (removed in Python 3) and make it possible to swap out one parser for another.
Support two major Python parsers (lxml and html5lib) as well as Python's (not currently very good) batteries-included parser, html.parser.

The first version of BS4 is almost ready for release, and I'd like you to test it out, if you haven't already. I still need to fix some things, in particular some performance problems. But note that even with the performance problems, BS4 is faster than BS3 across the board.

On Python 2 or Python 3 you can install the BS4 beta with this command:

easy_install beautifulsoup4

You can also get the source tarball.

The documentation has been completely rewritten. You may find the section on porting BS3 code to BS4 especially interesting.

There are three major things I'd like your feedback on before completing the release.

Hall of Fame

The BS3 documentation lists open-source projects that use Beautiful Soup. I stopped maintaining this list many years ago because there are hundreds of these projects, and since most of them are screen-scrapers, they're pretty ephemeral.

I'd like to bring this feature back as a "hall of fame", featuring applications of Beautiful Soup that grab a reader's attention. People who used Beautiful Soup in a high-profile way or to tackle a big issue. Projects that are interesting to hear about even if the software doesn't work anymore, or uses an old version of Beautiful Soup, or if Beautiful Soup was used internally and the public only saw the results.

My bias is towards projects having to do with space, science, journalism, politics and social justice. Here are some examples so you know the kind of thing I'm thinking of:

"Movable Type", a work of digital art on display in the lobby of the New York Times building, uses Beautiful Soup to scrape New York Times feeds.
Alexander Harrowell uses Beautiful Soup to track the business activities of an arms merchant.
The Lawrence Journal-World used Beautiful Soup in 2006 and 2010 to gather election results.
The NOAA's Forecast Applications Branch uses Beautiful Soup in TopoGrabber, a script for downloading "high resolution USGS datasets."

If you did anything of this sort, or know of someone who did, I'd like to hear about it.

Do you prefer lxml or html5lib?

Right now, the parser ranking goes lxml, html5lib, html.parser. I like lxml because it's incredibly fast and it can parse anything. But I'd like to see what you think of the trees it generates. Would html5lib, with its web-browser-like heuristics, be a better default?
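To make the idea of a ranking concrete, here's a minimal sketch of a "use the best parser that's installed" fallback. It's an illustration only, not Beautiful Soup's actual code: `PARSER_RANKING` and `pick_parser` are hypothetical names, and the sketch just checks whether each module can be found, using the standard library's `importlib.util.find_spec`.

```python
import importlib.util

# Hypothetical ranking, in the order described above: lxml, then html5lib,
# then Python's batteries-included html.parser (always present).
# Each entry is (parser name, module to look for).
PARSER_RANKING = [
    ("lxml", "lxml"),
    ("html5lib", "html5lib"),
    ("html.parser", "html.parser"),
]


def pick_parser(ranking=PARSER_RANKING):
    """Return the name of the highest-ranked parser whose module is installed."""
    for name, module in ranking:
        # find_spec() returns None when a top-level module isn't installed.
        if importlib.util.find_spec(module) is not None:
            return name
    raise RuntimeError("no usable HTML parser found")


print(pick_parser())  # depends on what's installed; html.parser at worst
```

Whatever the default ends up being, you can always skip the ranking and name a parser explicitly, e.g. `BeautifulSoup(markup, "html5lib")`.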

substitute_html_entities

BS3 had a number of overlapping and inconsistent ways of turning HTML/XML entities into Unicode characters, and possibly turning Microsoft smart quotes into HTML entities at the same time. In BS4, all this stuff is gone. HTML and XML entities are *always* converted into Unicode characters.

This is great but there's one problem: output. If you want to turn those Unicode characters back into entities when outputting as a string, you need to call soup.encode(substitute_html_entities=True), which is a little clunky. I'm thinking of adding an output_html_entities attribute that you can set on a soup or tag to control whether this substitution happens. Do you like this idea?

I think I also need to ensure that characters like "&" and "<" are always converted to XML entities on output, even though this will hurt performance a bit.
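Here's a small standard-library illustration of that round trip, using `html.unescape` for the input side and `html.escape` for the output side. It isn't Beautiful Soup code, and `parse_text`/`output_text` are hypothetical names. It shows both halves of the problem: "&" and "<" have to be re-escaped on output, and entities like `&eacute;` don't come back unless you explicitly ask for entity substitution.

```python
import html


def parse_text(raw: str) -> str:
    """Input side: every HTML/XML entity becomes a Unicode character."""
    return html.unescape(raw)


def output_text(text: str) -> str:
    """Output side: re-escape &, < and > so the result is still valid markup.

    quote=False leaves quotation marks alone; other Unicode characters
    (like é) are written out as-is, not turned back into entities.
    """
    return html.escape(text, quote=False)


raw = "Fish &amp; chips &lt;3 &eacute;t&eacute;"
text = parse_text(raw)
print(text)               # Fish & chips <3 été
print(output_text(text))  # Fish &amp; chips &lt;3 été
```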

Conclusion

What you install with easy_install beautifulsoup4 is a beta release. If I hear of a problem soon, there's still time to fix it, even if it means a major change to the API. So please try it out and give me feedback.

Mon Feb 06 2012 17:55: Earlier I ran some speed/accuracy tests of Beautiful Soup driven by various parsers. Python's built-in HTMLParser scored very poorly, parsing only 52% (Python 2.7.1) or 57% (3.2.2) of my test pages without raising an exception. Well, Ezio Melotti, the maintainer of HTMLParser, has been working for a while on improving HTMLParser's handling of bad HTML. Most of this code is in Python 3.2.2, so I should have been getting the benefit, but it wasn't working for me because of a semi-related bug in HTMLParser, which is fixed in the as-yet-unreleased 3.2.3.

After talking with Ezio today, I was able to monkeypatch BS4 to avoid the bug in 3.2.2. This means on Python 3, BS4 with no external parser installed will give reliability comparable to BS4+lxml (98% versus 99%). It's still about 50% slower, though, parsing about 1300 kb of HTML per second, versus 2100 kb/second for BS4+lxml.

Tue Feb 07 2012 08:48 Constellation Games Author Commentary #11: "Launch Title": Love those title puns! This blockbuster episode sends Ariel TO THE MOON and introduces two major new characters, Tetsuo Milk and Ashley Somn. Also a minor but important character: Linda Blum, Ariel's mom.

Here's last week's Twitter archive, which ran two weeks ago due to my own errors. Twitter service has now resumed, but because this plot arc is so compressed (the rest of Part One crams two weeks of frantic activity into five weeks of real time), most of it is going to come out on Tuesdays and Wednesdays. Don't be afraid, I'll be here the whole time with long-winded commentary:

It's such a relief to be able to talk about Tetsuo! So much happens in this chapter, he doesn't get a lot of time with Ariel, but that changes starting next week. Tetsuo is great, I love him a lot, but... he's a scene stealer. Anything I wrote, he would grab and run off in some weird direction. When the Aliens were choosing human names, Tetsuo is the guy who picked a name because it means "iron man."
Tetsuo reminds me of Londo Mollari from Babylon 5, in that he starts out a comic relief character (insofar as a comedy can have designated comic relief) and over time reveals more serious facets of his personality. But unlike Londo, Tetsuo never stops saying goofy shit. Tetsuo is the infrafictional author of the subscriber bonus tome "Pey Shkoy Benefits Humans", which is set about six months after the end of the novel, and he's still at it.
Ashley Somn is not a scene-stealer, so her husband kind of overshadows her for most of the book. But she's an awesome character on a slow burn. She's got a high-drama character arc that revs up in the last third, and which I fill in with the bonus story "The Time Somn Died." (Title is not a spoiler.)
I keep forgetting that Tetsuo is orange and Ashley is green; I always imagine it the other way around. They're bright neon colors with darker spots, like tropical frogs. Why? Are Aliens poisonous? I dunno. Lick one and see!
The short scene with BEA Agent Krakowski in back of the strip club is the very last thing I wrote for Constellation Games. Its main purpose is to dramatize the sub rosa assignment Krakowski gives Ariel, an assignment which becomes very important in Part Two. But I also threw in the talking rat, to introduce you to something else that's very important in Part Two: the idea that Ariel might not be the most reliable narrator.
I don't know why Krakowski and Fowler were at a strip club in the middle of the day, but I'm sure it was work-related.
The part of this chapter that's an excerpt from Ariel's Twitter feed will not be shown on his actual Twitter feed, because that would be annoying. But that section was the main inspiration for the in-character feeds in the first place.
Original title for this chapter: "MAN WALKS ON FUCKING MOON."
One of the big problems with the second draft was that for most of it, the tone was emotionally distant. It took me a while to understand the characters. The blog format didn't help, and I moved further away from it in every draft. Ariel is a guy who shows vulnerability in huge dramatic bursts and won't open up otherwise. Etc. etc. Anyway, I worked on this a lot in the third draft, and one of the big changes is that at Andrew Willett's suggestion I modeled Ariel's lunar excursion on this Narbonic strip.
Ariel's reaction to being launched into space is taken directly from what I imagine would happen to me. Also possibly taken from my reaction as a kid to the Disneyland exhibit "Mission to Mars", which had a similar setup where you saw the ground drop out from underneath you while feeling absolutely no acceleration.
The sculptures of the figures from the Pioneer plaque (not "Carl Sagan's gold record", as Ariel mistakenly believes—that's the Voyager record) are another moment of not-quite-understanding taken from "Vanilla". Ariel's initial description of the docking bay is the opening parenthesis of a pretty huge piece of bookending, so watch for that.
One of the imaginary book covers I had in my head while writing was a design based on the Pioneer plaque, except with (clothed) Ariel and Jenny.
Ariel mentions some Eritrean refugees living on Ring City, but they're only the biggest and most famous group of refugees—the ones Ariel knows about. Human Ring is also home to smaller groups of refugees from around the world, and to miscellaneous individuals living under the radar. The actual population of Human Ring at this point is closer to 700 than to 500.
The refugees come up a few times later on, but they don't play a big part in the novel because I don't currently feel I've got the literary chops to tell their stories. But I knew that not mentioning refugees would be unrealistic. It would imply that humanity's governments were able to coordinate to completely lock down the planet, or that the Constellation was sending away asylum seekers. What we have is a compromise, not one I'm happy with, but I think the best I could do.
I know this is already super long, but I want to introduce a new segment here on CG Author Commentary, a little recurring bit I like to call Creative License. Sure, I write silly stories about space aliens visiting Earth seemingly in violation of the Fermi Paradox, but that doesn't mean I can just make stuff up. At the same time, I want Constellation technology to appear very close to the "indistinguishable from magic" line. Creative License explores that tension by pointing out things that probably can't exist in real life, and the made-up reasons I use to justify their existence in the story.
First we have the shuttles the Constellation uses for short hops to Earth and Luna. I have no idea how they go as fast as they do, but I have a vague idea how they achieve a smooth ride: they use ports to maintain an acceleration differential between the inside of the shuttle and the outside, so the inside accelerates at a gravity-like rate while the outside powers up towards some horrendous speed. Ports are very important bits of Constellation tech and need their own segment on Creative License. I'll probably do them next week, after we see one in action.
But this week we also have the Constellation spacesuits. Inflatable spacesuits are nothing new, but Ariel's suit folds up when not inflated and doesn't seem to have any space for hard parts like air tanks, a fluid recycler, the comm system he plugs his phone into, or a way of dissipating heat. Creative License Solution: as you'll see later, the Constellation does pretty amazing things with origami. I imagine all that fine machinery is packed flat and inflates to the correct shape with the rest of the suit.

What a huge commentary, and this plot arc's just getting started. Be sure to tune in next week, when Ariel will say, "I do not use sex to maintain social cohesion."

Image credits: Andy Bernay, Joe Mabel, Linda Salzman Sagan, Harold W. McCauley.


Wed Feb 08 2012 11:02 Beautiful Soup 4 Beta 4: Beautiful Soup 4 beta 4 is out! You can install it with easy_install beautifulsoup4 or pip install beautifulsoup4. You can also download the tarball or check out the Bazaar repository.

Big changes:

If you're using Python 3.2, the built-in html.parser is now reliable enough to use on its own. You don't need to install lxml or html5lib just to parse bad HTML (but lxml is still a lot faster). The forthcoming Python 2.7.3 should also work this way.
This is of course a feature of Python, but due to a pretty bad bug in html.parser, I wasn't taking advantage of it. I worked with Ezio Melotti to monkeypatch that bug from within BS, and now we're back in the very good situation of not needing any external dependencies.
new_tag() will follow the rules of whatever tree builder was used to create the original soup. For example, a new <p> tag will look like "<p />" if you're dealing with XML, but it'll look like "<p></p>" if you're dealing with HTML.
There's now a new_string() method to go along with new_tag().
There are two new methods for manipulating the tree: PageElement.insert_before() and PageElement.insert_after().
I replaced the substitute_html_entities argument with the more general formatter argument. You can do all sorts of crazy stuff with this.
The default formatter converts bare ampersands and angle brackets to XML entities, but doesn't convert any other characters into HTML entities. I think it's kind of America-centric to convert characters like é to &eacute; by default, but I might make the default a "punctuation" formatter that converts things like curly quotes to HTML entities.
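The formatter API itself isn't shown here, so this is a standard-library sketch of the idea rather than Beautiful Soup's interface: a formatter is just a function from text to output text, and the three behaviors discussed above (the minimal default, a "punctuation" formatter, and full HTML-entity substitution) are three such functions. The names `minimal`, `punctuation`, `html_entities` and `render` are hypothetical.

```python
import html
from html.entities import codepoint2name

# Curly quotes, dashes and the ellipsis: the characters a hypothetical
# "punctuation" formatter would turn into named HTML entities.
PUNCTUATION = {0x2018, 0x2019, 0x201C, 0x201D, 0x2013, 0x2014, 0x2026}


def minimal(text):
    """Default: escape bare ampersands and angle brackets only."""
    return html.escape(text, quote=False)


def _named_entities(text, should_convert):
    # Replace each selected character with its named entity, e.g. "é" -> "&eacute;".
    return "".join(
        f"&{codepoint2name[ord(c)]};" if should_convert(ord(c)) else c
        for c in text
    )


def punctuation(text):
    """Minimal escaping, plus named entities for curly quotes and dashes."""
    return _named_entities(minimal(text), lambda cp: cp in PUNCTUATION)


def html_entities(text):
    """Minimal escaping, plus a named entity for every non-ASCII character that has one."""
    return _named_entities(
        minimal(text), lambda cp: cp > 127 and cp in codepoint2name
    )


def render(text, formatter=minimal):
    return formatter(text)


sample = "Café “AT&T” <b>"
for f in (minimal, punctuation, html_entities):
    print(f.__name__, render(sample, f))
# minimal Café “AT&amp;T” &lt;b&gt;
# punctuation Café &ldquo;AT&amp;T&rdquo; &lt;b&gt;
# html_entities Caf&eacute; &ldquo;AT&amp;T&rdquo; &lt;b&gt;
```

Making the formatter a plain callable keeps the default cheap while letting anyone who wants "crazy stuff" plug in their own function.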

Thu Feb 09 2012 17:11 Beautiful Soup 4 Beta 5: Just going to link to my description message this time. Today I focused on clearing out the bug backlog. It's mostly minor stuff, but I'd like opinions on one change, relating to how a tag is treated if it has multiple CSS classes.

Mon Feb 13 2012 09:13 nanDECK: I have a little side project creating a print-and-play board game. The game has a lot of cards, but I don't need to design each card individually--I can generate them programmatically. Or I could, if I were capable of writing the program.

First I tried ReportLab, the Python library for making PDFs. I'd used it for the sadly-now-defunct Pocket Wisherman, and I thought it would be perfect for putting lots of little squares on a piece of paper.

Not so fast! The Pocket Wisherman puts lots of squares on a piece of paper, but in that program text flows from one square to another. That can't happen on a playing card. The closest I could come with ReportLab was a table, and since I couldn't add spacing between the table cells the way you can in... HTML...

It was easy to get something in HTML that looked right on screen (these cards are pretty simple), but not so easy to get them to look good when printed. So I went back to searching for tools optimized for card design. I delved deep, past many people talking about the best way to manufacture cards for print-and-play games, and then I found nanDECK by Andrea Nini.

I'm gonna complain a lot about nanDECK, so I want to make it really clear that nanDECK solved my problem. In about an hour I went from having two failed Python scripts and no cards, to having cards as nice as my design skills could make them. If I got some design help from someone else, I could make the cards nicer still, from within nanDECK.

Now, let the complaining begin! Actually, I'm not even gonna complain. I'll just phrase my complaints as helpful hints. nanDECK is a Windows IDE for a domain-specific markup/programming language. It runs fine in WINE. The prominently-linked manual is actually a reference guide--tutorials and examples are linked further down the homepage.

The interface features so many buttons that the "visual edit" button might get lost in the shuffle (ha), but that button is going to help you so much. You won't have to remember all the arguments to the language directives, and you can lay out elements visually on the card rather than guess at measurements over and over again. In the end I couldn't get the linked-data feature to work (possibly an interaction with WINE), so I figured out the layout for a single card within nanDECK and then wrote a Python program to generate the nanDECK script for my entire deck.
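That generator program isn't shown, so here's a minimal sketch of the approach: lay out one card by hand, turn that layout into a template, and have Python stamp out a copy per card. The directive lines in `CARD_TEMPLATE` are placeholders, not verified nanDECK syntax; you'd paste in whatever the visual editor produced for your single card and mark the per-card fields. `CARD_TEMPLATE`, `CARDS` and `build_script` are hypothetical names.

```python
# Placeholder one-card layout: NOT real nanDECK syntax. Replace the
# directive text with the lines nanDECK produced for your card; {n} is the
# card number and the other fields are filled in per card.
CARD_TEMPLATE = (
    'TITLE_PLACEHOLDER={n},"{title}"\n'
    'BODY_PLACEHOLDER={n},"{text}"\n'
)

# Hypothetical card data; in practice this might come from a CSV file.
CARDS = [
    {"title": "Rocket", "text": "Move two spaces."},
    {"title": "Comet", "text": "Draw a card."},
]


def build_script(cards, template=CARD_TEMPLATE, header=""):
    """Return one script: `header` (deck-wide lines) plus the template per card."""
    parts = [header]
    for n, card in enumerate(cards, start=1):
        parts.append(template.format(n=n, **card))
    return "".join(parts)


if __name__ == "__main__":
    print(build_script(CARDS))
```

Writing the output to a file and opening it in nanDECK gives you the whole deck from one hand-tuned layout.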

Whew! Kept it positive. If you want to design cards for a game, and you don't want to lay them all out manually (which you shouldn't), I think nanDECK is your best option. Thanks, Andrea Nini!

Tue Feb 14 2012 09:00 Constellation Games Author Commentary #12: "Monsters From Space": Welcome to another chapter full of laughter and embarrassing faux pas. This week we learn why Curic scanned Ariel's house, and get our first glimpses of the ancient, not-particularly-wise Ip Shkoy.

Before the commentary begins, I want to bring up something serious that I could save for next week but I don't want to. Dr. Janice Voss died on February 6 at 55. She was a scientist, a NASA astronaut who flew on five shuttle missions, and later the science director for the Kepler Space Telescope. She was a big science fiction fan. I met her once in 2007, in what was certainly the highest-wattage dinner I've ever attended (photos), and she made a huge impression on me.

The only major character in Constellation Games you haven't met yet is an astronaut, Tammy Miram. She's introduced next week. If I hadn't met Janice Voss, Tammy Miram would not exist, and I have no idea what the novel would look like from next week on.

I don't mean that Tammy Miram is "based on" Janice Voss, or that the character is a way to tell Janice's story in a fictional setting. I only met Janice Voss once and I have no idea what her story would look like. (Spoiler) Also, Janice was a very well-adjusted person, and Tammy is not. But a dinner-length conversation with Janice was enough to move the societal role of "NASA astronaut" out of my mental category "archetypes useful in science fiction stories" and into "interesting jobs I can give to my characters."

R.I.P., Janice Voss. Ad astra per aspera.

Here's last week's Twitter feed, as it was meant to be seen (i.e. without a weird UTF-8 encoding issue). And now, this week's commentary:

I've never been happy with this chapter title. Any other suggestions? It's too late to change it, but I'd like to hear what you think.
When I first discussed cover art with Kate, I was apprehensive that she would insist on a Big Dumb Object In Space cover. A shot of Ring City, or the hole in the moon. The sort of cover that presumably moves books, since every single science fiction paperback has it, but one that I think would be entirely out of place in a novel about middle-class people from Austin. Fortunately, Kate, like me, wanted a cover that implied "video games", and we settled on the handheld computer (about which more in a couple weeks). But I had an ace up my sleeve—a Big Dumb Object cover that I would have liked.
This hypothetical cover is the interior of Alien Ring, huge and breathtaking, the cma forest curling up in the distance along with the curvature of the ring, and Ariel in the foreground taking a picture of it on his cameraphone. It didn't happen, but I could have lived with it.
Actually, "Big Dumb Object In Space" would have been a better chapter title.
The Alien Ring stuff got seriously moved around. In the second draft, Tetsuo and Ashley met Ariel in the docking bay, took him to Alien Ring, they met Curic there, and everyone went to the moon together. All of this happened in chapter 11, after Ariel's initial spaceflight. It was way too much for one chapter, so Alien Ring got pushed to chapters 12 and 13, and expanded greatly.
My Earth-life analogues for the Aliens were always bonobo chimps, notorious among humans for their use of sex to maintain social cohesion. But in "Vanilla" it was more in the background. The primary Alien character, George, was pretty buttoned-down and never had a scene with another Alien. For Constellation Games I went all-out and made the Aliens huge sluts. Good decision!
In case you're curious, the Earth-life analogue for the Farang is the deadly Snowth.
Curic's name change (which never comes up again) is a fun detail, the kind of thing I wish more science fiction stories would mention, but it's also SYMBOLISM. By the end of the book, all the major characters except Ariel have had two different names or identities. So far we've seen Tetsuo and Ashley taking human names, Bai going by his surname, and Curic being Curic.
What does it mean? Nothing—it's free-floating symbolism. Just kidding, I do have an opinion on what it means, but it'll need to wait for the end of the book.
This week's "Finux" moment: the Bit Boy series in this novel is a transparent stand-in for the real-world Mega Man series.
Finally, Creative License returns with an in-depth discussion of ports, first seen in this chapter connecting the lunar excavation to the Ring City habitats above.
A port is the two ends of an exotic-matter wormhole with negative mass. Each end of the wormhole is mounted in a positive-mass case, and you can (let's say) carry one end down to the moon to shorten the spacetime distance between the space station and the moon. Ports can be collapsed from either end by destabilizing the wormhole.
I invented ports in 2006 for "Vanilla" and in that story I did a lot of work showing what you could do with them. I felt writers had generally treated wormholes as magic gateways and neglected their mayhemic possibilities. I mean, just imagine if the two ends of a wormhole could be moved independently! You could set up all sorts of wacky gravity and pressure differentials.
Then in 2007 Portal came out. So, I give up. Ports in the Constellation universe work just like in Portal, with two differences. First, you can't shoot wormholes out of guns, because a) it takes an enormous amount of energy to make one, and b) a wormhole has two sides. In Portal terms, the "blue" portal has no existence without the "orange" portal. Second, in Portal, gravity always points down. In the Constellation universe, gravity travels through ports. By proper placement of ports you can create localized weightlessness or antigravity effects.
Anyway, the whole thing is moot, because stable wormholes of this sort almost certainly can't exist—they'd violate causality and allow for time travel. The whole thing is merely... Creative License.

Tune in next week, when Curic will say, "Infiltration? Cold reading? Propaganda? Torture? Extracting false confessions?"

Image credits: NASA, NASA again, Kabir Bakie, Alain r.


Thu Feb 16 2012 09:16 Beautiful Soup 4 Beta 6, Beautiful Soup 3.2.1: There are two ongoing serials here at crummy.com: Constellation Games and Beautiful Soup 4. Here's the announcement message for the latest installment in the latter saga.

The big news is a new release of the 3.x series, Beautiful Soup 3.2.1. This fixes a pretty bad problem that can let through cross-site scripting attacks if you use Beautiful Soup to sanitize HTML. If that's you, you should upgrade ASAP.

That was certainly worth fixing, but I don't do much work on Beautiful Soup 3 anymore. I mean, if I fixed every bug in BS3, I'd have... Beautiful Soup 4, which is now almost done. All the bugs are closed out. There's one more big feature I may add, and some minor cleanup I want to do, but mainly I want to make sure people are comfortable with the new API.

Thanks to Stefano Rivera, BS4 is now in Debian unstable and Ubuntu Pangolin, as beautifulsoup4. So the clock is ticking on freezing the API. This would be a great time to try to port your BS3 scripts to BS4, and let me know how difficult it was and what you had to change.

Mon Feb 20 2012 07:32 Where's That Golden Age?: A couple weeks ago Samuel Arbesman posted an entry to Wired's science blog called "How to search for the golden age of television", an entry that's been driving me crazy since I read it. Not because I disagree with his analysis of the IMDB dataset, but because I don't like his starting point. Arbesman uses "each television show’s running time, in number of episodes, as a very rough proxy for quality". It's true that there's probably a positive correlation, but that metric has a couple problems. First, it severely discounts the present. A show on the air today may have several seasons to run, but we don't know that yet, so it'll look worse than an old show of equivalent quality. Second, the IMDB dataset features a much more direct proxy for quality: user ratings.

I don't think ratings are a great proxy for quality--a look at the highest-rated TV shows will put a stop to that nonsense. And the run length of a show is at least an objective fact. But I think our collective opinion of a TV show today is a better proxy of quality than how long the network was originally willing to keep it going. And if you use ratings, I think you can get closer to answering the question "what would a golden age of television look like?"

My guess is, Arbesman didn't use ratings because it's kind of annoying to get that information out of the IMDB dataset. But I'd already done a lot of work on the dataset for The MST3K-IMDB Effect, so in this post I crunch the numbers my way and see what falls out.

If you're expecting controversy, I can't provide any. My findings don't contradict Arbesman's, they just provide a different way of looking at the data.

Step 1: Get the data

(If you're impatient, you can skip to the graphs.)

It all starts with IMDB's plain-text data dumps. I downloaded release-dates.list.gz and ratings.list.gz from the FTP site. I also downloaded distributors.list.gz, but it turned out that data wasn't useful.

Step 2: Identify shows, episodes, and air dates

release-dates.list lists all movies, TV shows, and episodes of TV shows. TV shows are in quotes, and episode names are in curly brackets.

Point Break (1991)					USA:12 July 1991
"Star Trek: Voyager" (1995)				USA:16 January 1995
"Star Trek: Voyager" (1995) {Caretaker (#1.1)}		USA:16 January 1995
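The parsing script isn't included here, so this is a minimal sketch of how one of these lines might be split apart. It assumes only the layout visible in the sample lines: a title (quoted for TV), the year in parentheses, an optional episode in curly brackets, then tabs and COUNTRY:date. `LINE_RE`, `Release` and `parse_release` are hypothetical names, and the real file has variations this regex may not handle.

```python
import re
from dataclasses import dataclass
from typing import Optional

# One line of release-dates.list: title, (year), optional {episode},
# then one or more tabs and COUNTRY:date.
LINE_RE = re.compile(
    r'^(?P<title>.+?) \((?P<year>\d{4})[^)]*\)'   # "Star Trek: Voyager" (1995)
    r'(?: \{(?P<episode>[^\t]*)\})?'             # {Caretaker (#1.1)}
    r'\t+(?P<country>[^:\t]+):(?P<date>[^\t]+)'   # USA:16 January 1995
)


@dataclass
class Release:
    title: str
    is_tv: bool              # TV shows (and web series) are in double quotes
    episode: Optional[str]   # None for a show's own entry, or for a movie
    country: str
    air_year: int


def parse_release(line: str) -> Optional[Release]:
    m = LINE_RE.match(line)
    if m is None:
        return None  # header lines, unexpected formats, etc.
    title = m["title"]
    return Release(
        title=title,
        is_tv=title.startswith('"'),
        episode=m["episode"],
        country=m["country"],
        # Dates look like "16 January 1995"; the year is the last word.
        air_year=int(m["date"].split()[-1]),
    )
```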

Unfortunately, web series look just like TV shows, which is going to mess with the data for recent years:

"The Angry Video Game Nerd" (2006) {A Nightmare on Elm Street (#1.13)}	USA:31 October 2006

I tried some tricks to get rid of web series, like only considering shows with a listed television distributor (distributors.list), but there are tons of dinky cable reality shows that have exactly the same data characteristics as web series. So I'm leaving them in. Just know that when I say "TV shows", I'm talking about TV shows + web series.

To make the initial dataset smaller, I used grep to remove everything except the US premieres of TV shows, and of episodes of TV shows. (And web series.) Then I wrote a Python script that turns this information into a picklable data structure.

The script ties a show to all of its known episodes, and parses out each episode's release date along with the premiere date of the show itself. I want to know every year in which an episode of the show premiered in the US. This has some problems--it makes the original "Star Trek" show up as a 1988 show because that's the first time the original pilot was aired--but they're pretty minor.
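Here's a sketch of that grouping step, assuming the parsed lines have already been reduced to (title, country, year) tuples. It keeps only what the analysis needs, the years in which the show or one of its episodes premiered in the US, and checks that the result survives a `pickle` round trip. `years_by_show` is a hypothetical name and the sample data is illustrative.

```python
import pickle
from collections import defaultdict


def years_by_show(releases):
    """Map each TV show to the sorted list of years it had a US premiere.

    `releases` is an iterable of (title, country, air_year) tuples: one for
    the show itself and one for each of its episodes.
    """
    years = defaultdict(set)
    for title, country, air_year in releases:
        # Quoted titles are TV shows (or web series); everything else is skipped.
        if country == "USA" and title.startswith('"'):
            years[title].add(air_year)
    return {title: sorted(ys) for title, ys in years.items()}


releases = [
    ('"Star Trek: Voyager"', "USA", 1995),  # the show's own premiere
    ('"Star Trek: Voyager"', "USA", 1995),  # pilot episode, same year
    ('"Star Trek: Voyager"', "USA", 2001),  # a later episode
    ("Point Break", "USA", 1991),           # a movie: ignored
]
shows = years_by_show(releases)
data = pickle.dumps(shows)  # save this (e.g. to a file) for the next step
```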

Step 3: Add the ratings

Now I know when every show started, and in many cases I know every year a show was on the air. In the next step I load in another file and add ratings to shows and episodes.

Ratings are kept in ratings.list. They look like this:

      0000001212   11245   7.5  "Star Trek: Voyager" (1995)
      0000012111    1558   7.1  "Star Trek: Voyager" (1995) {Caretaker (#1.1)}

There's lots of cool stuff here like a histogram (0000012111 means 10% of people rated the premiere of Voyager a 6, 20% of people a 7, and so on), but what we're after are the IMDB ranks: 7.5 stars and 7.1 stars in this case.

Unfortunately, there's a lot of boring stuff in ratings.list like the top 250 movies. Fortunately, I already wrote code to parse this file during my investigations into the MST3K-IMDB effect.
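That earlier code isn't reproduced here, so this is a sketch under stated assumptions. It assumes each rating line is whitespace, a 10-character vote distribution, a vote count, the rating, then the title, and it reads each distribution character the way the example above does: a digit N means roughly N*10% of voters, with "." taken as no votes and "*" as 100%. It doesn't try to skip other sections of the file, such as the top 250 list. `parse_rating` and `decode_distribution` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# e.g. '      0000012111    1558   7.1  "Star Trek: Voyager" (1995) {Caretaker (#1.1)}'
RATING_RE = re.compile(
    r'^\s+(?P<dist>[0-9.*]{10})\s+(?P<votes>\d+)\s+(?P<rank>\d+\.\d)\s+(?P<title>.+?)\s*$'
)


class Rating(NamedTuple):
    distribution: dict  # star value (1-10) -> approximate % of votes
    votes: int
    rank: float         # the IMDB rating, e.g. 7.1
    title: str


def decode_distribution(code: str) -> dict:
    """Decode the vote histogram (assumed legend; see the lead-in).

    Position i holds the share of votes for i stars. A digit N is read as
    the bottom of its 10% bucket, N*10.
    """
    out = {}
    for stars, ch in enumerate(code, start=1):
        out[stars] = 0 if ch == "." else 100 if ch == "*" else int(ch) * 10
    return out


def parse_rating(line: str) -> Optional[Rating]:
    m = RATING_RE.match(line)
    if m is None:
        return None  # header and separator lines
    return Rating(
        distribution=decode_distribution(m["dist"]),
        votes=int(m["votes"]),
        rank=float(m["rank"]),
        title=m["title"],
    )
```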

Step 4: Graphs!

Now I'm going to break out numpy and pychart. Let me start with a calibration run, a graph Arbesman also did. How many shows were on the air in a given year?

Shows over time

Pretty similar to Arbesman's graph. My graph doesn't go down at the end, because I cut the data off at 2011, the last full year of data. I also start later, with the first year for which there were five rated TV shows. I'm picking up some shows he's not, possibly because I'm counting a show in every year it aired, possibly because I'm picking up shows that don't have any episodes listed on IMDB, possibly because he found some way I didn't think of to exclude web series. But it's a similar shape.
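For the curious, here's a minimal sketch of the counting behind that graph, assuming each show has been boiled down to (years it aired, rating or None). It counts a show in every year it aired, starts at the first year with five rated shows, and stops at 2011, as described above. `shows_per_year` is a hypothetical name, and this uses plain Python rather than the numpy code behind the actual graphs.

```python
from collections import Counter


def shows_per_year(shows, last_year=2011, min_rated=5):
    """Count how many shows were on the air each year.

    `shows` maps a title to (years_aired, rating); rating is None for
    unrated shows. A show counts once in every year it aired.
    """
    on_air = Counter()
    rated = Counter()
    for years, rating in shows.values():
        for year in set(years):
            on_air[year] += 1
            if rating is not None:
                rated[year] += 1
    # Start at the first year with at least `min_rated` rated shows.
    start = min((y for y, n in rated.items() if n >= min_rated), default=None)
    if start is None:
        return {}
    return {y: on_air[y] for y in range(start, last_year + 1)}
```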

Now here's the graph you've been waiting for: mean rating over time:

Mean rating over time

It's a sad story of precipitous drops in quality: one between 1959 and 1980, one between 1999 and 2005. By this measure, 2005 was the worst year in television history. If you only looked at mean rating over time, you'd say that there was one golden age of television, from 1955 to 1965, and that the 1980-2000 period was a period of stagnation interrupting an otherwise steady decline.

The graph of median rating over time tells much the same story, so I won't transclude it, but you can follow this link to see it.

But, mean rating isn't the whole story. Let me pull out the only statistics trick I know: look at the standard deviation of the ratings over time.

Standard deviation over time

1959, the year with the highest mean rating, is also a year of extreme homogeneity. Less than one star of difference separates the very good shows from the very bad shows. After 1959, the good shows get better, and the bad shows get worse, relative to the mean. In 1980 the standard deviation was 1.37 stars, and in 2011 it was almost two stars. Remember that ratings are not normally distributed, so two stars is quite a lot. (Even one star, as in 1959, ain't nothing.)
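The per-year mean, median and standard deviation can be sketched with the standard library's `statistics` module instead of numpy, assuming the same (years aired, rating) structure with unrated shows left out. The post doesn't say whether it used the population or the sample standard deviation; this sketch assumes population (`pstdev`). `rating_stats_by_year` is a hypothetical name.

```python
import statistics
from collections import defaultdict


def rating_stats_by_year(shows):
    """(mean, median, population SD) of show ratings for each year.

    `shows` maps a title to (years_aired, rating); shows whose rating is
    None are skipped. A show contributes its rating to every year it aired.
    """
    ratings = defaultdict(list)
    for years, rating in shows.values():
        if rating is None:
            continue
        for year in set(years):
            ratings[year].append(rating)
    return {
        year: (statistics.mean(rs), statistics.median(rs), statistics.pstdev(rs))
        for year, rs in sorted(ratings.items())
    }
```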

Combine this with the skyrocketing number of shows (which begins in the late 90s and goes into overdrive once we start counting web shows) and you can see how that 2000-2005 decline happened. Over 1300 distinct shows aired in 2005. Of course the mean show is going to be crap! The amazing thing is that things have gotten better since 2005, even as we now make over twice as many shows per year. (And web series! Can't forget those!)

Another factor is that people aren't even bothering to rate the bad shows. Here's the percentage of shows that aired in a given year that don't have IMDB ratings because they haven't gotten enough votes. For 2011, this was a majority of shows!

Unrated show percentage over time

Old shows aren't rated because nobody remembers them. New shows aren't rated because... well, I did a bunch of spot checks, and they fall into three categories. 1) web series, 2) shows that were never aired and maybe never even produced, 3) crap. Only #3 can properly be considered part of "television". The mean rating would certainly be lower if every show had a rating, but I don't know how much lower.
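The unrated percentage is a simple ratio per year. Here's a sketch assuming the same (years aired, rating) structure, where `None` stands for a show without enough votes to get a rating; `unrated_percentage_by_year` is a hypothetical name.

```python
from collections import Counter


def unrated_percentage_by_year(shows):
    """Percentage of the shows on the air each year that have no IMDB rating."""
    total = Counter()
    unrated = Counter()
    for years, rating in shows.values():
        for year in set(years):
            total[year] += 1
            if rating is None:
                unrated[year] += 1
    return {year: 100 * unrated[year] / total[year] for year in sorted(total)}
```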

That's where we stand: television is bad, and it's getting worse. That trend may have been reversed recently, or the decline may have been masked by web shows with passionate fans, or things may have gotten so bad that people stopped even bothering to rate the crap. But! Would you exchange the television of today (mean rating: 6.2) for the television of 1973 (mean rating: 7.3)? I wouldn't, and I don't think you would either. What's going on?

Well, we don't watch the mean television show. We only watch the good shows. (If you've read this far, I'm gonna go ahead and make that assumption.) And if you look at the good shows, the picture looks very different.

Here's what the shows look like one standard deviation above the mean. This is basically the top 16% of shows:

Best shows over time

At the high end, the decline in quality is reversed in the 80s and early 90s. The gains are undone in the late 90s (2005 is still terrible), but then quality shoots back up. This is very similar to Arbesman's graph of show length over time.

What if you're even more selective? Let's graph the value 1.5 standard deviations above the mean for each year. I don't know what percentile this would correspond to, but it's something like the top 5%. This is the very best stuff you can find on TV in a given year:

Best of the best over time

This graph, I think, is the best answer to "what would a golden age look like?" It would look like the 60s, when there were three channels under tight quality control, and you could turn on the television at any given time and probably find something good. Or it would look like right now, when a huge number of shows are being produced, and it's easy to be a snob and only watch the very best. This is why we don't remember 2005 as being the worst year of TV in the history of the medium, and this is why I'd never trade today's TV for 1973's TV, even though 1973 looks pretty good on that graph.
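For anyone who wants to draw these "one standard deviation up" and "1.5 standard deviations up" lines themselves, here's a sketch. The yearly ratings dict is hypothetical, and using the population standard deviation is my assumption. The second function shows where "top 16%" comes from: if ratings were normally distributed, about 15.9% of shows would sit more than one standard deviation above the mean, and about 6.7% more than 1.5 standard deviations above it. Ratings aren't normal, so treat both as ballpark figures.

```python
from statistics import NormalDist, mean, pstdev


def yearly_thresholds(ratings_by_year, k):
    """For each year, the rating k population standard deviations above that year's mean."""
    return {
        year: mean(ratings) + k * pstdev(ratings)
        for year, ratings in sorted(ratings_by_year.items())
    }


def top_fraction_if_normal(k):
    """Share of a normal distribution lying more than k standard deviations above its mean."""
    return 1.0 - NormalDist().cdf(k)


# Hypothetical ratings, for illustration only.
ratings = {1973: [6.5, 7.0, 7.5, 8.0], 2011: [4.0, 5.5, 6.5, 8.5]}
print(yearly_thresholds(ratings, 1.0))
print(yearly_thresholds(ratings, 1.5))
print(round(top_fraction_if_normal(1.0), 3))  # 0.159
print(round(top_fraction_if_normal(1.5), 3))  # 0.067
```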

So, there you have it--another way of looking at the IMDB data. More to come! Next up: a little thing I like to call "Worst Episode Ever".

Tue Feb 21 2012 09:10 Constellation Games Author Commentary #13: "Your Day Job": The lucky chapter thirteen introduces the novel's last major character, Mission Specialist Dr. Tammy Miram. She gets right to work, kicking off a subplot that won't be wrapped up til the very last chapter. Let's look at a bunch of commentary, most of which is about her. But first, Twitter archive from last week! Okay, here we go:

Tammy Miram is the only character in the novel who was given a name with an eye towards its symbolism, i.e. I didn't have a name handy so I thought "what would Charles Dickens do?". "Miram" is the Arabic name of the star η Persei. "Tammy" doesn't mean anything in particular, but it was a popular girls' name in the early 1970s. This is the Social Security Administration technique for character naming, and I strongly recommend it.
Oh, but I just looked up the name Tammy and it means "twin". Can't escape the symbolism!
I'm calling her "Tammy" because I mentioned her name last week in the tribute to Janice Voss, but Ariel calls her Miss Ion Specialist throughout this chapter and you don't find out her real name until next week. In the second draft, Ariel called her "Ion" for the entire book. You didn't find out her real name until chapter 30.
Why did I change it? Mainly because of the continuous shift away from the blog format. It didn't make sense for Ariel to be using a blog pseudonym for Tammy in narration. And it would have been too confusing for Ariel to constantly switch back and forth between "Ion" and "Tammy". Especially because I've already got another character for whom Ariel uses different names in narration vs. blog.
In the third draft, Ariel held on to "Ion" until Part Two. Now he drops the pseudonym almost immediately, when Jenny calls him on it.
In the second draft the weightless Ariel/Tammy conversation was too dialogue-heavy and it was unclear where they were physically in relation to each other. So I introduced Ariel's frantic attempts to use the kicker and rotating in ways he doesn't want to rotate, contrasted with Tammy's mastery of the environment. This lessens the jokey handwaviness of how Ariel's able to negotiate weightlessness at all.
Tammy's being from Akron is a reference to my favorite Stephen Colbert joke, a question from his interview with Congresswoman Stephanie Tubbs Jones: "Twenty-two astronauts are from Ohio. What is it about your state that makes people want to flee the Earth?" Also a Devo reference.
Tammy is the star of the unfinished bonus story "A Princess Of Mars." Not ruling out finishing that story, but it'd need to be some kind of currently unplanned tie-in. Like if we did a Kickstarter project to produce a hardcover edition to squeeze a few more dollars out of all you fine customers.
The ISS backlog is a bunch of shelved experiments that couldn't be brought up because the shuttle program was cancelled back in the 2000s. They were reassembled in a hurry and most of them don't work anymore.
During the writing of the third draft, I decided to alt-history the International Space Station (in real life a product of the Cold War) into a post-Glavnaya US/Russian Federation joint, the Space Science Station. This stupid idea lasted so little time I don't have any records of it. Beta readers were confused for absolutely no gain except my own personal satisfaction at having made the world more self-consistent.
The human space station in this alt-history is still a US/Russian Federation joint, it's just that a) it's called the International Space Station, and b) that fact has absolutely no effect on the story. Problem solved!
Hey, Brendan, here's your one reference to the Cryptids in this whole novel.
I really should have pushed the Starfarer release date back a few years. 1987 is a little late for a pure text adventure, and Tammy would have been in high school in 1987, so there was plenty of time to push it back.
In my opinion the gravity kicker is the single biggest piece of Creative License in the Constellation universe. Not because it's technically impossible, but because it breaks the worldbuilding. All I can do is point it out and/or hope you don't notice.
In "Vanilla", and in the second draft of CG, the device was a "sonic kicker" and it used reflected sound waves. But in conversation with physicist Nick Murphy I learned that although a sonic kicker is technically possible, you'd need to use sound waves so powerful as to kill on impact. Or something like that. Anyway, I changed the kicker to use gravity waves, but a small handheld gravity wave generator is a very valuable piece of tech, on par with the portable wormholes. And here I had this whole novel where ports were treated as very valuable tech and the ~~sonic~~ grav kicker was treated more like a Hammacher Schlemmer gadget. Instead of rewriting big chunks of the novel to deal with this point, I now invite you to enjoy a heaping spoonful of... Creative License.
Ariel's conversation with Tammy about Cody Wicklund is kind of obscure, the kind of thing you'd expect to pay off later but it doesn't. Exactly the kind of thing I can write commentary about!
See, it's supposed to pay off when I rewrite "Vanilla". Cody Wicklund is that story's POV character, and he's a pretty famous person in the Constellation universe, so I felt like it would be fun to mention him in the novel. This was a good place because it's reasonable that Tammy would know him. But I dunno if I'm actually going to rewrite "Vanilla", so this is Constellation Games's equivalent of that teaser caption at the end of Buckaroo Banzai.
Tammy disliking Cody Wicklund is new. In "Vanilla" he was an unassuming scientist, not someone you'd have a strong enough opinion about to dislike: vanilla, in other words. I decided he'd be more interesting if he were more amoral, the kind of person Ariel might compare to Wernher von Braun. Will it pan out? Maybe.

That's all I got. Stay tuned for the huge chapter 14, a chapter full of deepening mysteries and used game trade-ins, the chapter where Ashley finally says, "Ariel was distracted by my beautiful ovipositor."

Image credits: NASA, Mark Phillips, Allen Garvin.


Tue Feb 21 2012 19:33: Remember when this weblog used to be about fun links? I don't either, but I think it was somewhere in there. Well, check this out: last year when I went to PAX my most enjoyable experience was the panel "Videogames Antiques Roadshow." It worked just like you think: people would bring old game stuff up on stage, and distinguished collectors would estimate the value of the old stuff. Here are some pictures from that panel. In fact, you can see me in the second photo, fourth row center.

Kind of got distracted there--the point of this post is not to look at a crowd scene that includes me. I meant to say that they brought the panel back at PAX Prime, and this time there's video. And it's now called "Retrogaming Roadshow", possibly due to trademark issues. In addition to bringing to light cool bits of history like the PCjr edition of M.U.L.E., I love the way these panels illustrate the social construction of value. Highly recommended if you've got an interest in this stuff.

(6) Wed Feb 22 2012 16:39 Worst Episode Ever: Time for some more IMDB fun. Last time I looked at whole years of television. This time, I'll graph the ratings for individual episodes of TV shows. Can we watch shows get better or worse over time?

We sort of can. The problem is that only a true fan bothers to go to IMDB and rate individual episodes of a TV show. So you can't really trust the episode ratings--they're too high. But we can visualize trends in show quality, as perceived by the fans.

For these visualizations you want long-running series with lots of die-hard fans. So let's start with Star Trek:

Star Trek

Star Trek: The Next Generation

Star Trek: Deep Space Nine

Star Trek: Voyager

Enterprise

(Note the very last data point in that one. That's the series finale, which everyone hates.)

There's a lot of scatter, but you can generally see the common Star Trek pattern of the show getting better as the ensemble cast comes together. Except for the original series, which ended with a lousy season. Now let's look at another nerd favorite, "Buffy the Vampire Slayer":

Buffy the Vampire Slayer

Beth requested that one. I've seen exactly one episode of Buffy so I wasn't expecting anything in particular. It looks like a show that's consistently good, but wildly inconsistent within the bounds of "consistently good". It doesn't really get better over time. Maybe the Voyager and DS9 graphs look the same to someone who's not a Trek fan.

But compare "Mystery Science Theater 3000", which gets drastically better over time. When I was younger I would have disputed this finding, but now I basically agree with this graph:

Mystery Science Theater 3000

I did a lot more graphs, but I'll just show two more. Here's the graph for "The Simpsons", a very long-running show with a very fickle fan base (see title of this post):

The Simpsons

Wow! I love this graph! I don't know enough about the history of the show to name the historical trends, but I'm pretty sure a Simpsons fan will see a big part of their life history reflected in this graph.

I wanted to see if this sort of coherent shape was just an artifact of the fact that "The Simpsons" has been on the air for over 20 years, so I graphed another long-running show notorious for huge variation in quality, "Saturday Night Live":

Saturday Night Live

You can definitely see where things went wrong, but even within a season there's huge variation in quality. The Simpsons is created by the same people every week, whereas SNL has two wild cards every week: its guest host and musical guest. And since it's sketch-based, three good or three awful minutes can make or break the entire episode.

Next up, the third and possibly final part of this analysis, in which I'll pit fans of a show against the general public.

PS: For the record, according to IMDB data, the actual worst episode ever of "The Simpsons" was #9.11, "All Singing, All Dancing".

Update: People in comments had questions I can't answer because I only know how to do very basic statistics, but they also had questions about how many people rated the episodes, which I can answer. This table shows how many people have rated each series as a whole, as well as the median and mean numbers of ratings for every episode that has any ratings. I also included how many people rated the first episode, how many rated an episode in the middle, and how many rated the last/most recent episode.

Series	Series ratings	Episode ratings (median)	Episode ratings (mean)	Episode ratings (std)	First episode	Middle episode	Most recent episode
"Buffy the Vampire Slayer" (1997)	34564	498	553.41	224.88	862	511	1091
"Enterprise" (2001)	8843	140	189.27	242.28	2397	130	152
"Mystery Science Theater 3000" (1988)	6650	57	65.54	47.41	21	78	131
"Saturday Night Live" (1975)	10151	15	19.86	15.65	112	11	60
"Star Trek" (1966)	12695	419	480.95	222.83	668	389	1923
"Star Trek: Deep Space Nine" (1993)	9779	172	188.32	107.37	1501	151	290
"Star Trek: The Next Generation" (1987)	16974	329	375.62	354.49	2189	318	4580
"Star Trek: Voyager" (1995)	11245	153	169.08	110.96	1558	177	348
"The Simpsons" (1989)	15578	319	355.07	173.09	2214	309	96
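Per-series numbers like these can be computed from a list of per-episode vote counts. A rough sketch follows; the input list is hypothetical, and both the choice of index len // 2 as the "middle" episode and the use of the population standard deviation are my assumptions, not necessarily what produced the table.

```python
from statistics import mean, median, pstdev


def episode_vote_summary(votes_per_episode):
    """Summarize how many people rated each episode of one series.

    `votes_per_episode` lists vote counts in airing order (a hypothetical
    input). Unrated episodes are dropped first, matching the table's
    "every episode that has any ratings". Picking index len // 2 as the
    "middle" episode and using the population standard deviation are
    both assumptions.
    """
    counts = [c for c in votes_per_episode if c > 0]
    return {
        "median": median(counts),
        "mean": round(mean(counts), 2),
        "std": round(pstdev(counts), 2),
        "first": counts[0],
        "middle": counts[len(counts) // 2],
        "most_recent": counts[-1],
    }


# Made-up vote counts for a six-episode series; the 0 is an unrated episode.
print(episode_vote_summary([900, 300, 0, 250, 310, 1200]))
```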

So SNL actually has very few ratings per episode, while The Simpsons is on par with ST:TNG. It's common for the first episode and the finale to have many more ratings than others. And here's a graph of the number of people who have rated "The Simpsons" over time:

Simpsons ratings over time

Fri Feb 24 2012 11:17 Beautiful Soup 4 Beta 8: I didn't even mention beta 7 on NYCB because it was oriented towards getting rid of test failures: failures that had a lot to do with which versions of which parsers were installed, but nothing to do with whether Beautiful Soup itself was broken.

Beta 8 adds very basic namespace awareness. By "basic" I mean:

Handle documents that include namespaced tags and attributes without crashing or mangling the document on output.
If the parser provides namespace information for a tag or attribute, store it for the user's reference instead of discarding it.

That's it. No one responded to my request for namespace-related feature requests, so I'm doing the bare minimum.

(2) Mon Feb 27 2012 12:59 Incorrectly Regarded As Good: In this third and final part of my IMDB data adventure, I want to switch from graphs to tables, and shed light on the eternal struggle between fans and non-fans. If fans are the ones who care enough to rate individual episodes, non-fans are the ones more likely to rate the show as a whole. I looked at every show that has at least 100 ratings, plus at least 100 rated episodes. I divided the mean episode rating by the show rating to get a "fan appreciation quotient". (I used the mean because the show rating itself is a mean, calculated by IMDB.)
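Working backwards from the tables, the quotient is the mean episode rating divided by the show rating, so anything above 1.0 means the episode raters like the show better than the whole-series raters do. A minimal sketch, with hypothetical function names; the cutoffs are the ones from the paragraph above. With the rounded ratings shown in the table, "Entertainment Tonight" comes out at 1.62 rather than 1.63, presumably because the displayed ratings are rounded.

```python
from statistics import mean


def fa_quotient(show_rating, episode_ratings):
    """Fan appreciation quotient: mean episode rating / overall show rating.

    Above 1.0, the people rating individual episodes (assumed to be fans)
    like the show more than the people rating the series as a whole.
    """
    return mean(episode_ratings) / show_rating


def is_eligible(show_votes, episode_ratings, min_votes=100, min_episodes=100):
    """The cutoffs from the text: 100+ show ratings and 100+ rated episodes."""
    return show_votes >= min_votes and len(episode_ratings) >= min_episodes


# Using rounded figures from the tables:
print(round(fa_quotient(3.7, [6.0]), 2))  # 1.62 ("Entertainment Tonight")
print(round(fa_quotient(7.3, [5.7]), 2))  # 0.78 ("Bonanza")
```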

Shows with high FA quotients are more beloved by fans than by the general IMDB-using public:

FA quotient	Show	Show rating	Mean episode rating
1.63	"Entertainment Tonight" (1981)	3.7	6.0
1.34	"Melrose Place" (1992)	5.7	7.6
1.28	"Dynasty" (1981)	5.9	7.6
1.28	"The Rosie O'Donnell Show" (1996)	3.6	4.6
1.26	"Mighty Morphin' Power Rangers" (1993)	6.0	7.5
1.24	"Full House" (1987)	6.0	7.4
1.20	"Ghost Whisperer" (2005)	6.4	7.7
1.20	"Fear Factor" (2001)	4.9	5.9
1.16	"Dharma & Greg" (1997)	6.7	7.7

Note that since this is a quotient, it has nothing to do with the magnitude of the ratings. "The Rosie O'Donnell Show" got terrible ratings even from the people I'm assuming are fans; it's just that the show as a whole did even worse.

OK, smarty pants, what about a low FA quotient? How can a show appeal more to the mainstream than to its own fans? Well, I think a low FA quotient means that a show seems better in retrospect than it actually was. Or, more positively, it means that a show was more than the sum of its parts. Either way, here are the shows with the lowest FA quotients:

FA quotient	Show	Show rating	Mean episode rating
0.78	"Bonanza" (1959)	7.3	5.7
0.78	"NYPD Blue" (1993)	7.7	6.0
0.77	"In Living Color" (1990)	7.9	6.1
0.75	"Teenage Mutant Ninja Turtles" (1987/I)	8.1	6.0
0.73	"Gunsmoke" (1955)	8.0	5.8
0.71	"What's My Line?" (1950)	8.9	6.3
0.71	"Saturday Night Live" (1975)	8.1	5.7
0.68	"House of Payne" (2006)	2.5	1.7
0.62	"Ellen: The Ellen DeGeneres Show" (2003)	7.3	4.6
0.60	"MADtv" (1995)	6.7	4.0

Look how much sketch comedy there is on that list! I think I'm on to something. Two of my favorite shows, ST:TNG and MST3K, also have low FA quotients of 0.83 and 0.84 respectively.

And right in the middle we have the shows that are exactly as good (or bad) as you remember them:

FA quotient	Show	Show rating	Mean episode rating
1.00	"Becker" (1998)	7.6	7.6
1.00	"Cold Case" (2003)	7.5	7.5
1.00	"Dancing with the Stars" (2005/I)	4.8	4.8
1.00	"Hercules: The Legendary Journeys" (1995)	6.6	6.6
1.00	"MacGyver" (1985)	7.8	7.8
1.00	"Mission: Impossible" (1966)	8.1	8.1
1.00	"Project Runway" (2004)	6.6	6.6
1.00	"Rawhide" (1959)	8.2	8.2
1.00	"The Practice" (1997)	7.7	7.7

Haters

Similar to the struggle between fans and non-fans is that between fans and antifans, a.k.a. haters. Fans of a show will give it a very high rating, and haters will give it a very low rating. We can detect this by looking for shows whose ratings have high standard deviations. IMDB doesn't make the standard deviation available directly, but it does provide a ten-character ASCII string that represents the distribution of ratings.

Star Trek: The Next Generation has been rated 16,974 times. Its rating distribution string looks like this: "0000000124". The "4" means that the number of ten-out-of-ten votes is somewhere between 40% (6,790) and 49% (8,317) of those 16,974 votes. The "2" means that between 20% and 29% of the votes are nine-out-of-ten, the "1" means that between 10% and 19% of the ratings are eight-out-of-ten. The zeroes mean that the other star ratings account for between 1% and 9% of ratings each. You can see the conversation about TNG is very heavily dominated by the fans.

I reconstructed the original rating distribution very roughly by treating the character "0" as five percent of the total votes, "1" as fifteen percent, and so on, up to "9" meaning 95 percent of the votes. How rough is the reconstruction? Well, for TNG, the reconstructed distribution has 20,363 data points, where the actual distribution (whatever it is) only has 16,974.

When I take the standard deviation of the reconstructed distribution for ST:TNG, I get 2.74 stars. This particular number is not trustworthy because of the assumptions made in reconstructing the distribution. But by making the same assumptions for every show, we can see which shows are the most divisive. Here are the shows with the largest standard deviations, among all shows with more than 1000 ratings:

Standard deviation	Show	Rating	Votes	Distribution
3.85	"Laguna Beach: The Real Orange County" (2004)	3.7	2170	3000000003
3.76	"Barney & Friends" (1992)	3.7	1255	4000000002
3.76	"Jon & Kate Plus 8" (2007)	5.4	2716	2000000004
3.76	"The Hills" (2006)	3.3	5828	4000000002
3.75	"Shake It Up!" (2010)	4.8	1013	2000000003
3.75	"Paranormal State" (2007)	4.5	1438	3000000002
3.75	"Flavor of Love" (2006)	4.5	1254	2000000003
3.75	"The Simple Life" (2003)	3.4	2956	3000000002
3.75	"The Jerry Springer Show" (1991)	3.9	1631	3000000002
3.75	"Jersey Shore" (2009)	4.5	3130	3000000002
3.75	"Hannah Montana" (2006)	3.9	1927	3000000002
3.75	"Big Brother" (2000/III)	4.0	1621	3000000002

That list has a bottom, but it's not interesting--it's the shows about whose quality there is general consensus. All right, here it is:

Standard deviation	Show	Rating	Votes	Distribution
2.38	"Mork & Mindy" (1978)	7.0	1746	0000012211
2.38	"Around the World in 80 Days" (1989/I)	6.9	1446	0000012211
2.38	"Amazing Stories" (1985)	7.3	1467	0000012211
2.38	"V" (1984)	7.2	2557	0000012211
2.38	"Crusade" (1999)	7.0	1133	0000012211
2.34	"Impact" (2008)	5.6	1633	0000111000
2.31	"Nuremberg" (2000)	7.2	2754	0000012311
2.22	"Moby Dick" (1998)	6.5	1967	0000112100
2.15	"Golden Years" (1991)	5.0	1459	0001211000
2.12	"Covert One: The Hades Factor" (2006)	5.7	1011	0000122000
2.12	"The Andromeda Strain" (2008)	6.1	5858	0000122100

I experimented with a different mapping of the distribution, e.g. saying that "0" meant 2 percent of the votes, "1" meant ten percent, "2" meant 20 percent, and so on. This made the standard deviations into smaller numbers, but it didn't change the ordering of shows very much.
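Here's a sketch of the reconstruction, with both mappings pluggable. It assumes every character of the string is a digit, as in all the strings shown above. Whether the original script did exactly this is a guess, but truncating each bucket to a whole number of votes happens to reproduce the 20,363 data points for TNG, and the population standard deviation of the result reproduces the 2.74 (TNG) and 3.85 ("Laguna Beach") figures. Function names are hypothetical.

```python
from math import sqrt


def midpoint_percent(digit):
    """Original mapping: "0" -> 5%, "1" -> 15%, ..., "9" -> 95%."""
    return 10 * digit + 5


def alt_percent(digit):
    """Alternative mapping: "0" -> 2%, "1" -> 10%, "2" -> 20%, and so on."""
    return 2 if digit == 0 else 10 * digit


def reconstruct(distribution, votes, percent=midpoint_percent):
    """Turn a ten-character distribution string into approximate vote counts.

    Character i (0-based) is the bucket for a rating of i + 1 stars.
    Each bucket is truncated to a whole number of votes.
    """
    return {stars: votes * percent(int(ch)) // 100
            for stars, ch in enumerate(distribution, start=1)}


def std_dev(counts):
    """Population standard deviation of a {rating: count} histogram."""
    n = sum(counts.values())
    mu = sum(r * c for r, c in counts.items()) / n
    return sqrt(sum(c * (r - mu) ** 2 for r, c in counts.items()) / n)


tng = reconstruct("0000000124", 16974)
print(sum(tng.values()))       # 20363
print(round(std_dev(tng), 2))  # 2.74
print(round(std_dev(reconstruct("0000000124", 16974, alt_percent)), 2))
```

Sorting every show by `std_dev` under each mapping and comparing the two orders is one way to check the claim that the mapping doesn't change the ranking much.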

Variability

We can also measure how much a show varies in quality by taking the standard deviation of the ratings given to its episodes. For this I looked at shows which had at least ten episodes that had been rated at least ten times. Here are the results—the "Variability" is the standard deviation of the episode ratings, in IMDB stars.

Variability	Show	Show rating
3.32	"The Tonight Show Starring Johnny Carson" (1962)	8.3
2.74	"The Late Late Show with Craig Ferguson" (2005)	8.6
2.62	"Jimmy Kimmel Live!" (2003)	6.4
2.60	"Beauty and the Geek" (2005)	5.9
2.37	"Late Night with Conan O'Brien" (1993)	8.5
2.23	"Late Show with David Letterman" (1993)	6.9
2.04	"Silk Stalkings" (1991)	6.1
1.89	"The Tonight Show with Jay Leno" (1992)	5.3
1.87	"Superboy" (1988)	6.3
1.70	"Duck Dodgers" (2003)	8.2
1.68	"The Virginian" (1962)	7.7
1.68	"Ellen: The Ellen DeGeneres Show" (2003)	7.3

There's a lot of late-night talk here. If I loosened the restriction on number of ratings per episode, I also got a lot of soap operas (most of whose episodes have no ratings at all).

And here's the bottom of that list: the most consistently good (or, in theory, bad) shows on TV:

Variability	Show	Show rating
0.20	"Day Break" (2006)	8.3
0.20	"Lucky Louie" (2006)	8.1
0.20	"Boardwalk Empire" (2010)	8.9
0.20	"Hung" (2009)	7.5
0.19	"Outsourced" (2010)	7.7
0.19	"The Ben Stiller Show" (1992)	7.3
0.18	"Happy Endings" (2011)	8.1
0.18	"Lewis" (2007)	7.9
0.08	"Planet Earth" (2006)	9.7
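The variability measure boils down to a filter and a standard deviation. A minimal sketch, assuming each show's episodes come as hypothetical (rating, votes) pairs; using the population rather than the sample standard deviation is my assumption.

```python
from statistics import pstdev


def variability(episodes, min_votes=10, min_episodes=10):
    """Standard deviation of a show's episode ratings, in IMDB stars.

    `episodes` is a list of (rating, votes) pairs. Only episodes with at
    least `min_votes` votes count, and a show with fewer than
    `min_episodes` such episodes returns None (it's left off the list).
    """
    ratings = [rating for rating, votes in episodes if votes >= min_votes]
    if len(ratings) < min_episodes:
        return None
    return pstdev(ratings)


# Hypothetical show: ten well-rated episodes, half 7.0 and half 9.0.
print(variability([(7.0, 20)] * 5 + [(9.0, 20)] * 5))
```

Loosening `min_votes` is the knob that, per the text, lets the soap operas flood in.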

I looked into the variability of the ratings distribution for individual episodes, hoping to find the most/least controversial TV episodes ever aired, but most of what I found looked like ratings juking. For instance, "Friday Night Lights" and "The Shield" show a hater/fan dynamic on the episode level: some people rating every individual episode very low and others rating every episode very high.

I think that's enough for now, but I'll come back to the data as I have more ideas, and maybe I'll even learn more than basic statistics for you.

Mon Feb 27 2012 22:26: Last year I learned about the LEGO model of the International Space Station. Today I learned that sometime last year Satoshi Furukawa assembled the LEGO ISS on board the real ISS. In a glovebox, so the pieces wouldn't fly away. There are educational videos.

Tue Feb 28 2012 10:04 Constellation Games Author Commentary #14: "The Wave Function Of The Universe": Damn, the time is flying. Part One ends in three weeks. And today there's a lot of non-commentary stuff I want to talk about, so the commentary itself will be pretty light.

First, I want to tell you that Jeremy Penner implemented Chapter 5's Gatekeeper in HTML5 for the 2012-in-One Glorious Developers Konference Kollection. You can play it online. I wouldn't classify Gatekeeper as fan art, though Jeremy is a fan, because he did it for me as a Kickstarter reward. But either way, it's pretty great!

Second, I want to talk about the process of designing the cover art. You don't have to read the book to "get" the cover—that wouldn't exactly help sales—but the design details are a product of in-world thinking. And at this point you've seen enough of the universe that I can go through that thinking without big spoilers.

The cover is by Chris Sobolowski, who wants me to mention his email address and let y'all know that he's available for graphic design work. So if your contract with Jenny Gallegos fell through due to her being a fictional character, contact Chris, who's a real person.

The process went like this: first, Kate and I laid out a huge number of cover ideas (some of which I've mentioned in earlier commentaries), and decided we wanted a cover themed around the ET hardware. At this point Kate got Chris involved, and Chris came up with a couple sketches that made the book look like a handheld computer. Here's one of them, next to the cover we ended up using:

I've spent months looking at the finished cover instead of this first draft, and what strikes me now is how similar they are. But what struck me at the time was that the computer looks like a piece of military hardware. It's dark and brooding, like one of Batman's gadgets. I wanted something flashy and colorful, like one of Batman's gadgets. Or like the Hitchhiker's Guide, to not use the same analogy twice in a row.

But I'm not the artist, and I'm also not a writer who thinks he can do the artist's job. So instead of demanding specific changes I wrote two different in-world histories for this handheld computer, and presented them to Chris.

In one story, the computer was a product of the Dhihe Coastal Coalition, the Farang civilization that produced the Brain Embryo. This explained the military appearance, and it had certain implications for changes he should make to the design (e.g. making the buttons much smaller).

In the other story, the one we went with, the computer is an Ip Shkoy ripoff of a Dhihe design, produced by Perea, the conglomerate that also put out the game reviewed in this chapter, A Tower of Sand. (The glyphs on the final cover's buttons say "pe" "re" "a".) This has its own implications: the colors are now so bright as to verge on the garish, making the computer look more like a consumer product and making the book look more like a comedy and less like a technothriller.

In this story, the only remaining Farang detail is the Brain Embryo-esque mother-of-pearl finish. Stylistically it's reminiscent of the wood grain on an Atari 2600, but it tells a different story. When you were a kid, electricity was an advanced technology. Then all these space aliens showed up handing out blueprints for handheld computers. You want something that looks as different as possible from the wooden toys you had when you were young.

The cocktail cabinet-like second set of controls at the top comes from this bit I wrote about the computer's social context:

Why would the notoriously social Ip Shkoy build a single-user game system? It probably has something to do with sex. Imagine this portable computer as a product for the swinging bachelor, full of "sophisticated" adult games to break the ice, contact management applications to replace one's little black book, and a vibrator peripheral for when the night's inevitable failure leaves you alone in your crappy apartment.
This device would need to have some two-person controls, so that you can play those icebreaker games with your would-be conquest, but the overall feel would be that this is my computer, but I might let you use it.

Chris took the Ip Shkoy story and produced something that's very close to the final cover. Here's another side-by-side comparison:

After that, there was a lot of back and forth on trivial details like how much and what kinds of wear should be visible on the computer. Around this time Adam was designing the Pey Shkoy language for Tetsuo's Twitter feed, so I asked him to also design a script for use on the cover. This is also the point where Kate got the idea for a "Berlitz Traveler's Lexicon," which became "Pey Shkoy Benefits Humans."

I haven't mentioned the back cover, but at this point I think I've reached or exceeded the limit on how long this discussion can be without getting dull, so let's move on to chapter 14 commentary. But not before linking to the archive of last week's Twitter fun.

This chapter sizzles with tastefully-written microgravity sex and Ariel's transparent attempts to hide what's going on between him and Dr. Tammy Miram. There's a basic pacing rule about putting a quiet denouement after a big roller-coaster action scene. As we go through the book, you may notice that I follow the opposite rule for the emotional arc. Basically, any time Ariel gets laid, I'm about to ruin his life.
My cynicism aside, I do think the microgravity scene is pretty sweet.
The second draft didn't have the scene in Tetsuo and Ashley's apartment. Instead it had an incredibly boring scene with Ariel/Tetsuo/Ashley at the top of the cma forest watching the "sunset", a scene so boring that Ariel himself fell asleep over the course of the scene. So I punched it up with the Gift of the Magi-esque farce about the English lessons.
It didn't happen in the apartment, but the second draft did have Tetsuo commenting on the smell of Ariel's pheromones. Every time I came back to that exchange, I rewrote it to make it more obvious what was going on. I regret that now; I should have kept it subtle.
Sometimes an author is presented with a golden opportunity to disguise a horrible in-joke as a real-sounding line of dialogue. For instance, there's a scene in Neal Stephenson's The Confusion that seems to have been set up just so one character can legitimately say "I didn't expect the Spanish Inquisition!" I want to make it clear that I did not name a character Ashley just so I could have Ariel say "Hey Ash, whatcha playin'?" But once I thought of the joke, I was not strong enough to resist it.
Tetsuo's reaction to the hypothetical right-wing militia may be my favorite Tetsuo bit in the book. I'll probably have a different favorite Tetsuo bit in every subsequent chapter, though. I mean, the cake bit is pretty great too. Yeah, I'm gonna stop this before it becomes Tetsuo Reminiscence Corner.
Af be Hui is a severely underused character, because she's been dead for seventeen million years. But if I write a sequel to Constellation Games, I won't let that stop me.

OK, that's plenty for this week. Next week: IT BEGINS. Oh, and Curic says, "Silence, puny human!"

Image credits: Jeremy Penner, Chris Sobolowski, NASA
