(3) Thu Feb 02 2012 11:59 easy_install beautifulsoup4:
This is an HTMLized version of an email I sent to the Beautiful Soup discussion group, about the impending release of Beautiful Soup 4.
Introduction
When Beautiful Soup was first released in 2004, the state of HTML
parsing in Python was appalling. Over the past eight years, things
have improved so dramatically that Beautiful Soup's HTML parser is no
longer a competitive advantage. I don't want to duplicate other
people's work, so I'm getting Beautiful Soup out of the parser
business. Beautiful Soup's job is now to provide a Pythonic
screen-scraping API on top of a data structure created by a
third-party parser.
This will be Beautiful Soup 4, and I've been planning it for
years. With help from Thomas Kluyver and Ezio Melotti, I've now met
the three main goals of Beautiful Soup 4:
- Make a single codebase that works under Python 2 and Python 3.
- Stop using SGMLParser (removed in Python 3) and make it possible to
swap out one parser for another.
- Support two major Python parsers (lxml and html5lib) as well as
Python's (not currently very good) batteries-included parser,
html.parser.
The first version of BS4 is almost ready for release, and I'd like you
to test it out, if you haven't already. I still need to fix some things, in
particular some performance problems. But, note that even with the
performance problems, BS4 is faster than BS3 across the board.
On Python 2 or Python 3 you can install the BS4 beta with this command:
easy_install beautifulsoup4
You can also get the source tarball.
The documentation has been completely rewritten. You may find the section on porting BS3 code to BS4 especially
interesting.
There are three major things I'd like your feedback on before
completing the release.
Hall of Fame
The BS3 documentation lists open-source projects that use Beautiful
Soup. I stopped maintaining this list many years ago because there are
hundreds of these projects, and since most of them are
screen-scrapers, they're pretty ephemeral.
I'd like to bring this feature back as a "hall of fame", featuring
applications of Beautiful Soup that grab a reader's attention. People
who used Beautiful Soup in a high-profile way or to tackle a big
issue. Projects that are interesting to hear about even if the
software doesn't work anymore, or uses an old version of Beautiful
Soup, or if Beautiful Soup was used internally and the public only saw
the results.
My bias is towards projects having to do with space, science,
journalism, politics and social justice. Here are some examples so you
know the kind of thing I'm thinking of:
- "Movable Type", a work of digital art on display in the lobby of the
New York Times building, uses Beautiful Soup to scrape New York Times
feeds.
- Alexander Harrowell uses Beautiful Soup to track the business
activities of an arms merchant.
- The Lawrence Journal-World used Beautiful Soup in 2006 and 2010 to
gather election results.
- The NOAA's Forecast Applications Branch uses Beautiful Soup in
TopoGrabber, a script for downloading "high resolution USGS datasets."
If you did anything of this sort, or know of someone who did, I'd
like to hear about it.
Do you prefer lxml or html5lib?
Right now, the parser ranking goes lxml, html5lib, html.parser. I like
lxml because it's incredibly fast and it can parse anything. But I'd
like to see what you think of the trees it generates. Would html5lib,
with its web-browser-like heuristics, be a better default?
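The ranking above amounts to a fallback chain: use the first parser on the list that is actually installed. Here is a minimal sketch of that idea using only the standard library. The names `PREFERENCE`, `_installed` and `pick_parser` are made up for this illustration, and this is not a description of how Beautiful Soup itself chooses a tree builder.

```python
from importlib.util import find_spec

# Ranking from the post: lxml first, then html5lib, then the built-in parser.
PREFERENCE = ("lxml", "html5lib", "html.parser")

def _installed(module_name):
    """Return True if Python's import system can locate the module."""
    return find_spec(module_name) is not None

def pick_parser(preference=PREFERENCE, is_available=_installed):
    """Return the first parser name in `preference` that is available."""
    for name in preference:
        if is_available(name):
            return name
    raise LookupError("none of the preferred parsers is installed")
```

In a scheme like this, making html5lib the default would only mean reordering the tuple.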
substitute_html_entities
BS3 had a number of overlapping and inconsistent ways of turning
HTML/XML entities into Unicode characters, and possibly turning
Microsoft smart quotes into HTML entities at the same time. In BS4,
all this stuff is gone. HTML and XML entities are *always* converted
into Unicode characters.
This is great but there's one problem: output. If you want to turn
those Unicode characters back into entities when outputting as a
string, you need to call soup.encode(substitute_html_entities=True),
which is a little clunky. I'm thinking of adding an output_html_entities
attribute that you can set on a soup or tag to
control whether this substitution happens. Do you like this idea?
I think I also need to ensure that characters like "&" and "<" are always converted to XML entities on output, even though this will hurt performance a bit.
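To make the two kinds of output substitution concrete, here is a rough standard-library sketch. It is not Beautiful Soup's code, and the function names `minimal_output` and `entity_output` are invented for this example. The first escapes only the characters that would break the markup; the second also swaps in a named HTML entity wherever one exists.

```python
from html import escape
from html.entities import codepoint2name

def minimal_output(text):
    """Escape only &, < and >, leaving every other character alone."""
    return escape(text, quote=False)

def entity_output(text):
    """Replace every character that has a named HTML entity (this covers & < > too)."""
    return "".join(
        "&%s;" % codepoint2name[ord(ch)] if ord(ch) in codepoint2name else ch
        for ch in text
    )
```

For the input "AT&T <café>", `minimal_output` leaves the é as a Unicode character, while `entity_output` turns it into `&eacute;`.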
Conclusion
What you install with easy_install beautifulsoup4 is a beta
release. If I hear of a problem soon, there's still time to fix it,
even if it means a major change to the API. So please try it out and
give me feedback.
Mon Feb 06 2012 17:55:
Earlier I ran some speed/accuracy tests of Beautiful Soup driven by various parsers. Python's built-in HTMLParser scored very poorly, parsing only 52% (Python 2.7.1) or 57% (3.2.2) of my test pages without raising an exception. Well, Ezio Melotti, the maintainer of HTMLParser, has been working for a while on improving HTMLParser's handling of bad HTML. Most of this code is in Python 3.2.2, so I should have been getting the benefit, but it wasn't working for me because of a semi-related bug in HTMLParser, which is fixed in the as-yet-unreleased 3.2.3.
After talking with Ezio today, I was able to monkeypatch BS4 to avoid the bug in 3.2.2. This means on Python 3, BS4 with no external parser installed will give reliability comparable to BS4+lxml (98% versus 99%). It's still about 50% slower, though, parsing about 1300 kb of HTML per second, versus 2100 kb/second for BS4+lxml.
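For anyone who wants to run a similar comparison, the two numbers quoted here (share of pages parsed without an exception, and kilobytes parsed per second) can be measured with a harness along these lines. This is only a sketch, not the test script used for the figures above; `benchmark` and `parse_with_stdlib` are hypothetical names, and the timing is deliberately crude.

```python
import time
from html.parser import HTMLParser

def benchmark(parse, pages):
    """Return (fraction of pages parsed without an exception, kb per second).

    `parse` is any callable that takes a string of markup. `pages` must be a
    non-empty list of strings.
    """
    succeeded = 0
    parsed_bytes = 0
    start = time.perf_counter()
    for page in pages:
        try:
            parse(page)
        except Exception:
            continue  # count this page as a failure and move on
        succeeded += 1
        parsed_bytes += len(page.encode("utf-8"))
    elapsed = time.perf_counter() - start
    kb_per_second = (parsed_bytes / 1024) / elapsed if elapsed > 0 else float("inf")
    return succeeded / len(pages), kb_per_second

def parse_with_stdlib(page):
    """Feed a page through the built-in parser, ignoring what it finds."""
    parser = HTMLParser()
    parser.feed(page)
    parser.close()
```

Passing a different `parse` callable for each parser gives numbers that can be compared side by side on the same set of pages.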
(4) Tue Feb 07 2012 08:48 Constellation Games Author Commentary #11: "Launch Title":
Love those title puns! This blockbuster episode sends Ariel TO THE
MOON and introduces two major new characters, Tetsuo Milk and Ashley
Somn. Also a minor but important character: Linda Blum, Ariel's mom.
Here's last week's Twitter archive, which ran two weeks ago due to my own errors. Twitter service has now resumed, but because this plot arc is so compressed (the rest of Part One crams two weeks of frantic activity into five weeks of real time), most of it is going to come out on Tuesdays and Wednesdays. Don't be afraid, I'll be here the whole time with long-winded commentary:
It's such a relief to be able to talk about Tetsuo! So much
happens in this chapter, he doesn't get a lot of time with Ariel, but
that changes starting next week. Tetsuo is great, I love him a lot, but... he's a scene stealer. Anything I wrote, he would grab and run off in some weird
direction. When the Aliens were choosing human names, Tetsuo is the guy who picked a name because it means "iron man."
Tetsuo reminds me of Londo Mollari from Babylon 5, in that he
starts out a comic relief character (insofar as a comedy can have
designated comic relief) and over time reveals more serious facets of
his personality. But unlike Londo, Tetsuo never stops saying goofy
shit. Tetsuo is the infrafictional author of the subscriber bonus tome "Pey Shkoy Benefits
Humans", which is set about six months after the end of the novel, and
he's still at it.
- Ashley Somn is not a scene-stealer, so her husband kind of
overshadows her for most of the book. But she's an awesome character
on a slow burn. She's got a high-drama character arc that revs up in
the last third, and which I fill in with the bonus story "The
Time Somn Died." (Title is not a spoiler.)
I keep forgetting that Tetsuo is orange and Ashley is green; I
always imagine it the other way around. They're bright neon colors
with darker spots, like tropical frogs. Why? Are Aliens poisonous? I
dunno. Lick one and see!
- The short scene with BEA Agent Krakowski in back of the strip club is the
very last thing I wrote for Constellation Games. Its main
purpose is to dramatize the sub rosa assignment Krakowski gives Ariel,
an assignment which becomes very important in Part Two. But I also
threw in the talking rat, to introduce you to something else that's
very important in Part Two: the idea that Ariel might not be the most
reliable narrator.
I don't know why Krakowski and Fowler were at a strip club in the
middle of the day, but I'm sure it was work-related.
- The part of this chapter that's an excerpt from Ariel's Twitter
feed will not be shown on his
actual Twitter feed, because that would be annoying. But that
section was the main inspiration for the in-character feeds in the
first place.
- Original title for this chapter: "MAN WALKS ON FUCKING MOON."
- One of the big problems with the second draft was that for most of
it, the tone was emotionally distant. It took me a while to
understand the characters. The blog format didn't help, and
I moved further away from it in every draft. Ariel is a guy who shows
vulnerability in huge dramatic bursts and won't open up
otherwise. Etc. etc. Anyway, I worked on this a lot in the third draft, and one of the big changes is that at Andrew Willett's suggestion I modeled Ariel's lunar excursion on this
Narbonic strip.
Ariel's reaction to being launched into space is taken directly
from what I imagine would happen to me. Also possibly taken from my
reaction as a kid to the Disneyland exhibit "Mission to Mars", which
had a similar setup where you saw the ground drop out from underneath
you while feeling absolutely no acceleration.
The sculptures of the figures from the Pioneer plaque (not
"Carl Sagan's gold record", as Ariel mistakenly believes—that's the Voyager record) are another
moment of not-quite-understanding taken from "Vanilla". Ariel's
initial description of the docking bay is the opening parenthesis of a
pretty huge piece of bookending, so watch for that.
One of the imaginary book covers I had in my head while writing was
a design based on the Pioneer plaque, except with (clothed) Ariel and
Jenny.
- Ariel mentions some Eritrean refugees living on Ring City, but
they're only the biggest and most famous group of refugees—the
ones Ariel knows about. Human Ring is also home to smaller groups of
refugees from around the world, and to miscellaneous individuals
living under the radar. The actual population of Human Ring at this
point is closer to 700 than to 500.
The refugees come up a few times later on, but they don't play a
big part in the novel because I don't currently feel I've got the
literary chops to tell their stories. But I knew that not mentioning
refugees would be unrealistic. It would imply that humanity's
governments were able to coordinate to completely lock down the
planet, or that the Constellation was sending away asylum seekers. What we have is a compromise, not one I'm happy with, but I think the best I could do.
I know this is already super long, but I want to introduce a new
segment here on CG Author Commentary, a little recurring bit I like to
call Creative License. Sure, I write silly stories about space
aliens visiting Earth seemingly in violation of the Fermi Paradox, but
that doesn't mean I can just make stuff up. At the same time, I
want Constellation technology to appear very close to the
"indistinguishable from magic" line. Creative License explores that
tension by pointing out things that probably can't exist in real life,
and the made-up reasons I use to justify their existence in the story.
First we have the shuttles the Constellation uses for short hops to
Earth and Luna. I have no idea how they go as fast as they do, but I
have a vague idea how they achieve a smooth ride: they use ports to
maintain an acceleration differential between the inside of the shuttle and
the outside, so the inside accelerates at a gravity-like rate
while the outside powers up towards some horrendous speed. Ports are very important bits of Constellation tech and
need their own segment on Creative License. I'll probably do them
next week, after we see one in action.
But this week we also have the Constellation spacesuits. Inflatable spacesuits are nothing new, but Ariel's suit folds up when not inflated and doesn't seem to have any space for hard parts like air tanks, a fluid recycler, the comm system he plugs his phone into, or a way of dissipating heat. Creative License Solution: as you'll see later, the Constellation does pretty amazing things with origami. I imagine all that fine machinery is packed flat and inflates to the correct shape with the rest of the suit.
What a huge commentary, and this plot arc's just getting started. Be sure to tune in next week, when Ariel will say, "I do not use sex to maintain social cohesion."
Image credits: Andy Bernay, Joe Mabel, Linda Salzman Sagan, Harold W. McCauley.
<- Last week | Next week ->
Wed Feb 08 2012 11:02 Beautiful Soup 4 Beta 4:
Beautiful Soup 4 beta 4 is out! You can install it with easy_install beautifulsoup4 or pip install beautifulsoup4. You can also download the tarball or check out the Bazaar repository.
Big changes:
- If you're using Python 3.2, the built-in html.parser is now reliable enough to use on its own. You don't need to install lxml or html5lib just to parse bad HTML (but lxml is still a lot faster). The forthcoming Python 2.7.3 should also work this way.
This is of course a feature of Python, but due to a pretty bad bug in html.parser, I wasn't taking advantage of it. I worked with Ezio Melotti to monkeypatch that bug from within BS, and now we're back in the very good situation of not needing any external dependencies.
- new_tag() will follow the rules of whatever tree builder was used to create the original soup. For example, a new <p> tag will look like "<p />" if you're dealing with XML, but it'll look like "<p></p>" if you're dealing with HTML.
- There's now a new_string() method to go along with new_tag().
- There are two new methods for manipulating the tree: PageElement.insert_before() and PageElement.insert_after().
- I replaced the substitute_html_entities argument with the more general formatter argument. You can do all sorts of crazy stuff with this.
- The default formatter converts bare ampersands and angle brackets to XML entities, but doesn't touch HTML entities. I think it's kind of America-centric to convert characters like é to &eacute; by default, but I might make the default a "punctuation" formatter that converts things like curly quotes to HTML entities.
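The "<p />" versus "<p></p>" difference mentioned in the new_tag() item is a general property of XML and HTML serialization, not something peculiar to Beautiful Soup. Purely as an analogy, the standard library's ElementTree shows the same split. This sketch does not use or describe the new_tag() implementation, and `render_empty` is a name invented here.

```python
import xml.etree.ElementTree as ET

def render_empty(tag_name, method):
    """Serialize an empty element using ElementTree's 'xml' or 'html' rules."""
    return ET.tostring(ET.Element(tag_name), encoding="unicode", method=method)

as_xml = render_empty("p", "xml")    # XML rules: one self-closing tag
as_html = render_empty("p", "html")  # HTML rules: explicit open and close tags
```

Here `as_xml` comes out as `<p />` and `as_html` as `<p></p>`, matching the two forms described above.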
Thu Feb 09 2012 17:11 Beautiful Soup 4 Beta 5:
Just going to link to my description message this time. Today I focused on clearing out the bug backlog. It's mostly minor stuff, but I'd like opinions on one change, relating to how a tag is treated if it has multiple CSS classes.
Mon Feb 13 2012 09:13 nanDECK:
I have a little side project creating a print-and-play board game. The game has a lot of cards, but I don't need to design each card individually--I can generate them programmatically. Or I could, if I were capable of writing the program.
First I tried ReportLab, the Python library for making PDFs. I'd used it for the sadly-now-defunct Pocket Wisherman, and I thought it would be perfect for putting lots of little squares on a piece of paper.
Not so fast! The Pocket Wisherman puts lots of squares on a piece of paper, but in that program text flows from one square to another. That can't happen on a playing card. The closest I could come with ReportLab was a table, and since I couldn't add spacing between the table cells the way you can in... HTML...
It was easy to get something in HTML that looked right on screen (these cards are pretty simple), but not so easy to get them to look good when printed. So I went back to searching for tools optimized for card design. I delved deep, past many people talking about the best way to manufacture cards for print-and-play-games, and then I found nanDECK by Andrea Nini.
I'm gonna complain a lot about nanDECK so I want to make it really clear that nanDECK solved my problem. In about an hour I went from having two failed Python scripts and no cards, to having cards as nice as my design skills could make them. If I get some design help from someone else I can make the cards nicer still, from within nanDECK.
Now, let the complaining begin! Actually, I'm not even gonna complain. I'll just phrase my complaints as helpful hints. nanDECK is a Windows IDE for a domain-specific markup/programming language. It runs fine in WINE. The prominently-linked manual is actually a reference guide--tutorials and examples are linked further down the homepage.
The interface features so many buttons that the "visual edit" button might get lost in the shuffle (ha), but that button is going to help you so much. You won't have to remember all the arguments to the language directives, and you can lay out elements visually on the card rather than guess at measurements over and over again. In the end I couldn't get the linked-data feature to work (possibly an interaction with WINE), so I figured out the layout for a single card within nanDECK and then wrote a Python program to generate the nanDECK script for my entire deck.
Whew! Kept it positive. If you want to design cards for a game, and you don't want to lay them all out manually (which you shouldn't), I think nanDECK is your best option. Thanks, Andrea Nini!
Tue Feb 14 2012 09:00 Constellation Games Author Commentary #12: "Monsters From Space":
Welcome to another chapter full of laughter and embarrassing faux
pas. This week we learn why Curic scanned Ariel's house, and get our
first glimpses of the ancient, not-particularly-wise Ip Shkoy.
Before the commentary
begins, I want to bring up something serious that I could save for
next week but I don't want to. Dr. Janice Voss died
on February 6 at 55. She was a scientist, a NASA astronaut who
flew on five shuttle missions, and later the science director for the
Kepler Space Telescope. She was a big science fiction fan. I met her
once in 2007, in what was certainly the highest-wattage dinner
I've ever attended (photos), and she made a huge impression on me.
The only major character in Constellation Games you haven't
met yet is an astronaut, Tammy Miram. She's introduced next
week. If I hadn't met Janice Voss, Tammy Miram would not exist, and I
have no idea what the novel would look like from next week on.
I don't mean that Tammy Miram is "based on" Janice Voss, or that
the character is a way to tell Janice's story in a fictional
setting. I only met Janice Voss once and I have no idea what her
story would look like. (Spoiler) Also, Janice was a very well-adjusted
person, and Tammy is not. But a dinner-length conversation
with Janice was enough to move the societal role of "NASA astronaut"
out of my mental category "archetypes useful in science fiction
stories" and into "interesting jobs I can give to my characters."
R.I.P., Janice Voss. Ad astra per aspera.
Here's last week's Twitter feed, as it was meant to be seen (i.e. without a weird UTF-8 encoding issue). And now, this week's
commentary:
- I've never been happy with this chapter title. Any other
suggestions? It's too late to change it, but I'd like to hear what
you think.
- When I first discussed cover art with Kate, I was apprehensive that
she would insist on a Big Dumb Object In Space cover. A shot of Ring
City, or the hole in the moon. The sort of cover that presumably
moves books, since every single science fiction paperback has it, but
one that I think would be entirely out of place in a novel about
middle-class people from Austin. Fortunately, Kate, like me, wanted a
cover that implied "video games", and we settled on the handheld
computer (about which more in a couple weeks). But I had an ace up my
sleeve—a Big Dumb Object cover that I would have liked.
This hypothetical cover is the interior of Alien Ring, huge and
breathtaking, the cma forest curling up in the distance along
with the curvature of the ring, and Ariel in the foreground taking a
picture of it on his cameraphone. It didn't happen, but I could have
lived with it.
Actually, "Big Dumb Object In Space" would have been a better chapter title.
- The Alien Ring stuff got seriously moved around. In the second
draft, Tetsuo and Ashley met Ariel in the docking bay, took him to
Alien Ring, they met Curic there, and everyone went to the moon
together. All of this happened in chapter 11, after Ariel's
initial spaceflight. It was way too much for one chapter, so Alien
Ring got pushed to chapters 12 and 13, and expanded greatly.
My Earth-life analogues for the Aliens were always bonobo chimps,
notorious among humans for their use of sex to maintain social
cohesion. But in "Vanilla" it was more in the background. The primary
Alien character, George, was pretty buttoned-down and never had a scene with
another Alien. For Constellation Games I went all-out and made
the Aliens huge sluts. Good decision!
- In case you're curious, the Earth-life analogue for the Farang is the deadly Snowth.
- Curic's name change (which never comes up again) is a fun detail,
the kind of thing I wish more science fiction stories would mention,
but it's also SYMBOLISM. By the end of the book, all the major
characters except Ariel have had two different names or identities. So far we've seen Tetsuo and Ashley taking human names, Bai going by his
surname, and Curic being Curic.
What does it mean? Nothing—it's
free-floating symbolism. Just kidding, I do have an opinion on what
it means, but it'll need to wait for the end of the book.
- This week's "Finux" moment: the Bit Boy series in this novel is a transparent stand-in for the real-world Mega Man series.
- Finally, Creative License returns with an in-depth
discussion of ports, first seen in this chapter connecting the lunar
excavation to the Ring City habitats above.
A port is the two ends of an exotic-matter wormhole with negative mass. Each
end of the wormhole is mounted in a positive-mass case, and you can (let's say)
carry one end down to the moon to shorten the spacetime distance
between the space station and the moon. Ports can be collapsed from
either end by destabilizing the wormhole.
I invented ports in 2006 for "Vanilla" and in that story I did a
lot of work showing what you could do with them. I felt writers had
generally treated wormholes as magic gateways and neglected their
mayhemic possibilities. I mean, just imagine if the two ends of a
wormhole could be moved independently! You could set up all sorts of
wacky gravity and pressure differentials.
Then in 2007 Portal came out. So, I give up. Ports in the
Constellation universe work just like in Portal, with two
differences. First, you can't shoot wormholes out of guns, because a)
it takes an enormous amount of energy to make one, and b) a wormhole
has two sides. In Portal terms, the "blue" portal has no
existence without the "orange" portal. Second, in Portal, gravity always points down. In the Constellation universe, gravity travels through ports. By proper placement of ports you can create localized weightlessness or antigravity effects.
Anyway, the whole thing is moot, because stable wormholes of this
sort almost certainly can't exist—they'd violate causality and allow for time travel.
The whole thing is merely... Creative License.
Tune in next week, when Curic will say, "Infiltration? Cold reading? Propaganda? Torture? Extracting false confessions?"
Image credits: NASA, NASA again, Kabir Bakie, Alain r.
<- Last week | Next week ->
Thu Feb 16 2012 09:16 Beautiful Soup 4 Beta 6, Beautiful Soup 3.2.1:
There are two ongoing serials here at crummy.com: Constellation Games and Beautiful Soup 4. Here's the announcement message for the latest installment in the latter saga.
The big news is a new release of the 3.x series, Beautiful Soup 3.2.1. This fixes a pretty bad problem that can let through cross-site scripting attacks if you use Beautiful Soup to sanitize HTML. If that's you, you should upgrade ASAP.
That was certainly worth fixing, but I don't do much work on Beautiful Soup 3 anymore. I mean, if I fixed every bug in BS3, I'd have... Beautiful Soup 4, which is now almost done. All the bugs are closed out. There's one more big feature I may add, and some minor cleanup I want to do, but mainly I want to make sure people are comfortable with the new API.
Thanks to Stefano Rivera, BS4 is now in Debian unstable and Ubuntu Pangolin, as beautifulsoup4. So the clock is ticking on freezing the API. This would be a great time to try to port your BS3 scripts to BS4, and let me know how difficult it was and what you had to change.
(2) Mon Feb 20 2012 07:32 Where's That Golden Age?:
A couple weeks ago Samuel Arbesman posted an entry to Wired's science blog called "How to search for the golden age of television", an entry that's been driving me crazy since I read it. Not because I disagree with his analysis of the IMDB dataset, but because I don't like his starting point. Arbesman uses "each television show’s running time, in number of episodes, as a very rough proxy for quality". It's true that there's probably a positive correlation, but that metric has a couple problems. First, it severely discounts the present. A show on the air today may have several seasons to run, but we don't know that yet, so it'll look worse than an old show of equivalent quality. Second, the IMDB dataset features a much more direct proxy for quality: user ratings.
I don't think ratings are a great proxy for quality--a look at the highest-rated TV shows will put a stop to that nonsense. And the run length of a show is at least an objective fact. But I think our collective opinion of a TV show today is a better proxy of quality than how long the network was originally willing to keep it going. And if you use ratings, I think you can get closer to answering the question "what would a golden age of television look like?"
My guess is, Arbesman didn't use ratings because it's kind of annoying to get that information out of the IMDB dataset. But I'd already done a lot of work on the dataset for The MST3K-IMDB Effect, so in this post I crunch the numbers my way and see what falls out.
If you're expecting controversy, I can't provide. My findings don't contradict Arbesman's, they just provide a different way of looking at the data.
Step 1: Get the data
(If you're impatient, you can skip to the graphs.)
It all starts with IMDB's plain-text data dumps. I downloaded release-dates.list.gz and ratings.list.gz from the FTP site. I also downloaded distributors.list.gz, but it turned out that data wasn't useful.
Step 2: Identify shows, episodes, and air dates
release-dates.list lists all movies, TV shows, and episodes of TV shows. TV shows are in quotes, and episode names are in curly brackets.
Point Break (1991) USA:12 July 1991
"Star Trek: Voyager" (1995) USA:16 January 1995
"Star Trek: Voyager" (1995) {Caretaker (#1.1)} USA:16 January 1995
Unfortunately, web series look just like TV shows, which is going to mess with the data for recent years:
"The Angry Video Game Nerd" (2006) {A Nightmare on Elm Street (#1.13)} USA:31 October 2006
I tried some tricks to get rid of web series, like only considering shows with a listed television distributor (distributors.list), but there are tons of dinky cable reality shows that have exactly the same data characteristics as web series. So I'm leaving them in. Just know that when I say "TV shows", I'm talking about TV shows + web series.
To make the initial dataset smaller, I used grep to remove everything except the US premieres of TV shows, and of episodes of TV shows. (And web series.) Then I wrote a Python script that turns this information into a picklable data structure.
The script ties a show to all of its known episodes, and parses out each episode's release date along with the premiere date of the show itself. I want to know every year in which an episode of the show premiered in the US. This has some problems--it makes the original "Star Trek" show up as a 1988 show because that's the first time the original pilot was aired--but they're pretty minor.
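Here is a rough sketch of what a script like that might look like, going only by the sample lines shown above. It is not the actual script. The regular expression assumes titles contain no double quotes and that the date ends in a four-digit year with nothing after it, and the names `TV_LINE` and `years_on_air` are invented for this example.

```python
import re
from collections import defaultdict

TV_LINE = re.compile(
    r'^"(?P<show>[^"]+)" \((?P<start>\d{4})\)'  # quoted title means a TV show
    r'(?: \{(?P<episode>[^}]*)\})?'             # optional {episode name}
    r'\s+USA:.*?(?P<year>\d{4})\s*$'            # US release date; keep only the year
)

def years_on_air(lines):
    """Map (show title, premiere year) to the set of years with a US release."""
    shows = defaultdict(set)
    for line in lines:
        match = TV_LINE.match(line)
        if match is None:
            continue  # a movie, a non-US release, or a line this sketch can't read
        key = (match.group("show"), int(match.group("start")))
        shows[key].add(int(match.group("year")))
    return shows
```

Because the show's own premiere line and each episode line share the same key, every year in which anything premiered ends up in one set per show, and the result is a plain dictionary of sets that pickles without trouble.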
Step 3: Add the ratings
Now I know when every show started, and in many cases I know every year a show was on the air. In the next step I load in another file and add ratings to shows and episodes.
Ratings are kept in ratings.list. They look like this:
0000001212 11245 7.5 "Star Trek: Voyager" (1995)
0000012111 1558 7.1 "Star Trek: Voyager" (1995) {Caretaker (#1.1)}
There's lots of cool stuff here like a histogram (0000012111 means 10% of people rated the premiere of Voyager a 6, 20% of people a 7, and so on), but what we're after are the IMDB ranks: 7.5 stars and 7.1 stars in this case.
Unfortunately, there's a lot of boring stuff in ratings.list like the top 250 movies. Fortunately, I already wrote code to parse this file during my investigations into the MST3K-IMDB effect.
Step 4: Graphs!
Now I'm going to break out numpy and pychart. Let me start with a calibration run, a graph Arbesman also did. How many shows were on the air in a given year?
Pretty similar to Arbesman's graph. My graph doesn't go down at the end, because I cut the data off at 2011, the last full year of data. I also start later, with the first year for which there were five rated TV shows. I'm picking up some shows he's not, possibly because I'm counting a show in every year it aired, possibly because I'm picking up shows that don't have any episodes listed on IMDB, possibly because he found some way I didn't think of to exclude web series. But it's a similar shape.
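Counting a show once in every year it aired is a small tallying job. A minimal sketch, assuming the data is a dictionary from show to the years it aired (the function name `shows_per_year` and that input shape are assumptions for this example):

```python
from collections import Counter

def shows_per_year(years_by_show):
    """Count how many distinct shows aired in each year.

    `years_by_show` maps a show to an iterable of the years it was on the air.
    """
    counts = Counter()
    for years in years_by_show.values():
        counts.update(set(years))  # set() so a show counts once per year
    return dict(sorted(counts.items()))
```

Counting this way, a long-running show adds one to every year of its run, which is one of the possible reasons given above for the totals coming out higher than Arbesman's.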
Now here's the graph you've been waiting for: mean rating over time:
It's a sad story of precipitous drops in quality: one between 1959 and 1980, one between 1999 and 2005. By this measure, 2005 was the worst year in television history. If you only looked at mean rating over time, you'd say that there was one golden age of television, from 1955 to 1965, and that the 1980-2000 period was a period of stagnation interrupting an otherwise steady decline.
The graph of median rating over time tells much the same story, so I won't transclude it, but you can follow this link to see it.
But, mean rating isn't the whole story. Let me pull out the only statistics trick I know: look at the standard deviation of the ratings over time.
1959, the year with the highest mean rating, is also a year of extreme homogeneity. Less than one star of difference separates the very good shows from the very bad shows. After 1959, the good shows get better, and the bad shows get worse, relative to the mean. In 1980 the standard deviation was 1.37 stars, and in 2011 it was almost two stars. Remember that ratings are not normally distributed, so two stars is quite a lot. (Even one star, as in 1959, ain't nothing.)
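The per-year mean and standard deviation can be computed with the standard library alone. This sketch is not the numpy code behind the graphs. It assumes the population standard deviation (which is also numpy's default), and `rating_stats_by_year` and its input shape are invented for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev

def rating_stats_by_year(rated_shows):
    """Return {year: (mean rating, standard deviation)}.

    `rated_shows` is an iterable of (rating, years_on_air) pairs; a show's
    rating counts toward every year in which it aired.
    """
    by_year = defaultdict(list)
    for rating, years in rated_shows:
        for year in years:
            by_year[year].append(rating)
    return {
        year: (mean(ratings), pstdev(ratings))
        for year, ratings in sorted(by_year.items())
    }
```

A year with a small standard deviation, like 1959 above, is one where the ratings are bunched close to the mean.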
Combine this with the skyrocketing number of shows (which begins in the late 90s and goes into overdrive once we start counting web shows) and you can see how that 2000-2005 decline happened. Over 1300 distinct shows aired in 2005. Of course the mean show is going to be crap! The amazing thing is that things have gotten better since 2005, even as we now make over twice as many shows per year. (And web series! Can't forget those!)
Another factor is that people aren't even bothering to rate the bad shows. Here's the percentage of shows that aired in a given year that don't have IMDB ratings because they haven't gotten enough votes. For 2011, this was a majority of shows!
Old shows aren't rated because nobody remembers them. New shows aren't rated because... well, I did a bunch of spot checks, and they fall into three categories. 1) web series, 2) shows that were never aired and maybe never even produced, 3) crap. Only #3 can properly be considered part of "television". The mean rating would certainly be lower if every show had a rating, but I don't know how much lower.
That's where we stand: television is bad, and it's getting worse. That trend may have been reversed recently, or the decline may have been masked by web shows with passionate fans, or things may have gotten so bad that people stopped even bothering to rate the crap. But! Would you exchange the television of today (mean rating: 6.2) for the television of 1973 (mean rating: 7.3)? I wouldn't, and I don't think you would either. What's going on?
Well, we don't watch the mean television show. We only watch the good shows. (If you've read this far, I'm gonna go ahead and make that assumption.) And if you look at the good shows, the picture looks very different.
Here's what the shows look like one standard deviation above the mean. This is basically the top 16% of shows:
At the high end, the decline in quality is reversed in the 80s and early 90s. The gains are undone in the late 90s (2005 is still terrible), but then quality shoots back up. This is very similar to Arbesman's graph of show length over time.
What if you're even more selective? Let's graph the value 1.5 standard deviations above the mean for each year. I don't know what percentile this would correspond to, but it's something like the top 5%. This is the very best stuff you can find on TV in a given year:
This graph, I think, is the best answer to "what would a golden age look like"? It would look like the 60s, when there were three channels under tight quality control, and you could turn on the television at any given time and probably find something good. Or it would look like right now, when a huge number of shows are being produced, and it's easy to be a snob and only watch the very best. This is why we don't remember 2005 as being the worst year of TV in the history of the medium, and this is why I'd never trade today's TV for 1973's TV, even though 1973 looks pretty good on that graph.
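As a rough check on the "top 16%" and "top 5%" figures quoted above: if the ratings were normally distributed (they aren't, so treat this as a sketch under a stated assumption), the share of shows above each cutoff could be computed as follows. The function name `share_above` is invented for this illustration.

```python
import math

def share_above(z):
    """Fraction of a normal distribution lying more than z
    standard deviations above the mean (upper-tail probability)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# The two cutoffs used for the graphs: 1 and 1.5 standard deviations.
for z in (1.0, 1.5):
    print(f"{z} standard deviations above the mean: top {share_above(z):.1%}")
```

Under that assumption the one-standard-deviation cutoff is close to the top 16%, and the 1.5-standard-deviation cutoff is closer to the top 7% than the top 5%. The real percentiles depend on the actual, non-normal distribution of ratings for each year.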
So, there you have it--another way of looking at the IMDB data. More to come! Next up: a little thing I like to call "Worst Episode Ever".
Tue Feb 21 2012 09:10 Constellation Games Author Commentary #13: "Your Day Job":
The lucky chapter thirteen introduces the novel's last major character, Mission Specialist Dr. Tammy Miram. She gets right to work, kicking off a subplot that won't be wrapped up til the very last chapter. Let's look at a bunch of commentary, most of which is about her. But first, Twitter archive from last week! Okay, here we go:
- Tammy Miram is the only character in the novel who was given a
name with an eye towards its symbolism, i.e. I didn't have a name
handy so I thought "what would Charles Dickens do?". "Miram" is the
Arabic name of the star η Persei. "Tammy" doesn't mean anything in
particular, but it was a popular girls' name in the early 1970s. This
is the Social Security Administration technique for character naming,
and I strongly recommend it.
Oh, but I just looked up the name Tammy and it means "twin". Can't escape the symbolism!
- I'm calling her "Tammy" because I mentioned her name last week in
the tribute to Janice Voss, but Ariel calls her Miss Ion Specialist
throughout this chapter and you don't find out her real name until
next week. In the second draft, Ariel called her "Ion" for the
entire book. You didn't find out her real name until chapter 30.
Why did I change it? Mainly because of the continuous shift away from
the blog format. It didn't make sense for Ariel to be using a blog
pseudonym for Tammy in narration. And it would have been too confusing
for Ariel to constantly switch back and forth between "Ion" and
"Tammy". Especially because I've already got another character
for whom Ariel uses different names in narration vs. blog.
In the third draft, Ariel held on to "Ion" until Part Two. Now he
drops the pseudonym almost immediately, when Jenny calls him on it.
- In the second draft the weightless Ariel/Tammy conversation was
too dialogue-heavy and it was unclear where they were physically in
relation to each other. So I introduced Ariel's frantic attempts to
use the kicker and rotating in ways he doesn't want to rotate,
contrasted with Tammy's mastery of the environment. This lessens the
jokey handwaviness of how Ariel's able to negotiate weightlessness at
all.
- Tammy's being from Akron is a reference to my favorite Stephen Colbert joke, a question from his interview with Congresswoman Stephanie "Tubbs" Jones:
"Twenty-two astronauts are from Ohio. What is it about your state that
makes people want to flee the Earth?" Also a Devo reference.
- Tammy is the star of the unfinished bonus story "A Princess Of
Mars." Not ruling out finishing that story, but
it'd need to be some kind of currently unplanned tie-in. Like if we did a Kickstarter project to produce a hardcover edition to squeeze a few more dollars out of all you fine customers.
- The ISS backlog is a bunch of shelved experiments that couldn't be
brought up because the shuttle program was cancelled back in the
2000s. They were reassembled in a hurry and most of them don't work
anymore.
- During the writing of the third draft, I decided to alt-history
the International Space Station (in real life a product of the Cold
War) into a post-Glavnaya US/Russian Federation joint, the Space
Science Station. This stupid idea lasted so little time I don't have
any records of it. Beta readers were confused for absolutely no gain
except my own personal satisfaction at having made the world more
self-consistent.
The human space station in this alt-history is still a US/Russian
Federation joint, it's just that a) it's called the International
Space Station, and b) that fact has absolutely no effect on the
story. Problem solved!
- Hey, Brendan, here's your one reference to the Cryptids in this
whole novel.
- I really should have pushed the Starfarer release date back
a few years. 1987 is a little late for a pure text adventure, and
Tammy would have been in high school in 1987, so there was plenty of
time to push it back.
- In my opinion the gravity kicker is the single biggest piece of
Creative License in the Constellation universe. Not because
it's technically impossible, but because it breaks the worldbuilding.
All I can do is point it out and/or hope you don't notice.
In "Vanilla", and in the second draft of CG, the device was a
"sonic kicker" and it used reflected sound waves. But in
conversation with physicist Nick Murphy I learned that although a
sonic kicker is technically possible, you'd need to use sound waves so
powerful as to kill on impact. Or something like that. Anyway, I
changed the kicker to use gravity waves, but a small handheld gravity
wave generator is a very valuable piece of tech, on par with
the portable wormholes. And here I had this whole novel where ports
were treated as very valuable tech and the sonic
grav kicker was treated more like a Hammacher Schlemmer
gadget. Instead of rewriting big chunks of the novel to deal with this
point, I now invite you to enjoy a heaping spoonful of... Creative
License.
- Ariel's conversation with Tammy about Cody Wicklund is kind of
obscure, the kind of thing you'd expect to pay off later but it
doesn't. Exactly the kind of thing I can write commentary about!
See, it's supposed to pay off when I rewrite "Vanilla". Cody
Wicklund is that story's POV character, and he's a pretty famous
person in the Constellation universe, so I felt like it would be fun
to mention him in the novel. This was a good place because it's
reasonable that Tammy would know him. But I dunno if I'm actually
going to rewrite "Vanilla", so this is Constellation Games's
equivalent of that teaser caption at the end of Buckaroo
Banzai.
Tammy disliking Cody Wicklund is new. In "Vanilla" he was an
unassuming scientist, not someone you'd have a strong enough opinion
about to dislike—vanilla, in other words. I decided he'd be more interesting if he were more amoral, the kind of person Ariel might compare to Wernher von
Braun. Will it pan out? Maybe.
That's all I got. Stay tuned for the huge chapter 14, a chapter full of deepening mysteries and used game trade-ins, the chapter where Ashley finally says, "Ariel was distracted by my beautiful ovipositor."
Image credits: NASA, Mark Phillips, Allen Garvin.
<- Last week | Next week ->
Tue Feb 21 2012 19:33:
Remember when this weblog used to be about fun links? I don't either, but I think it was somewhere in there. Well, check this out: last year when I went to PAX my most enjoyable experience was the panel "Videogames Antiques Roadshow." It worked just like you think: people would bring old game stuff up on stage, and distinguished collectors would estimate the value of the old stuff. Here are some pictures from that panel. In fact, you can see me in the second photo, fourth row center.
Kind of got distracted there--the point of this post is not to look at a crowd scene that includes me. I meant to say that they brought the panel back at PAX Prime, and this time there's video. And it's now called "Retrogaming Roadshow", possibly due to trademark issues. In addition to bringing to light cool bits of history like the PCjr edition of M.U.L.E., I love the way these panels illustrate the social construction of value. Highly recommended if you've got an interest in this stuff.
(6) Wed Feb 22 2012 16:39 Worst Episode Ever:
Time for some more IMDB fun. Last time I looked at whole years of television. This time, I'll graph the ratings for individual episodes of TV shows. Can we watch shows get better or worse over time?
We sort of can. The problem is that only a true fan bothers to go to IMDB and rate individual episodes of a TV show. So you can't really trust the episode ratings--they're too high. But we can visualize trends in show quality, as perceived by the fans.
For these visualizations you want long-running series with lots of die-hard fans. So let's start with Star Trek:
(Note the very last data point in that one. That's the series finale, which everyone hates.)
There's a lot of scatter, but you can generally see the common Star Trek pattern of the show getting better as the ensemble cast comes together. Except for the original series, which ended with a lousy season. Now let's look at another nerd favorite, "Buffy the Vampire Slayer":
Beth requested that one. I've seen exactly one episode of Buffy so I wasn't expecting anything in particular. It looks like a show that's consistently good, but wildly inconsistent within the bounds of "consistently good". It doesn't really get better over time. Maybe the Voyager and DS9 graphs look the same to someone who's not a Trek fan.
But compare "Mystery Science Theater 3000", which gets drastically better over time. When I was younger I would have disputed this finding, but now I basically agree with this graph:
I did a lot more graphs, but I'll just show two more. Here's the graph for "The Simpsons", a very long-running show with a very fickle fan base (see title of this post):
Wow! I love this graph! I don't know enough about the history of the show to name the historical trends, but I'm pretty sure a Simpsons fan will see a big part of their life history reflected in this graph.
I wanted to see if this sort of coherent shape was just an artifact of the fact that "The Simpsons" has been on the air for over 20 years, so I graphed another long-running show notorious for huge variation in quality, "Saturday Night Live":
You can definitely see where things went wrong, but even within a season there's huge variation in quality. The Simpsons is created by the same people every week, where SNL has two wild cards every week: its guest host and musical guest. And since it's sketch-based, three good or three awful minutes can make or break the entire episode.
Next up, the third and possibly final part of this analysis, in which I'll pit fans of a show against the general public.
PS: For the record, according to IMDB data, the actual worst episode ever of "The Simpsons" was #9.11, "All Singing, All Dancing".
Update: People in comments had questions I can't answer because I only know how to do very basic statistics, but they also had questions about how many people rated the episodes, which I can answer. This table shows how many people have rated each series as a whole, as well as the median and mean numbers of ratings for every episode that has any ratings. I also included how many people rated the first episode, how many rated an episode in the middle, and how many rated the last/most recent episode.
Series | Series ratings | Ratings per episode (median) | Ratings per episode (mean) | Ratings per episode (std) | First episode | Middle episode | Most recent episode |
"Buffy the Vampire Slayer" (1997) | 34564 | 498 | 553.41 | 224.88 | 862 | 511 | 1091 |
"Enterprise" (2001) | 8843 | 140 | 189.27 | 242.28 | 2397 | 130 | 152 |
"Mystery Science Theater 3000" (1988) | 6650 | 57 | 65.54 | 47.41 | 21 | 78 | 131 |
"Saturday Night Live" (1975) | 10151 | 15 | 19.86 | 15.65 | 112 | 11 | 60 |
"Star Trek" (1966) | 12695 | 419 | 480.95 | 222.83 | 668 | 389 | 1923 |
"Star Trek: Deep Space Nine" (1993) | 9779 | 172 | 188.32 | 107.37 | 1501 | 151 | 290 |
"Star Trek: The Next Generation" (1987) | 16974 | 329 | 375.62 | 354.49 | 2189 | 318 | 4580 |
"Star Trek: Voyager" (1995) | 11245 | 153 | 169.08 | 110.96 | 1558 | 177 | 348 |
"The Simpsons" (1989) | 15578 | 319 | 355.07 | 173.09 | 2214 | 309 | 96 |
So SNL actually has very few ratings per episode, while The Simpsons is on par with ST:TNG. It's common for the first episode and the finale to have many more ratings than others. And here's a graph of the number of people who have rated "The Simpsons" over time:

Fri Feb 24 2012 11:17 Beautiful Soup 4 Beta 8:
I didn't even mention beta 7 on NYCB because it was oriented towards getting rid of test failures. Test failures that had a lot to do with what versions of what parsers were installed, but nothing to do with whether or not Beautiful Soup itself was broken.
Beta 8 adds very basic namespace awareness. By "basic" I mean:
- Handle documents that include namespaced tags and attributes without crashing or mangling the document on output.
- If the parser provides namespace information for a tag or attribute, store it for the user's reference instead of discarding it.
That's it. No one responded to my request for namespace-related feature requests, so I'm doing the bare minimum.
(2) Mon Feb 27 2012 12:59 Incorrectly Regarded As Good:
In this third and final part of my IMDB data adventure, I want to switch from graphs to tables, and shed light on the eternal struggle between fans and non-fans. If fans are the ones who care enough to rate individual episodes, non-fans are the ones more likely to rate the show as a whole. I looked at every show that has at least 100 ratings, plus at least 100 rated episodes. I divided the mean episode rating by the show rating to get a "fan appreciation quotient". (I used mean because the show rating itself is a mean, calculated by IMDB.)
Shows with high FA quotients are more beloved by fans than by the general IMDB-using public:
FA quotient | Show | Show rating | Mean episode rating |
1.63 | "Entertainment Tonight" (1981) | 3.7 | 6.0 |
1.34 | "Melrose Place" (1992) | 5.7 | 7.6 |
1.28 | "Dynasty" (1981) | 5.9 | 7.6 |
1.28 | "The Rosie O'Donnell Show" (1996) | 3.6 | 4.6 |
1.26 | "Mighty Morphin' Power Rangers" (1993) | 6.0 | 7.5 |
1.24 | "Full House" (1987) | 6.0 | 7.4 |
1.20 | "Ghost Whisperer" (2005) | 6.4 | 7.7 |
1.20 | "Fear Factor" (2001) | 4.9 | 5.9 |
1.16 | "Dharma & Greg" (1997) | 6.7 | 7.7 |
Note that since this is a quotient, it has nothing to do with the magnitude of the ratings. "The Rosie O'Donnell Show" got terrible ratings even from the people I'm assuming are fans; it's just that the show as a whole did even worse.
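The numbers in the tables are consistent with the quotient being the mean episode rating divided by the show rating (for "Entertainment Tonight", 6.0 / 3.7 is about 1.62). Here is a minimal sketch of that arithmetic; the function name `fa_quotient` is made up for illustration, and the ratings are the rounded values printed in this post's tables, so the last digit can differ from the published quotients.

```python
def fa_quotient(show_rating, mean_episode_rating):
    """Fan appreciation quotient: how much higher (or lower) the
    episodes are rated, on average, than the show as a whole."""
    return mean_episode_rating / show_rating

# (show rating, mean episode rating) pairs copied from the tables in this post.
shows = {
    '"Entertainment Tonight" (1981)': (3.7, 6.0),
    '"MacGyver" (1985)': (7.8, 7.8),
    '"MADtv" (1995)': (6.7, 4.0),
}

for title, (show, episodes) in shows.items():
    print(f"{fa_quotient(show, episodes):.2f} | {title}")
```

A quotient above 1 means the episode raters like the show more than the people rating the series as a whole do; below 1 means the reverse.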
OK, smarty pants, what about a low FA quotient? How can a show appeal more to the mainstream than to its own fans? Well, I think a low FA quotient means that a show seems better in retrospect than it actually was. Or, more positively, it means that a show was more than the sum of its parts. Either way, here are the shows with the lowest FA quotients:
FA quotient | Show | Show rating | Mean episode rating |
0.78 | "Bonanza" (1959) | 7.3 | 5.7 |
0.78 | "NYPD Blue" (1993) | 7.7 | 6.0 |
0.77 | "In Living Color" (1990) | 7.9 | 6.1 |
0.75 | "Teenage Mutant Ninja Turtles" (1987/I) | 8.1 | 6.0 |
0.73 | "Gunsmoke" (1955) | 8.0 | 5.8 |
0.71 | "What's My Line?" (1950) | 8.9 | 6.3 |
0.71 | "Saturday Night Live" (1975) | 8.1 | 5.7 |
0.68 | "House of Payne" (2006) | 2.5 | 1.7 |
0.62 | "Ellen: The Ellen DeGeneres Show" (2003) | 7.3 | 4.6 |
0.60 | "MADtv" (1995) | 6.7 | 4.0 |
Look how much sketch comedy there is on that list! I think I'm on to something. Two of my favorite shows, ST:TNG and MST3K, also have low FA quotients of 0.83 and 0.84 respectively.
And right in the middle we have the shows that are exactly as good (or bad) as you remember them:
FA quotient | Show | Show rating | Mean episode rating |
1.00 | "Becker" (1998) | 7.6 | 7.6 |
1.00 | "Cold Case" (2003) | 7.5 | 7.5 |
1.00 | "Dancing with the Stars" (2005/I) | 4.8 | 4.8 |
1.00 | "Hercules: The Legendary Journeys" (1995) | 6.6 | 6.6 |
1.00 | "MacGyver" (1985) | 7.8 | 7.8 |
1.00 | "Mission: Impossible" (1966) | 8.1 | 8.1 |
1.00 | "Project Runway" (2004) | 6.6 | 6.6 |
1.00 | "Rawhide" (1959) | 8.2 | 8.2 |
1.00 | "The Practice" (1997) | 7.7 | 7.7 |
Haters
Similar to the struggle between fans and non-fans is that between fans and antifans, a.k.a. haters. Fans of a show will give it a very high rating, and haters will give it a very low rating. We can detect this by looking for shows whose ratings have high standard deviations.
IMDB doesn't make the standard deviation available directly, but it does provide a ten-character ASCII string that represents the distribution of ratings.
Star Trek: The Next Generation has been rated 16,974 times. Its rating distribution string looks like this: "0000000124". The "4" means that the number of ten-out-of-ten votes is somewhere between 40% (6,790) and 49% (8,317) of those 16,974 votes. The "2" means that between 20% and 29% of the votes are nine-out-of-ten, and the "1" means that between 10% and 19% of the ratings are eight-out-of-ten. The zeroes mean that the other star ratings account for between 1% and 9% of ratings each. You can see the conversation about TNG is very heavily dominated by the fans.
I reconstructed the original rating distribution very roughly by treating the character "0" as five percent of the total votes, "1" as fifteen percent, and so on, up to "9" meaning 95 percent of the votes. How rough is the reconstruction? Well, for TNG, the reconstructed distribution has 20,363 data points, where the actual distribution (whatever it is) only has 16,974.
When I take the standard deviation of the reconstructed distribution for ST:TNG, I get 2.74 stars. This particular number is not trustworthy because of the assumptions made in reconstructing the distribution. But by making the same assumptions for every show, we can see which shows are the most divisive. Here are the shows with the largest standard deviations, among all shows with more than 1000 ratings:
Standard deviation | Show | Rating | Votes | Distribution |
3.85 | "Laguna Beach: The Real Orange County" (2004) | 3.7 | 2170 | 3000000003 |
3.76 | "Barney & Friends" (1992) | 3.7 | 1255 | 4000000002 |
3.76 | "Jon & Kate Plus 8" (2007) | 5.4 | 2716 | 2000000004 |
3.76 | "The Hills" (2006) | 3.3 | 5828 | 4000000002 |
3.75 | "Shake It Up!" (2010) | 4.8 | 1013 | 2000000003 |
3.75 | "Paranormal State" (2007) | 4.5 | 1438 | 3000000002 |
3.75 | "Flavor of Love" (2006) | 4.5 | 1254 | 2000000003 |
3.75 | "The Simple Life" (2003) | 3.4 | 2956 | 3000000002 |
3.75 | "The Jerry Springer Show" (1991) | 3.9 | 1631 | 3000000002 |
3.75 | "Jersey Shore" (2009) | 4.5 | 3130 | 3000000002 |
3.75 | "Hannah Montana" (2006) | 3.9 | 1927 | 3000000002 |
3.75 | "Big Brother" (2000/III) | 4.0 | 1621 | 3000000002 |
That list has a bottom, but it's not interesting--it's the shows about whose quality there is general consensus. All right, here it is:
Standard deviation | Show | Rating | Votes | Distribution |
2.38 | "Mork & Mindy" (1978) | 7.0 | 1746 | 0000012211 |
2.38 | "Around the World in 80 Days" (1989/I) | 6.9 | 1446 | 0000012211 |
2.38 | "Amazing Stories" (1985) | 7.3 | 1467 | 0000012211 |
2.38 | "V" (1984) | 7.2 | 2557 | 0000012211 |
2.38 | "Crusade" (1999) | 7.0 | 1133 | 0000012211 |
2.34 | "Impact" (2008) | 5.6 | 1633 | 0000111000 |
2.31 | "Nuremberg" (2000) | 7.2 | 2754 | 0000012311 |
2.22 | "Moby Dick" (1998) | 6.5 | 1967 | 0000112100 |
2.15 | "Golden Years" (1991) | 5.0 | 1459 | 0001211000 |
2.12 | "Covert One: The Hades Factor" (2006) | 5.7 | 1011 | 0000122000 |
2.12 | "The Andromeda Strain" (2008) | 6.1 | 5858 | 0000122100 |
I experimented with a different mapping of the distribution, e.g. saying that "0" meant 2 percent of the votes, "1" meant ten percent, "2" meant 20 percent, and so on. This made the standard deviations into smaller numbers, but it didn't change the ordering of shows very much.
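The reconstruction described in this section can be sketched in a few lines of Python. This is an illustration, not the original script: the function names are invented, truncating each bucket to a whole number of votes is an assumption (it happens to reproduce the 20,363 figure for TNG), and so is the use of a population standard deviation. The alternate mapping follows the "2 percent, ten percent, 20 percent, and so on" description.

```python
import math

def reconstruct(distribution, total_votes, percent_for_digit=None):
    """Turn a ten-character string like "0000000124" into estimated
    vote counts. Index 0 holds one-star votes, index 9 ten-star votes."""
    if percent_for_digit is None:
        # Mapping from the text: "0" -> 5%, "1" -> 15%, ..., "9" -> 95%.
        percent_for_digit = lambda digit: 10 * digit + 5
    # Integer arithmetic; each bucket is truncated (an assumption).
    return [total_votes * percent_for_digit(int(ch)) // 100
            for ch in distribution]

def std_dev(counts):
    """Population standard deviation of star ratings 1..10,
    weighted by the estimated vote counts."""
    n = sum(counts)
    mean = sum(stars * c for stars, c in enumerate(counts, start=1)) / n
    variance = sum(c * (stars - mean) ** 2
                   for stars, c in enumerate(counts, start=1)) / n
    return math.sqrt(variance)

# Alternate mapping: "0" -> 2%, "1" -> 10%, "2" -> 20%, and so on.
alternate = lambda digit: 2 if digit == 0 else 10 * digit

tng = reconstruct("0000000124", 16974)
print(sum(tng))                # 20363 reconstructed data points
print(round(std_dev(tng), 2))  # 2.74 stars

tng_alt = reconstruct("0000000124", 16974, alternate)
print(round(std_dev(tng_alt), 2))  # smaller than with the default mapping
```

Because every bucket gets at least a few percent of the votes under either mapping, the reconstructed totals overshoot the real vote count, which is why the absolute numbers are only useful for comparing shows against each other.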
Variability
We can also measure how much a show varies in quality by taking the standard deviation of the ratings given to its episodes. For this I looked at shows which had at least ten episodes that had been rated at least ten times. Here are the results—the "Variability" is the standard deviation of the episode ratings, in IMDB stars.
Variability | Show | Show rating |
3.32 | "The Tonight Show Starring Johnny Carson" (1962) | 8.3 |
2.74 | "The Late Late Show with Craig Ferguson" (2005) | 8.6 |
2.62 | "Jimmy Kimmel Live!" (2003) | 6.4 |
2.60 | "Beauty and the Geek" (2005) | 5.9 |
2.37 | "Late Night with Conan O'Brien" (1993) | 8.5 |
2.23 | "Late Show with David Letterman" (1993) | 6.9 |
2.04 | "Silk Stalkings" (1991) | 6.1 |
1.89 | "The Tonight Show with Jay Leno" (1992) | 5.3 |
1.87 | "Superboy" (1988) | 6.3 |
1.70 | "Duck Dodgers" (2003) | 8.2 |
1.68 | "The Virginian" (1962) | 7.7 |
1.68 | "Ellen: The Ellen DeGeneres Show" (2003) | 7.3 |
There's a lot of late-night talk here. If I loosened the restriction on number of ratings per episode, I also got a lot of soap operas (most of whose episodes have no ratings at all).
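For the curious, the variability measure and its "ten episodes rated at least ten times" filter could look something like the sketch below. The function name, the data layout (a list of rating/vote-count pairs), and the sample shows are all hypothetical, and using the population standard deviation rather than the sample one is an assumption.

```python
from statistics import pstdev

def variability(episodes, min_votes=10, min_episodes=10):
    """Standard deviation of a show's episode ratings, in stars.

    `episodes` is a list of (rating, vote_count) pairs. Episodes with
    fewer than `min_votes` ratings are ignored, and a show with fewer
    than `min_episodes` qualifying episodes returns None.
    """
    rated = [rating for rating, votes in episodes if votes >= min_votes]
    if len(rated) < min_episodes:
        return None
    return pstdev(rated)

# Made-up data, for illustration only.
steady_show = [(8.0 + 0.1 * (i % 3), 25) for i in range(12)]
uneven_show = [(3.0 if i % 2 else 9.0, 25) for i in range(12)]
barely_rated = [(7.0, 3)] * 12

for name, show in [("steady", steady_show),
                   ("uneven", uneven_show),
                   ("barely rated", barely_rated)]:
    print(name, variability(show))
```

Lowering `min_votes` is the "loosened restriction" mentioned above: it lets in shows whose episodes have only a handful of ratings each.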
And here's the bottom of that list: the most consistently good (or, in theory, bad) shows on TV:
Variability | Show | Show rating |
0.20 | "Day Break" (2006) | 8.3 |
0.20 | "Lucky Louie" (2006) | 8.1 |
0.20 | "Boardwalk Empire" (2010) | 8.9 |
0.20 | "Hung" (2009) | 7.5 |
0.19 | "Outsourced" (2010) | 7.7 |
0.19 | "The Ben Stiller Show" (1992) | 7.3 |
0.18 | "Happy Endings" (2011) | 8.1 |
0.18 | "Lewis" (2007) | 7.9 |
0.08 | "Planet Earth" (2006) | 9.7 |
I looked into the variability of the ratings distribution for individual episodes, hoping to find the most/least controversial TV episodes ever aired, but most of what I found looked like ratings juking. For instance, "Friday Night Lights" and "The Shield" show a hater/fan dynamic on the episode level: some people rating every individual episode very low and others rating every episode very high.
I think that's enough for now, but I'll come back to the data as I have more ideas, and maybe I'll even learn more than basic statistics for you.
Tue Feb 28 2012 10:04 Constellation Games Author Commentary #14: "The Wave Function Of The Universe":
Damn, the time is flying. Part One ends in three weeks. And today there's a lot of non-commentary stuff I want to talk about, so the commentary itself will be pretty light.
First, I want to tell you that Jeremy Penner implemented Chapter 5's Gatekeeper in HTML5 for the 2012-in-One Glorious Developers Konference Kollection. You can play it online. I wouldn't classify Gatekeeper as fan art, though Jeremy is a fan, because he did it for me as a Kickstarter reward. But either way, it's pretty great!
Second, I want to talk about the process of designing the cover art. You don't have to read the book to "get" the cover—that wouldn't exactly help sales—but the design details are a product of in-world thinking. And at this point you've seen enough of the universe that I can go through that thinking without big spoilers.
The cover is by Chris Sobolowski, who wants me to mention his email address and let y'all know that he's available for graphic design work. So if your contract with Jenny Gallegos fell through due to her being a fictional character, contact Chris, who's a real person.
The process went like this: first, Kate and I laid out a huge number of cover ideas
(some of which I've mentioned in earlier commentaries), and decided we
wanted a cover themed around the ET hardware. At this point Kate got
Chris involved, and Chris came up with a couple sketches that made the
book look like a handheld computer. Here's one of them, next to the cover we ended up using:
I've spent months looking at the finished cover instead of this
first draft, and what strikes me now is how similar they are. But
what struck me at the time was that the computer looks like a piece of military hardware. It's dark and brooding, like one of Batman's
gadgets. I wanted something flashy and colorful, like one of Batman's
gadgets. Or like the Hitchhiker's Guide, to not use the same analogy twice in a row.
But I'm not the artist, and I'm also not a writer who thinks he can do the artist's job. So instead of demanding specific changes I wrote two different in-world histories for this handheld computer, and presented them to Chris.
In one story, the computer was a product of the Dhihe Coastal Coalition, the Farang civilization that produced the Brain Embryo. This explained the military appearance, and it had certain
implications for changes he should make to the design. (E.g. making the buttons much smaller).
In the other story, the one we went with, the computer is an Ip
Shkoy ripoff of a Dhihe design, produced by Perea, the
conglomerate that also put out the game reviewed in this chapter, A Tower of Sand. (The glyphs on the final cover's buttons say "pe" "re" "a".) This has its own implications: the colors are now so bright as to verge on the garish, making the computer look more like a consumer product and making the book look more like a comedy and less like a technothriller.
In this story, the only remaining Farang detail is the Brain Embryo-esque mother-of-pearl finish. Stylistically it's reminiscent of the wood grain on an Atari 2600, but it tells a different story. When you were a kid, electricity was an advanced technology. Then all these space aliens showed up handing out blueprints for handheld computers. You want something that looks as different as possible from the wooden toys you had when you were young.
The cocktail cabinet-like second set of controls at the top comes
from this bit I wrote about the computer's social context:
Why would the notoriously social Ip Shkoy build a
single-user game system? It probably has something to do with
sex. Imagine this portable computer as a product for the swinging
bachelor, full of "sophisticated" adult games to break the ice,
contact management applications to replace one's little black book,
and a vibrator peripheral for when the night's inevitable failure
leaves you alone in your crappy apartment.
This device would need to have some two-person controls, so
that you can play those icebreaker games with your would-be conquest,
but the overall feel would be that this is my computer, but I
might let you use it.
Chris took the Ip Shkoy story and produced something that's very close to the final cover. Here's another side-by-side comparison:
After that, there was a lot of back and forth on trivial details like how much and what kinds of wear should be visible on the computer. Around this time Adam was designing the Pey Shkoy language for Tetsuo's Twitter feed, so I asked him to also design a script for use on the cover. This is also the point where Kate got the idea for a "Berlitz Traveler's Lexicon," which became "Pey Shkoy Benefits Humans."
I haven't mentioned the back cover, but at this point I think I've reached or exceeded the limit on how long this discussion can be without getting dull, so let's move on to chapter 14 commentary. But not before linking to the archive of last week's Twitter fun.
OK, that's plenty for this week. Next week: IT BEGINS. Oh, and Curic says, "Silence, puny human!"
Image credits: Jeremy Penner, Chris Sobolowski, NASA
<- Last week | Next week ->
Tue Feb 28 2012 10:51 Beautiful Soup 4 Beta 9:
The latest beta is the first one I'm calling a release candidate, so if you've been waiting to try it out, now's your chance.
Unless otherwise noted, all content licensed by Leonard Richardson under a Creative Commons License.