
Where's That Golden Age?: A couple of weeks ago Samuel Arbesman posted an entry to Wired's science blog called "How to search for the golden age of television", an entry that's been driving me crazy since I read it. Not because I disagree with his analysis of the IMDB dataset, but because I don't like his starting point. Arbesman uses "each television show's running time, in number of episodes, as a very rough proxy for quality". It's true that there's probably a positive correlation, but that metric has a couple of problems. First, it severely discounts the present. A show on the air today may have several seasons left to run, but we don't know that yet, so it'll look worse than an old show of equivalent quality. Second, the IMDB dataset features a much more direct proxy for quality: user ratings.

I don't think ratings are a great proxy for quality--a look at the highest-rated TV shows will put a stop to that nonsense. And the run length of a show is at least an objective fact. But I think our collective opinion of a TV show today is a better proxy for quality than how long the network was originally willing to keep it going. And if you use ratings, I think you can get closer to answering the question "what would a golden age of television look like?"

My guess is, Arbesman didn't use ratings because it's kind of annoying to get that information out of the IMDB dataset. But I'd already done a lot of work on the dataset for The MST3K-IMDB Effect, so in this post I crunch the numbers my way and see what falls out.

If you're expecting controversy, I can't provide it. My findings don't contradict Arbesman's; they just provide a different way of looking at the data.

Step 1: Get the data

(If you're impatient, you can skip to the graphs.)

It all starts with IMDB's plain-text data dumps. I downloaded release-dates.list.gz and ratings.list.gz from the FTP site. I also downloaded distributors.list.gz, but it turned out that data wasn't useful.

Step 2: Identify shows, episodes, and air dates

release-dates.list lists all movies, TV shows, and episodes of TV shows. TV shows are in quotes, and episode names are in curly brackets.

Point Break (1991)					USA:12 July 1991
"Star Trek: Voyager" (1995)				USA:16 January 1995
"Star Trek: Voyager" (1995) {Caretaker (#1.1)}		USA:16 January 1995

Unfortunately, web series look just like TV shows, which is going to mess with the data for recent years:

"The Angry Video Game Nerd" (2006) {A Nightmare on Elm Street (#1.13)}	USA:31 October 2006

I tried some tricks to get rid of web series, like only considering shows with a listed television distributor (distributors.list), but there are tons of dinky cable reality shows that have exactly the same data characteristics as web series. So I'm leaving them in. Just know that when I say "TV shows", I'm talking about TV shows + web series.

To make the initial dataset smaller, I used grep to remove everything except the US premieres of TV shows, and of episodes of TV shows. (And web series.) Then I wrote a Python script that turns this information into a picklable data structure.

The script ties a show to all of its known episodes, and parses out each episode's release date along with the premiere date of the show itself. I want to know every year in which an episode of the show premiered in the US. This has some problems--it makes the original "Star Trek" show up as a 1988 show because that's the first time the original pilot was aired--but they're pretty minor.
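The pickled structure doesn't need to be fancy; a dict mapping each show to the set of years in which its episodes premiered would do the job. A sketch under that assumption (the names and toy data are mine, not the author's):

```python
import pickle
from collections import defaultdict

def build_show_index(parsed_lines):
    """Map each show title to the set of years an episode premiered in.

    `parsed_lines` is assumed to be an iterable of (show, episode, year)
    tuples; `episode` is None for the show's own premiere entry.
    """
    years_by_show = defaultdict(set)
    for show, episode, year in parsed_lines:
        years_by_show[show].add(year)
    return dict(years_by_show)

# Toy input, not the real dataset.
index = build_show_index([
    ('"Star Trek: Voyager" (1995)', None, 1995),
    ('"Star Trek: Voyager" (1995)', 'Caretaker (#1.1)', 1995),
    ('"Star Trek: Voyager" (1995)', 'Endgame (#7.25)', 2001),
])

# Plain dicts and sets pickle cleanly, which is the point of the exercise.
blob = pickle.dumps(index)
```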

Step 3: Add the ratings

Now I know when every show started, and in many cases I know every year a show was on the air. In the next step I load in another file and add ratings to shows and episodes.

Ratings are kept in ratings.list. They look like this:

      0000001212   11245   7.5  "Star Trek: Voyager" (1995)
      0000012111    1558   7.1  "Star Trek: Voyager" (1995) {Caretaker (#1.1)}

There's lots of cool stuff here, like a vote histogram (0000012111 means 10% of people rated the premiere of Voyager a 6, 20% of people a 7, and so on), but what we're after are the IMDB ratings: 7.5 stars and 7.1 stars in this case.
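Parsing these lines is mostly a matter of splitting on whitespace. A sketch (my own helper names; it also skips over the fact that the real file has header and footer sections to discard first):

```python
def parse_rating_line(line):
    """Split one ratings.list line into (histogram, votes, rating, title).

    A sketch: runs of whitespace inside a title get collapsed to single
    spaces, which is fine for matching against release-dates.list titles
    parsed the same way.
    """
    fields = line.split()
    histogram = fields[0]
    votes = int(fields[1])
    rating = float(fields[2])
    title = ' '.join(fields[3:])
    return histogram, votes, rating, title

def decode_histogram(histogram):
    """Rough decode of the 10-character vote distribution.

    Digit d in position i means roughly d*10% of voters gave i+1 stars.
    (The real file also uses characters like '.' and '*' in this column;
    a toy decoder like this one doesn't handle those.)
    """
    return {stars: int(d) * 10 for stars, d in enumerate(histogram, start=1)}
```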

Unfortunately, there's a lot of boring stuff in ratings.list like the top 250 movies. Fortunately, I already wrote code to parse this file during my investigations into the MST3K-IMDB effect.

Step 4: Graphs!

Now I'm going to break out numpy and pychart. Let me start with a calibration run, a graph Arbesman also did. How many shows were on the air in a given year?

Pretty similar to Arbesman's graph. My graph doesn't go down at the end, because I cut the data off at 2011, the last full year of data. I also start later, with the first year for which there were five rated TV shows. I'm picking up some shows he's not, possibly because I'm counting a show in every year it aired, possibly because I'm picking up shows that don't have any episodes listed on IMDB, possibly because he found some way I didn't think of to exclude web series. But it's a similar shape.
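Counting shows per year is cheap once the step-2 structure exists; a toy sketch (the names and data are mine, not the author's code):

```python
from collections import Counter

# Toy stand-in for the step-2 structure: show -> set of years it aired.
years_by_show = {
    '"Show A" (1959)': {1959, 1960, 1961},
    '"Show B" (1960)': {1960},
    '"Show C" (1960)': {1960, 1961},
}

# A show counts once in every year it aired, which is why this method
# picks up more shows than counting premieres alone would.
shows_per_year = Counter(
    year for years in years_by_show.values() for year in years
)
```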

Now here's the graph you've been waiting for: mean rating over time:

It's a sad story of precipitous drops in quality: one between 1959 and 1980, one between 1999 and 2005. By this measure, 2005 was the worst year in television history. If you only looked at mean rating over time, you'd say that there was one golden age of television, from 1955 to 1965, and that the 1980-2000 period was a period of stagnation interrupting an otherwise steady decline.

The graph of median rating over time tells much the same story, so I won't transclude it, but you can follow this link to see it.

But, mean rating isn't the whole story. Let me pull out the only statistics trick I know: look at the standard deviation of the ratings over time.

1959, the year with the highest mean rating, is also a year of extreme homogeneity. Less than one star of difference separates the very good shows from the very bad shows. After 1959, the good shows get better, and the bad shows get worse, relative to the mean. In 1980 the standard deviation was 1.37 stars, and in 2011 it was almost two stars. Remember that ratings are not normally distributed, so two stars is quite a lot. (Even one star, as in 1959, ain't nothing.)
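In numpy terms the whole exercise is a couple of calls per year. A sketch with toy ratings (the real input would be every rated show that aired in the year):

```python
import numpy as np

# Toy data for illustration -- not the real dataset.
ratings_by_year = {
    1959: [7.1, 7.3, 7.6, 7.0, 7.4],  # homogeneous: everything "pretty good"
    2005: [3.1, 8.9, 6.0, 4.5, 7.8],  # spread out: great shows and awful ones
}

# Mean and (population) standard deviation of the ratings for each year.
stats = {
    year: (float(np.mean(ratings)), float(np.std(ratings)))
    for year, ratings in ratings_by_year.items()
}
```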

Combine this with the skyrocketing number of shows (which begins in the late 90s and goes into overdrive once we start counting web shows) and you can see how that 2000-2005 decline happened. Over 1300 distinct shows aired in 2005. Of course the mean show is going to be crap! The amazing thing is that things have gotten better since 2005, even as we now make over twice as many shows per year. (And web series! Can't forget those!)

Another factor is that people aren't even bothering to rate the bad shows. Here's the percentage of shows that aired in a given year that don't have IMDB ratings because they haven't gotten enough votes. For 2011, this was a majority of shows!

Old shows aren't rated because nobody remembers them. New shows aren't rated because... well, I did a bunch of spot checks, and they fall into three categories. 1) web series, 2) shows that were never aired and maybe never even produced, 3) crap. Only #3 can properly be considered part of "television". The mean rating would certainly be lower if every show had a rating, but I don't know how much lower.

That's where we stand: television is bad, and it's getting worse. That trend may have been reversed recently, or the decline may have been masked by web shows with passionate fans, or things may have gotten so bad that people stopped even bothering to rate the crap. But! Would you exchange the television of today (mean rating: 6.2) for the television of 1973 (mean rating: 7.3)? I wouldn't, and I don't think you would either. What's going on?

Well, we don't watch the mean television show. We only watch the good shows. (If you've read this far, I'm gonna go ahead and make that assumption.) And if you look at the good shows, the picture looks very different.

Here's the rating one standard deviation above the mean for each year. This is basically what the top 16% of shows look like:

At the high end, the decline in quality is reversed in the 80s and early 90s. The gains are undone in the late 90s (2005 is still terrible), but then quality shoots back up. This is very similar to Arbesman's graph of show length over time.

What if you're even more selective? Let's graph the value 1.5 standard deviations above the mean for each year. I don't know what percentile this would correspond to, but it's something like the top 5%. This is the very best stuff you can find on TV in a given year:

This graph, I think, is the best answer to "what would a golden age look like?" It would look like the 60s, when there were three channels under tight quality control, and you could turn on the television at any given time and probably find something good. Or it would look like right now, when a huge number of shows are being produced, and it's easy to be a snob and only watch the very best. This is why we don't remember 2005 as being the worst year of TV in the history of the medium, and this is why I'd never trade today's TV for 1973's TV, even though 1973 looks pretty good on that graph.
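As for what percentile "1.5 standard deviations above the mean" corresponds to: if ratings were normally distributed (they aren't, so this is only a rough calibration), the answer follows from the normal CDF, which Python's math.erf gives you without scipy:

```python
import math

def share_above(k):
    """Fraction of a normal distribution lying above mean + k*sigma."""
    phi = 0.5 * (1 + math.erf(k / math.sqrt(2)))  # standard normal CDF at k
    return 1 - phi

# share_above(1.0) is about 0.159 -- the "top 16%" graph above.
# share_above(1.5) is about 0.067 -- closer to the top 7% than the top 5%,
# under the (false) normality assumption.
```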

So, there you have it--another way of looking at the IMDB data. More to come! Next up: a little thing I like to call "Worst Episode Ever".

Posted by kirkjerk at Mon Feb 20 2012 09:48

Doesn't Arbesman's assumptions imply Soap Operas and Daytime Game Shows are THE BEST THINGS EVER?

Posted by Leonard at Mon Feb 20 2012 14:32

It would if you used that method to find the top 10 shows of all time. But he's graphing the median number of episodes, so huge outliers don't distort the data--they still only count as one show.


Unless otherwise noted, all content licensed by Leonard Richardson
under a Creative Commons License.