
March Film Roundup: Okay, look. I don't see movies just for their entertainment value. I dig film as an art form. But my permit to dig is premised on an amateur understanding of film as a narrative art form. If you want to present an endless stream of disconnected images, let's do an installation piece, because I want to decide for myself when I've had enough. I'm not going to be your captive for fifty minutes. (I'm looking at you, Andy Warhol.) And all that aside, I'm not gonna see a movie called Trash Humpers (2009), when the nicest thing the folks doing the screening can say is that it "rewards the open-minded viewer with moments of astonishing and unexpected poignancy."

Which is to say that I skipped most of the museum's highly avant-garde March offerings. I also got this book I have to work on. So not many movies in this roundup. Let's-a go:

In Search of the Beautiful Soup Double-Dippers: Recently I noticed that certain IPs were using distribute or setuptools to download the Beautiful Soup tarball multiple times in a row. For one thing, I'm not sure why distribute and setuptools are downloading Beautiful Soup from crummy.com instead of using PyPI, especially since PyPI registers almost 150k downloads of the latest BS4--why are some people using PyPI and not others?

If anyone knows how to convince everyone to use PyPI, I'd appreciate the knowledge. But it's not a big deal right now, and it gives me some visibility into how people are using Beautiful Soup. Visibility which I will share with you.

Yesterday, the 17th, the Beautiful Soup 4.1.3 tarball was downloaded 2223 times. It is by far the most popular thing on crummy.com. The second most popular thing is the Beautiful Soup 3.2.1 tarball, which was downloaded 381 times. The vast majority of the downloads were from installation scripts: distribute or setuptools.
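Telling installer downloads apart from browser downloads comes down to the User-Agent header. Here's a minimal Python sketch of that classification. It assumes setuptools and distribute put a "setuptools/<version>" or "distribute/<version>" token somewhere in their User-Agent strings; the function names and the (filename, user agent) record format are made up for illustration.

```python
import re
from collections import Counter

# Assumption: setuptools and distribute include a "setuptools/<version>" or
# "distribute/<version>" token in their User-Agent header. The surrounding
# format has varied between releases, so we only search for the token.
INSTALLER_TOKEN = re.compile(r"\b(setuptools|distribute)/[\w.]+")


def installer_name(user_agent):
    """Return 'setuptools' or 'distribute' for installer requests, else None."""
    match = INSTALLER_TOKEN.search(user_agent)
    return match.group(1) if match else None


def downloads_by_installer(records):
    """Count requests per (filename, installer) pair.

    records is an iterable of hypothetical (filename, user_agent) pairs; the
    installer is None for anything that isn't setuptools or distribute.
    """
    return Counter((filename, installer_name(ua)) for filename, ua in records)


records = [
    ("beautifulsoup4-4.1.3.tar.gz", "Python-urllib/2.7 distribute/0.6.34"),
    ("beautifulsoup4-4.1.3.tar.gz", "Python-urllib/2.7 setuptools/0.6c11"),
    ("beautifulsoup4-4.1.3.tar.gz", "Mozilla/5.0 (X11; Linux x86_64)"),
    ("BeautifulSoup-3.2.1.tar.gz", "Python-urllib/2.6 setuptools/0.6c11"),
]
print(downloads_by_installer(records))
```

Searching for the token rather than matching the whole header keeps the check working even if an installer reorders the parts of its User-Agent string.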

1516 distinct IP addresses were responsible for the 2223 downloads of 4.1.3. I wrote a script to find out how many IP addresses downloaded Beautiful Soup more than once. The results:

Downloads from a single IP    Number of times this happened
55                            1
35                            1
15                            1
13                            1
11                            1
5                             2
4                             12
3                             43
2                             453
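Here's roughly what a script like that could look like in Python. It's a sketch, not necessarily the script that produced the table: it assumes the access log is in Apache's Common or Combined Log Format, where the client address is the first field, and the function names are hypothetical.

```python
from collections import Counter


def client_ip(log_line):
    """Return the client address from an access-log line.

    Assumption: Apache Common/Combined Log Format, where the client address
    is the first space-separated field.
    """
    return log_line.split(" ", 1)[0]


def repeat_download_histogram(log_lines):
    """Map "downloads from a single IP" to "number of IPs that did that",
    keeping only IPs that downloaded more than once (largest counts first)."""
    per_ip = Counter(client_ip(line) for line in log_lines)  # IP -> downloads
    histogram = Counter(per_ip.values())                      # downloads -> IPs
    return {n: ips for n, ips in sorted(histogram.items(), reverse=True) if n > 1}


# Made-up sample log: one IP downloads three times, two IPs twice, one once.
sample_ips = ["192.0.2.1"] * 3 + ["192.0.2.2"] * 2 + ["192.0.2.3"] * 2 + ["192.0.2.4"]
sample_log = [
    ip + ' - - [10/Oct/2000:13:55:36 -0700] "GET /example.tar.gz HTTP/1.0" 200 1024'
    for ip in sample_ips
]
print(repeat_download_histogram(sample_log))  # {3: 1, 2: 2}
```

The two Counters do the work: the first tallies downloads per IP, and the second tallies how many IPs share each download count.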

Naturally my attention was drawn to the outliers at the top of the table. I investigated them individually. The IP address responsible for 55 downloads is a software company of the sort that might be deploying to a bunch of computers behind a proxy. The 35 is an individual on a cable modem who, judging from their other traces on the Internet, is deploying to a bunch of computers using Puppet. The 15, the 13, and the 11 are all from Travis CI, a continuous integration service.

One of the two 5s was an Amazon EC2 instance. Five of the twelve 4s were Amazon EC2 instances. Thirty-seven of the forty-three 3s were Amazon EC2 instances. And 395 of the 453 double-dippers were Amazon EC2 instances. Something's clearly going on with EC2. (There was also one download from within Amazon corporate, among other BigCo downloaders.)
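As a sanity check, the figures above account for every download. If the outliers, the 5s, the 4s, the 3s, and the 453 double-dippers are the only IPs that downloaded more than once, then every remaining IP downloaded exactly once, and the arithmetic comes out even:

```python
# Repeat-download counts quoted in the text:
# {downloads from a single IP: number of IPs that did that}
repeats = {55: 1, 35: 1, 15: 1, 13: 1, 11: 1, 5: 2, 4: 12, 3: 43, 2: 453}

total_downloads = 2223
distinct_ips = 1516

repeat_ips = sum(repeats.values())                             # 515 IPs
repeat_downloads = sum(n * ips for n, ips in repeats.items())  # 1222 downloads
single_ips = distinct_ips - repeat_ips                         # 1001 IPs

# Each remaining IP downloaded exactly once, so the totals must agree.
assert repeat_downloads + single_ips == total_downloads
print(repeat_ips, repeat_downloads, single_ips)  # 515 1222 1001
```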

I hypothesized that most of the duplicate requests came from Amazon EC2 instances being wiped and redeployed. To test this hypothesis, I went through all the double-dippers and calculated the time between each IP's first request and its second. My results are in this scatter plot. Each point on the plot represents an IP address that downloaded Beautiful Soup twice yesterday.
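Measuring the gap for each double-dipper is a matter of grouping requests by IP and subtracting timestamps. A Python sketch, assuming the timestamps are in the Apache log format (like 10/Oct/2000:13:55:36 -0700) and that the input has already been filtered down to requests for a single tarball; the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Assumption: timestamps as written in Apache access logs,
# e.g. "10/Oct/2000:13:55:36 -0700".
LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def double_dip_gaps(requests):
    """Return {ip: seconds between first and second download} for every IP
    that downloaded exactly twice.

    requests is an iterable of (ip, timestamp) pairs for a single tarball.
    """
    times = defaultdict(list)
    for ip, stamp in requests:
        times[ip].append(datetime.strptime(stamp, LOG_TIME_FORMAT))

    gaps = {}
    for ip, stamps in times.items():
        if len(stamps) == 2:
            first, second = sorted(stamps)
            gaps[ip] = (second - first).total_seconds()
    return gaps


requests = [
    ("192.0.2.1", "10/Oct/2000:13:55:36 -0700"),
    ("192.0.2.2", "10/Oct/2000:13:56:00 -0700"),  # only one download: ignored
    ("192.0.2.1", "11/Oct/2000:01:40:36 -0700"),
]
print(double_dip_gaps(requests))  # {'192.0.2.1': 42300.0}
```

Parsing the timezone offset with %z means the subtraction stays correct even if the log mixes offsets, for instance across a daylight saving change.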

For EC2 instances, the median time between requests is 11 hours and 45 minutes. So EC2 instances are being automatically redeployed twice a day. For non-EC2 instances, the median time between requests is 51 minutes, and the modal time is about zero. Those people set up a dev environment, discover that something doesn't work, and try it again from scratch.
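The median and mode can then be computed separately for EC2 and non-EC2 addresses. This sketch assumes EC2 addresses can be recognized by their reverse-DNS names (something like ec2-203-0-113-5.compute-1.amazonaws.com), looked up beforehand. It also groups the gaps into five-minute bins before taking the mode, since exact gaps almost never repeat. Both choices are assumptions, not necessarily how the numbers above were produced.

```python
import statistics
from collections import Counter


def looks_like_ec2(hostname):
    # Assumption: EC2 public addresses reverse-resolve to names such as
    # "ec2-203-0-113-5.compute-1.amazonaws.com".
    return hostname.startswith("ec2-") and hostname.endswith(".amazonaws.com")


def summarize_gaps(gaps, hostnames, bucket_seconds=300):
    """Return {"ec2": (median, mode), "other": (median, mode)} in seconds.

    gaps maps IP -> seconds between downloads; hostnames maps IP -> its
    reverse-DNS name. The mode is the start of the most common
    bucket_seconds-wide bin, since exact gaps rarely repeat.
    """
    groups = {"ec2": [], "other": []}
    for ip, seconds in gaps.items():
        key = "ec2" if looks_like_ec2(hostnames.get(ip, "")) else "other"
        groups[key].append(seconds)

    summary = {}
    for key, values in groups.items():
        if not values:
            continue  # no IPs in this group
        bins = Counter(int(v // bucket_seconds) for v in values)
        modal_bin = bins.most_common(1)[0][0]
        summary[key] = (statistics.median(values), modal_bin * bucket_seconds)
    return summary


# Made-up gaps and hostnames, for illustration only.
gaps = {"192.0.2.1": 42300.0, "192.0.2.2": 42310.0, "192.0.2.3": 43000.0,
        "198.51.100.1": 10.0, "198.51.100.2": 20.0, "198.51.100.3": 6000.0}
hostnames = {ip: "ec2-" + ip.replace(".", "-") + ".compute-1.amazonaws.com"
             for ip in ["192.0.2.1", "192.0.2.2", "192.0.2.3"]}
hostnames["198.51.100.1"] = "cable.example.net"
print(summarize_gaps(gaps, hostnames))
# {'ec2': (42310.0, 42300), 'other': (20.0, 0)}
```

IPs with no known hostname fall into the "other" group, which errs on the side of not labeling something as EC2.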

Board Game Dadaist Improvements: I've finally relented to Adam's demands and made some improvements to the Board Game Dadaist RSS feed. He broke his kneecap recently and I figured this would be a good way to cheer him up. Every game that shows up in the feed now has a permalink (here's "Plue"), and that page has a very basic link for posting your find to Twitter.
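A "post this to Twitter" link can be as simple as a URL with the game's name and permalink in the query string. A minimal sketch, assuming Twitter's Web Intent endpoint (twitter.com/intent/tweet, with text and url parameters); the function name is hypothetical, and the permalink below is a placeholder, not the real one for "Plue".

```python
from urllib.parse import urlencode


def tweet_link(game_name, permalink):
    """Build a "post this to Twitter" link for one game's permalink.

    Assumption: Twitter's Web Intent endpoint, which pre-fills a tweet from
    the "text" and "url" query parameters.
    """
    query = urlencode({"text": game_name, "url": permalink})
    return "https://twitter.com/intent/tweet?" + query


# Placeholder permalink, not the feed's real URL scheme.
print(tweet_link("Plue", "https://example.com/games/plue"))
# https://twitter.com/intent/tweet?text=Plue&url=https%3A%2F%2Fexample.com%2Fgames%2Fplue
```

urlencode takes care of escaping, so game names with spaces or punctuation survive the trip intact.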


Unless otherwise noted, all content licensed by Leonard Richardson
under a Creative Commons License.