News You Can Bruise for 2019 July

Mon Jul 01 2019 21:22 June Film Roundup:

Booksmart (2019): We were pre-sold on this by the screenplay credit for Sarah Haskins, who did a hilarious regular segment called "Target Women" about ten years ago. Booksmart is really funny, but it's also got a dramatic arc that you don't see very often. I'll go into more detail at the end of this review, but I do think you should see this movie and that it's more fun to see this character arc happen than to read about it.
This film reminded us of Brick, another very stylish movie that shows high school through the subjective experience of the students. Maybe you don't think this movie is stylish, but it totally is: every character has a carefully maintained self-image that's within their budget and the movie's budget. It's just that most of the characters are also huge dorks.
Judging from the street address, one of the party houses is just down the block from one of the places I lived in LA as a kid. The neighborhood really has changed.
OK, here's what I mean about the character arc. At first it seems like Booksmart has cookie-cutter high school movie villains. Then it turns out that no, this is like Clueless and there is no villain. Then, no, this is like Inside Out and the protagonist is the villain. Then, no, there really was no villain, these are all just teenagers making teenager mistakes.
Face/Off (1997): Sometimes people say a movie is "so bad it's good". I've said it myself, and I generally mean a bad movie was also entertaining. Face/Off is more complicated: it's a movie that's bad and good simultaneously and for the same reasons. Casting the oddest actors of '90s Hollywood in both lead roles? Seems like a bad idea! But Face/Off turns it into an advantage by making Travolta and Cage effectively play each other's stock characters. Whenever either one of them is on screen (the entire movie, effectively), you get the ACTING power of both.
Face/Off doesn't just do a good job of recreating a bad movie—Mars Attacks! tried that, and the resulting movie was simply bad. It mixes up the ingredients of a bad movie in an inventive way, creating something special. Like a Five Obstructions kind of deal.
Born Bone Born (2018): That's the correct English name of this movie and I have the ticket stub to prove it. IMDB has the wrong title (Bone Born Bone). The name sure is confusing, though the ending sorta gives you a mnemonic. It's not a literal translation, which means they could have eliminated all possibility of confusion with a super-verbose title like "Mom Died, But We'll See Her Again At The Family Reunion? Two Worlds Meet On Okinawa!"
Anyhow, this is a really excellent family dramedy that got much bigger theater laughs than a family dramedy usually does. It's got fun characters, great timing, and it does a good job of putting the audience in the anxious ready-to-laugh state with its up-front treatment of death.
Set It Off (1996): This had been on my list for a while and it was nice to see it on the big screen with Sumana. I came in expecting an Ocean's-style heist, but I got a quality modern noir. The Thelma and Louise-esque authority figure is a little more bearable than in Thelma and Louise, but who goes to these movies to watch the cop?

Addendum: After last month's The Bit Player experiment, I've found that Film Roundup is the best place to list interesting films that I can't put on a wishlist because they're not yet products you can wishlist. This month's entry: Dance with Me, the tragedy (?) of a woman who's cursed to live in a musical. It's showing at the Japan Cuts festival later this month, but I was slow on the draw and all the tickets sold out. We'll see it later... and I'll see you later!

Sun Jul 21 2019 12:16 Beautiful Soup 4.8.0: I'm getting back into the swing of putting up a NYCB post when I complete a project. Yesterday I published a feature release of Beautiful Soup, 4.8.0. This release makes it easy to make fine-grained customizations to the input mechanism (the TreeBuilder class) and the output mechanism (the Formatter class).

This makes it easy to do things like change the rules about which attributes are treated as multi-value attributes. If you don't like how Beautiful Soup parses class into a list of CSS classes, this is the release for you. It's not a huge release, but this project's now fifteen years old so I'm relieved at how stable it's been.
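
A minimal sketch of both kinds of customization, assuming Beautiful Soup 4.8.0 or later is installed. The markup is made up for illustration, and the formatter settings are just one possible choice, not a recommendation.

```python
from bs4 import BeautifulSoup
from bs4.dammit import EntitySubstitution
from bs4.formatter import HTMLFormatter

markup = '<p class="body strikeout">Line one<br>Line two</p>'

# Default behavior: the class attribute is split into a list of CSS classes.
default_soup = BeautifulSoup(markup, "html.parser")
default_class = default_soup.p["class"]

# Input side: multi_valued_attributes=None is passed through to the
# TreeBuilder, which then treats every attribute as a single string.
plain_soup = BeautifulSoup(markup, "html.parser", multi_valued_attributes=None)
plain_class = plain_soup.p["class"]

# Output side: a Formatter object that escapes text using HTML entities and
# writes void elements as <br> instead of <br/>.
html_style = HTMLFormatter(
    entity_substitution=EntitySubstitution.substitute_html,
    void_element_close_prefix="",
)
rendered = plain_soup.p.decode(formatter=html_style)
```

Here default_class is a list with two entries, while plain_class is the single string as it appeared in the markup.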

Speaking of CSS, although this is a feature release, it's a little smaller than the 4.7.0 release I put out at the end of 2018. That one took out the lackluster implementation of CSS selectors, based on Simon Willison's "soupselect" project from the early 2010s. I replaced it with a dependency on Isaac Muse's SoupSieve project, which has a nearly complete CSS selector implementation. The old implementation was a common cause of complaints, but—like the HTML5 parsing algorithm—it's not something I have a strong interest in and I'm happy to give the whole job to an external dependency.
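
For a sense of what the selector support looks like in use, here is a small sketch, assuming Beautiful Soup 4.7.0 or later with its SoupSieve dependency installed. The markup, ids, and class names are invented for the example.

```python
from bs4 import BeautifulSoup

# Made-up markup; the id and class names are illustrative only.
markup = """
<ul id="films">
  <li class="seen">Booksmart</li>
  <li class="seen">Face/Off</li>
  <li class="wishlist">Dance with Me</li>
</ul>
"""
soup = BeautifulSoup(markup, "html.parser")

# select() returns every match; select_one() returns the first match or None.
seen = [li.get_text() for li in soup.select("ul#films > li.seen")]
not_seen = [li.get_text() for li in soup.select("li:not(.seen)")]
last = soup.select_one("li:last-child").get_text()
```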

There was a period of about a year in 2017-2018 when I wasn't interested in doing Beautiful Soup work, but Tidelift changed that. Tidelift gathers subscription money from companies that rely on free software, and distributes the money to the developers in exchange for a level of support that I find sustainable.

Nobody builds an entire product around Beautiful Soup (or at least nobody will admit to doing this), but thousands of people have used Beautiful Soup to save time at their day jobs. Bundling Beautiful Soup together with bigger projects like Flask and numpy is a solution that works really well for me.

Mon Jul 22 2019 08:39 Secretly Public Domain: "Fun facts" are, sadly, often less than fun. But here's a genuinely fun fact: most books published in the US before 1964 are in the public domain! Back then, you had to send in a form to get a second 28-year copyright term, and most people didn't bother.

This is how Project Gutenberg is able to publish all these science fiction stories from the 50s and 60s. Those stories were published in issues of magazines that didn't send in the renewal form. But up till now this hasn't been a big factor, because 1) the big publishers generally made sure to send in their renewals, and 2) it's been impossible to check renewal status in bulk.

Up through the 1970s, the Library of Congress published a huge series of books listing all the registrations and the renewals. All these tomes have been scanned (Internet Archive has the registration books), but only the renewal information was machine-readable. Checking renewal status for a given book was a tedious job, involving flipping back and forth between a bunch of books in a federal depository library or, more recently, a bunch of browser tabs. Checking the status for all books was impossible, because the list of registrations was not machine-readable.

But! A recent NYPL project has paid for the already-digitized registration records to be marked up as XML. (I was not involved, BTW, apart from saying "yes, this would work" four years ago.) Now for anything that's unambiguously a "book", we have a parseable record of its pre-1964 interactions with the Copyright Office: the initial registration and any potential renewal.

The two datasets are in different formats, but a little elbow grease will mesh them up. It turns out that eighty percent of 1924-1963 books never had their copyright renewed. More importantly, with a couple caveats about foreign publication and such, we now know which 80%.
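
The basic shape of that join can be sketched as below. Everything here is an assumption made for illustration: the XML element and attribute names, the tab-separated renewal layout, the registration numbers, and the titles are all hypothetical and do not reflect the real datasets. Real matching would presumably need more than an exact key lookup.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical registration records; the real XML schema may differ.
REGISTRATIONS_XML = """
<registrations>
  <entry regnum="A100001" year="1950"><title>An Example Novel</title></entry>
  <entry regnum="A100002" year="1951"><title>Another Example</title></entry>
  <entry regnum="A100003" year="1962"><title>A Third Example</title></entry>
</registrations>
"""

# Hypothetical renewal records, keyed by the original registration number.
RENEWALS_TSV = "renewal_id\toriginal_regnum\nR500001\tA100002\n"


def load_registrations(xml_text):
    """Parse registration entries into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    return [
        {
            "regnum": entry.get("regnum"),
            "year": int(entry.get("year")),
            "title": entry.findtext("title"),
        }
        for entry in root.iter("entry")
    ]


def load_renewed_regnums(tsv_text):
    """Collect the registration numbers that have a renewal on file."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["original_regnum"] for row in reader}


def unrenewed(registrations, renewed_regnums):
    """Keep only registrations with no matching renewal record."""
    return [r for r in registrations if r["regnum"] not in renewed_regnums]


registrations = load_registrations(REGISTRATIONS_XML)
renewed = load_renewed_regnums(RENEWALS_TSV)
candidates = unrenewed(registrations, renewed)
share_unrenewed = len(candidates) / len(registrations)
```

In this toy data two of the three registrations have no renewal, so they end up in candidates. Any such list would still need checking against the caveats about foreign publication and the like.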

This was announced back in May, but I don't think it got the attention it deserved. This is a really big deal, so I had no choice but to create a bot. Here's Secretly Public Domain, which highlights unrenewed works that have already been scanned for HathiTrust. This only represents 10% of the 80%, but it's the ten percent most likely to be interesting, and these books have the easiest path towards being available online.

August 9 update: topline number is closer to 73%, next steps for the public domain books, and how to get the data on your own computer.