
Beautiful Soup 4.8.0: I'm getting back into the swing of putting up a NYCB post when I complete a project. Yesterday I published a feature release of Beautiful Soup, 4.8.0. This release makes it easy to make fine-grained customizations to the input mechanism (the TreeBuilder class) and the output mechanism (the Formatter class).

This makes it easy to do things like change the rules about which attributes are treated as multi-valued attributes. If you don't like how Beautiful Soup parses class into a list of CSS classes, this is the release for you. It's not a huge release, but this project's now fifteen years old so I'm relieved at how stable it's been.
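Here's a minimal sketch of both kinds of customization. It assumes the multi_valued_attributes constructor argument and the HTMLFormatter class behave as the Beautiful Soup documentation describes. The markup and the UnsortedAttributes name are made up for illustration.

```python
from bs4 import BeautifulSoup
from bs4.formatter import HTMLFormatter

# Illustrative markup, not taken from any real page.
markup = '<p class="body strikeout" z="1" a="2"></p>'

# Input side: by default, class is parsed into a list of CSS classes.
default_soup = BeautifulSoup(markup, "html.parser")
default_classes = default_soup.p["class"]

# Passing multi_valued_attributes=None turns that special handling off,
# so class comes back as one ordinary string.
plain_soup = BeautifulSoup(markup, "html.parser", multi_valued_attributes=None)
plain_classes = plain_soup.p["class"]


# Output side: a Formatter subclass (hypothetical name) that keeps
# attributes in document order instead of sorting them alphabetically.
class UnsortedAttributes(HTMLFormatter):
    def attributes(self, tag):
        return tag.attrs.items()


sorted_output = plain_soup.p.decode()
unsorted_output = plain_soup.p.decode(formatter=UnsortedAttributes())

print(default_classes)
print(plain_classes)
print(sorted_output)
print(unsorted_output)
```

The same formatter object can also be passed to prettify() or encode(), so one subclass covers every way of turning the tree back into a string.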

Speaking of CSS, although this is a feature release, it's a little smaller than the 4.7.0 release I put out at the end of 2018. That one took out the lackluster implementation of CSS selectors, based on Simon Willison's "soupselect" project from the early 2010s. I replaced it with a dependency on Isaac Muse's SoupSieve project, which has a nearly complete CSS selector implementation. The old implementation was a common cause of complaints, but—like the HTML5 parsing algorithm—it's not something I have a strong interest in and I'm happy to give the whole job to an external dependency.
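From the caller's side the switch is invisible: you still go through select() and select_one(), and the selector work is handed off to the external package. A short sketch, assuming that package is installed alongside Beautiful Soup (the markup is made up for illustration):

```python
from bs4 import BeautifulSoup

# Illustrative markup, not taken from any real page.
html = (
    '<div id="main">'
    '<p class="intro">Hello</p>'
    '<p>World</p>'
    '<a href="x.pdf">doc</a>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")

# Child combinator plus ID and class selectors.
intro = soup.select("div#main > p.intro")

# Attribute selector matching on the end of the value.
pdf_link = soup.select_one('a[href$=".pdf"]')

# A structural pseudo-class: the second <p> among its siblings.
second_p = soup.select("p:nth-of-type(2)")

print([p.get_text() for p in intro])
print(pdf_link["href"])
print([p.get_text() for p in second_p])
```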

There was a period of about a year in 2017-2018 when I wasn't interested in doing Beautiful Soup work, but Tidelift changed that. Tidelift gathers subscription money from companies that rely on free software, and distributes the money to the developers in exchange for a level of support that I find sustainable.

Nobody builds an entire product around Beautiful Soup (or at least nobody will admit to doing this), but thousands of people have used Beautiful Soup to save time at their day jobs. Bundling Beautiful Soup together with bigger projects like Flask and numpy is a solution that works really well for me.

Unless otherwise noted, all content licensed by Leonard Richardson
under a Creative Commons License.