
Beautiful Soup 4.0.1: It's been nearly two weeks since the release of the last BS4 beta, and no one has reported problems with the code. I'm sure there are still problems, but at this point the best way to find them is to do an official release. So, I present the first full release of Beautiful Soup 4, 4.0.1![0]

If you're just tuning in, Beautiful Soup 4 is a nearly complete rewrite that works on both Python 2 and Python 3. Instead of a custom-built parser from 2006, Beautiful Soup 4 sits on top of lxml (for speed), html5lib (for browser-like parsing), or the built-in HTMLParser (for convenience). Methods and attributes are renamed for PEP 8 compliance, and Beautiful Soup 4 incorporates the soupselect project to provide basic CSS selector support. I completely rewrote the documentation, Beautiful Soup's secret weapon since 3.0, for clarity and completeness.
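Here's a minimal sketch of what that looks like in practice, assuming the beautifulsoup4 package is installed (it is imported as bs4). The sample markup and variable names are made up for illustration. The second constructor argument names the parser: "html.parser" picks the built-in one, and "lxml" or "html5lib" should work in its place if those libraries are installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, made up for this sketch.
html = (
    '<html><body><div id="main">'
    '<p class="intro">Hello, <b>soup</b>!</p>'
    '<p>Second paragraph.</p>'
    '</div></body></html>'
)

# "html.parser" is the built-in parser; swap in "lxml" or "html5lib"
# if you have them installed and want speed or browser-like parsing.
soup = BeautifulSoup(html, "html.parser")

# PEP 8-style method name.
paragraphs = soup.find_all("p")

# Basic CSS selector support: a tag name plus a class.
intro_text = soup.select("p.intro")[0].get_text()

print(len(paragraphs))
print(intro_text)
```

The same tree-searching calls should behave the same way whichever parser built the tree, though different parsers can build different trees from invalid markup.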

That's the major stuff. Even though most of the code has changed, my goal was not to add a bunch more features, but to make sure Beautiful Soup will still be usable and useful years into the future.

Beautiful Soup 4 is mostly but not entirely backwards compatible with Beautiful Soup 3. Most users should be able to switch from 3 to 4 just by changing an import line. In the Python tradition of sticking a number on the end of your module name when you break backwards compatibility, I've released it as a separate package, beautifulsoup4.
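For most scripts the switch might look roughly like this. It's a sketch rather than a full porting guide: the markup is invented for the example, and the only renames shown are two of the PEP 8 ones (findAll to find_all, nextSibling to next_sibling).

```python
# Beautiful Soup 3 import, shown only for comparison:
#   from BeautifulSoup import BeautifulSoup
# Beautiful Soup 4 import (the beautifulsoup4 package installs as bs4):
from bs4 import BeautifulSoup

# Hypothetical markup, made up for this sketch.
soup = BeautifulSoup("<ul><li>one</li><li>two</li></ul>", "html.parser")

# find_all() is the PEP 8 name for what Beautiful Soup 3 called findAll().
items = [li.get_text() for li in soup.find_all("li")]

# next_sibling is the PEP 8 name for nextSibling.
second = soup.li.next_sibling.get_text()

print(items)
print(second)
```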

This release also inaugurates the Beautiful Soup Hall of Fame, featuring the uses of Beautiful Soup that I personally find the coolest or highest-profile.

So, try out Beautiful Soup 4 the next time you need to do some screen-scraping. If you've used Beautiful Soup 3, I think you'll be pleasantly surprised. If not, I'll just say I hope you like it.

I've thanked them before, but special thanks are once again due to Thomas Kluyver and Ezio Melotti for helping me get everything working under Python 3.

[0] The first release is called 4.0.1 instead of 4.0.0 because I've been bitten by clever packagers before and I don't want them thinking "4.0.0" is an earlier version than "4.0.0b10".
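To see how that can go wrong, here's a small sketch. It assumes a tool that orders version numbers as plain strings, which is my guess at one possible failure mode and not a claim about any particular packaging system.

```python
# Assumption: the tool compares version numbers as plain strings.
versions = ["4.0.0b10", "4.0.1", "4.0.0"]

# "4.0.0" is a prefix of "4.0.0b10", so it sorts first and looks older
# than the beta. "4.0.1" differs at the last digit and sorts after both.
naive_order = sorted(versions)

print(naive_order)
```

Under that assumption "4.0.0" lands before the beta and "4.0.1" lands after it, which is the ordering the 4.0.1 version number is meant to guarantee.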


Unless otherwise noted, all content licensed by Leonard Richardson
under a Creative Commons License.