
Beautiful Soup Progress #2: Another glorious vacation day squandered porting Beautiful Soup to Python 3 for you ungrateful sods! I have a script that runs 2to3-3.0 on the core codebase and applies a small patch of my own, and with it I've fixed almost all of the Unicode problems. There's still some kind of problem with the search mechanism, plus some problems that seem to come from differences between the Python 2 and Python 3 versions of HTMLParser (?) in how HTML entities and self-closing tags are handled. I'm down to 15 failing tests in the converted code, without breaking any tests in the Python 2 version.

I think a couple of people were confused by my earlier statement that you'd be able to "write a plugin for lxml [or] html5lib." I'm talking about using another parser to drive Beautiful Soup's tree generation: the events generated by some other parser get turned into a generic set of "start tag", "end tag" type events. That gives you an alternative to the okay-for-2004-but-not-for-2008 Beautiful Soup rules about parsing bad HTML, and eventually lets me get rid of those rules altogether, because I don't want to be in that business.
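To make the idea concrete, here's a minimal sketch in Python 3, assuming the standard library's `html.parser.HTMLParser` plays the part of "some other parser." The class names `Node`, `TreeBuilder`, and `HTMLParserAdapter` and the function `parse` are hypothetical, made up for illustration; they are not Beautiful Soup's actual API. The adapter translates the parser's callbacks into generic start tag / end tag / text events, and the tree builder consumes those events without knowing which parser produced them. In principle, a plugin for lxml or html5lib would just be another adapter feeding the same builder.

```python
from html.parser import HTMLParser


class Node:
    """A minimal tree node (hypothetical; not Beautiful Soup's Tag class)."""

    def __init__(self, name, attrs=None, parent=None):
        self.name = name
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []  # child Nodes and text strings, in document order


class TreeBuilder:
    """Builds a tree from generic events, regardless of which parser sent them."""

    def __init__(self):
        self.root = Node("[document]")
        self.current = self.root

    def start_tag(self, name, attrs):
        node = Node(name, attrs, parent=self.current)
        self.current.children.append(node)
        self.current = node

    def end_tag(self, name):
        # Close the nearest open tag with this name; silently ignore stray end tags.
        node = self.current
        while node is not None and node.name != name:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def text(self, data):
        self.current.children.append(data)


class HTMLParserAdapter(HTMLParser):
    """Translates html.parser callbacks into TreeBuilder events."""

    def __init__(self, builder):
        # convert_charrefs=True makes the parser resolve entities like &amp; itself.
        super().__init__(convert_charrefs=True)
        self.builder = builder

    def handle_starttag(self, tag, attrs):
        self.builder.start_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        # A self-closing tag such as <br/> becomes a start event plus an end event.
        self.builder.start_tag(tag, attrs)
        self.builder.end_tag(tag)

    def handle_endtag(self, tag):
        self.builder.end_tag(tag)

    def handle_data(self, data):
        self.builder.text(data)


def parse(markup):
    builder = TreeBuilder()
    adapter = HTMLParserAdapter(builder)
    adapter.feed(markup)
    adapter.close()  # flush any buffered trailing text
    return builder.root
```

For example, `parse("<p>Hello <b>world</b></p>")` yields a `[document]` root whose single child is the `p` node, containing the string `"Hello "` and a `b` node. Note that the builder's only rule for bad markup is ignoring stray end tags; under this design, deciding how to repair broken HTML is the driving parser's job rather than the tree builder's.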


Unless otherwise noted, all content licensed by Leonard Richardson
under a Creative Commons License.