Beautiful Soup 3.0 Beta: For your delectability. The major new feature is that Beautiful Soup 3.0 takes XML or HTML documents in any encoding and turns them into UTF-8; in most cases you don't have to know the current encoding. I wrote this without really knowing anything about encodings: most of the code is stolen from Mark Pilgrim's Universal Feed Parser. But I am able to write tests, and the tests work.
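
To give a rough idea of what "turn any encoding into UTF-8" involves, here is a minimal standard-library sketch. It is not the Beautiful Soup or Universal Feed Parser code: the function name `to_utf8`, the `BOMS` table, and the list of guesses are all made up for illustration, and a real detector would also look at things like XML declarations and HTTP headers.

```python
import codecs

# Byte-order marks that identify an encoding outright (illustrative subset).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def to_utf8(data, guesses=("utf-8", "windows-1252")):
    """Return (utf8_bytes, encoding_used) for a byte string of unknown encoding."""
    # 1. A byte-order mark settles the question immediately.
    for bom, name in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(name).encode("utf-8"), name

    # 2. Otherwise try each guess in order and keep the first that decodes.
    for name in guesses:
        try:
            return data.decode(name).encode("utf-8"), name
        except UnicodeDecodeError:
            continue

    # 3. Last resort: latin-1 maps every byte to a character, so it never fails.
    return data.decode("latin-1").encode("utf-8"), "latin-1"


utf8_bytes, used = to_utf8("café".encode("latin-1"))
print(used, utf8_bytes)
```

The order of the guesses matters: strict UTF-8 goes first because random non-UTF-8 bytes rarely decode as valid UTF-8, while the permissive single-byte encodings will accept almost anything.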

The other major new feature is that you can now rip out a chunk from the parse tree with the extract method. You can use the chunk and abandon the rest of the tree, or vice versa. This is especially useful because the data structures you abandoned can now be garbage-collected: in current Beautiful Soup, the whole tree stays in memory forever because every Tag and NavigableText is connected to every other Tag and NavigableText through an intricate web of lies. And by "lies", I mean "instance variables".
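
The garbage-collection point can be sketched with a toy tree. This is a stand-in, not Beautiful Soup's implementation: the `Node` class and its `append` method are hypothetical, and only the name `extract` comes from the post. The idea is that a node holds references to its parent and children, so keeping any one node keeps the whole tree alive until the links between the chunk and the rest are cut.

```python
import gc
import weakref


class Node:
    """Toy parse-tree node that links to its parent and its children."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def append(self, child):
        child.parent = self
        self.children.append(child)
        return child

    def extract(self):
        """Cut this node (and its subtree) loose from the rest of the tree."""
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None
        return self


root = Node("html")
body = root.append(Node("body"))
table = body.append(Node("table"))

root_ref = weakref.ref(root)   # lets us observe whether the root is still alive

chunk = table.extract()        # keep the table, abandon everything else
del root, body, table          # drop our own names for the abandoned part
gc.collect()                   # the abandoned nodes only reference each other now

print(chunk.name, chunk.parent, root_ref())
```

Without the `extract` call, `chunk.parent` would still point at the body, the body at the root, and nothing could be collected for as long as `chunk` was in use.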

There are some more new features, but I have to take a shower now and go meet Pete Peterson II for dinner. Test it out; I'll be rewriting the documentation over the next month or so, and hopefully by then I'll have gotten enough feedback to release it.

Just When You Thought It Was Safe To Not Look At Cute Baby Elephant Pictures: Cute baby elephant picture!


Unless otherwise noted, all content licensed by Leonard Richardson
under a Creative Commons License.