What's New in Beautiful Soup

Release 3.1.0.1 (2009/01/06)

Fixed a small but annoying bug that caused Beautiful Soup to crash when given HTML that contained boolean attributes.

Release 3.1.0 (2008/12/27)

A hybrid version that supports Python 2.4 and can be automatically converted to run under Python 3.0. There are three backwards-incompatible changes you should be aware of, but no new features or deliberate behavior changes.

  1. str() may no longer do what you want. This is because the meaning of str() inverts between Python 2 and 3; in Python 2 it gives you a byte string, in Python 3 it gives you a Unicode string.

    The effect of this is that you can't pass an encoding to .__str__ anymore. Use encode() to get a byte string and decode() to get Unicode, and you'll be ready (well, readier) for Python 3.

  2. Beautiful Soup is now based on HTMLParser rather than SGMLParser, which is gone in Python 3. There's some bad HTML that SGMLParser handled but HTMLParser doesn't, usually to do with attribute values that aren't closed or have brackets inside them:
      <a href="foo</a>, </a><a href="bar">baz</a>
      <a b="<a>">', '<a b="<a>"></a><a>"></a>
    

    A later version of Beautiful Soup will allow you to plug in different parsers to make tradeoffs between speed and the ability to handle bad HTML.

  3. In Python 3 (but not Python 2), HTMLParser converts entities within attributes to the corresponding Unicode characters. In Python 2 it's possible to parse this string and leave the &eacute; intact:
    <a href="http://crummy.com?sacr&eacute;&bleu">
    

    In Python 3, the &eacute; is always converted to \xe9 during parsing.
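Changes 1 and 3 can be seen with the Python 3 standard library alone. The following is a minimal sketch, not Beautiful Soup code: it uses plain strings to show the encode()/decode() split, and a small HTMLParser subclass (the name AttrCollector is made up for this illustration) to show the entity in the attribute being converted during parsing.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records every start tag with its attributes."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, already processed by the parser.
        self.seen.append((tag, attrs))


# Change 1: in Python 3, str is Unicode text. encode() produces bytes,
# and decode() turns those bytes back into text.
text = "sacr\xe9 bleu"
data = text.encode("utf-8")
round_trip = data.decode("utf-8")

# Change 3: under Python 3, the &eacute; in the attribute value is
# converted to the character \xe9 before handle_starttag sees it.
collector = AttrCollector()
collector.feed('<a href="http://crummy.com?sacr&eacute;&bleu">')
collector.close()
href = dict(collector.seen[0][1])["href"]

print(type(data).__name__, round_trip == text)
print(ascii(href))
```

Printing the value with ascii() keeps the converted character visible as an escape sequence, whatever the terminal's encoding.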

Release 3.0.7a (2008/07/03)

Release 3.0.7 (2008/06/22)

Release 3.0.6 (2008/04/26)

Release 3.0.5 (2007/12/12)

Release 3.0.4 (2007/04/10)

Release 3.0.3 (2006/06/06)

Release 3.0.2 (2006/06/02)

Release 3.0.1 (2006/05/30)

Release 3.0.0 (2006/05/28), "Who would not give all else for two p"

Release 2.1.1 (2005/09/18)

Release 2.1.0, "Game, or any other dish?" (2005/05/04)

Release 2.0.3 (2005/05/01)

Release 2.0.2 (2005/04/16)

Release 2.0.1 (2005/04/12)

Release 2.0, "Who cares for fish?" (2005/04/10)

Beautiful Soup version 1 was very useful but also pretty stupid. I originally wrote it without noticing any of the problems inherent in trying to build a parse tree out of ambiguous HTML tags. This version solves all of those problems to my satisfaction. It also adds many new clever things to make up for the removal of the stupid things.

Parsing

Strings and Unicode

Tree traversal

Tree manipulation

Porting Considerations

There are three changes in 2.0 that break old code:

Between 1.2 and 2.0

This is the release to get if you want Python 1.5 compatibility.

Release 1.2, "Who for such dainties would not stoop?" (2004/07/08)

Release 1.1, "Swimming in a hot tureen"

Release 1.0, "So rich and green" (2004/04/20)

Initial release.


This document is part of Crummy, the webspace of Leonard Richardson. It was last modified on Tuesday, January 06 2009, 21:18:23 Nowhere Standard Time and last built on Saturday, February 04 2012, 02:00:07 Nowhere Standard Time.

Crummy is © 1996-2012 Leonard Richardson. Unless otherwise noted, all text licensed under a Creative Commons License.
