
Beautiful Soup 4.4.0 beta: I've found an agent for Situation Normal and the book is out to publishers and I don't have to think about it for a while. As seems to be my tradition after finishing a big project, I went through the accumulated Beautiful Soup backlog and closed it out. I've put out a beta release, and I'd like you to try it out and report any problems.

I've fixed 17 bugs, added some minor new features, and changed the implementations of __copy__ and __repr__ to work more like you'd expect from Python objects. But in my mind the major change is this: I've added a warning that displays when you create a BeautifulSoup object without explicitly specifying a parser:

UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently. To get rid of this warning, change this:
BeautifulSoup([your markup])
to this:
BeautifulSoup([your markup], "lxml")

It's a little annoying to get this message, but it's also annoying to have your code silently behave differently because you copied it to a machine that didn't have lxml installed, and it's also annoying when I have to check pretty much every reported bug to see whether this is the problem. Whenever I think I can eliminate a class of support question with a warning, I put in the warning. It saves everybody time.

The other possibility: now that Python's built-in HTMLParser is decent, I could make it so that it's always the default unless you specify another parser. This would cause a big one-time wrench, as even machines which have lxml installed would start using HTMLParser, but once it shook out the problem would be solved. I might still do that, but I think I'll give everyone about a year to get rid of this annoying warning.
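
To make the trade-off concrete, here is a minimal sketch of the "guess, then warn" idea in plain Python. It is not Beautiful Soup's actual code: choose_parser is a hypothetical helper, and the preference order (lxml if installed, otherwise the built-in html.parser) is an assumption based on the warning text quoted above.

```python
import importlib.util
import warnings


def choose_parser(requested=None):
    """Return a parser name, warning if the caller left the choice to us.

    Hypothetical helper for illustration only; not part of Beautiful Soup.
    """
    if requested is not None:
        # The caller was explicit, so the answer is the same on every machine.
        return requested

    # Assumed preference order: lxml if it is importable, else the stdlib parser.
    if importlib.util.find_spec("lxml") is not None:
        guessed = "lxml"
    else:
        guessed = "html.parser"

    # The guess depends on what is installed, so tell the caller about it.
    warnings.warn(
        'No parser was explicitly specified, so I\'m using "%s". '
        "Pass the parser name explicitly to get rid of this warning." % guessed,
        UserWarning,
        stacklevel=2,
    )
    return guessed


if __name__ == "__main__":
    print(choose_parser("html.parser"))  # explicit choice: no warning
    print(choose_parser())               # guessed choice: emits a UserWarning
```

Under the other possibility, the guessing branch would go away: the fallback would always be "html.parser", and the warning would no longer be needed.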

Anyway, try out the beta. Unless there's a big problem I'll be releasing 4.4.0 on Friday.

Comments:

Posted by Brendan at Mon Jun 29 2015 12:00

Buried lede in a shallow grave.

Posted by Danny at Wed Jul 08 2015 14:24

This error message looks different than what I'm getting (using beautifulsoup4, version: 4.4.0)

To get rid of this warning, change this:

BeautifulSoup([your markup])

to this:

BeautifulSoup([your markup], "lxml-xml")

markup_type=markup_type))

The error message is confusing - where do I assign markup_type? I tried specifying the markup_type as an argument, but then I get a keyword argument error.

My code looks like this:
soup = BeautifulSoup(open(final), ["lxml-xml","xml"])


Unless otherwise noted, all content licensed by Leonard Richardson
under a Creative Commons License.