I Have Arrived: Sam Ruby and Mark Pilgrim endorse Beautiful Soup.

Comments:

Posted by Sam Ruby at Thu Jun 01 2006 09:20

Endorse? More like kicking the tires... ;-)

I'm puzzled by the output of the following (note: I replaced angle brackets with curly braces, as your weblog does not have a preview function):

echo "a{br}b" python BeautifulSoup.py

Posted by Sam Ruby at Thu Jun 01 2006 09:22

Very odd. Didn't expect the vertical bar to be eaten.

Posted by Leonard at Thu Jun 01 2006 09:25

Kick away.

The Beautiful Soup default is supposedly to act as an HTML pretty-printer, but it's actually acting like an XML pretty-printer. I don't know why I did that.

Change the second-to-last line from BeautifulStoneSoup to BeautifulSoup, and it will start knowing about HTML's self-closing tags. I'll make this change in the next version.
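
To illustrate what "knowing about HTML's self-closing tags" changes, here is a minimal sketch using only the Python 3 standard library. It is not Beautiful Soup's implementation: `TinySerializer`, `reserialize`, and the partial `HTML_VOID_TAGS` list are hypothetical names invented for this example. The same toy serializer is run once with an HTML-style list of self-closing tags and once with an empty list, which is roughly how a generic XML-style parser sees the markup.

```python
from html.parser import HTMLParser

# HTML tags that never take a closing tag (partial list, for illustration).
HTML_VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TinySerializer(HTMLParser):
    """Hypothetical toy re-serializer; not Beautiful Soup's code."""

    def __init__(self, void_tags):
        super().__init__()
        self.void_tags = void_tags
        self.open_tags = []  # stack of tags still waiting for a close
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.void_tags:
            # Known self-closing tag: emit it closed, nest nothing inside.
            self.out.append("<%s />" % tag)
        else:
            self.out.append("<%s>" % tag)
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Close everything down to and including the matching tag.
            while self.open_tags:
                closed = self.open_tags.pop()
                self.out.append("</%s>" % closed)
                if closed == tag:
                    break

    def handle_data(self, data):
        self.out.append(data)

    def render(self, markup):
        self.feed(markup)
        self.close()
        # Anything still open at end of input gets closed here.
        while self.open_tags:
            self.out.append("</%s>" % self.open_tags.pop())
        return "".join(self.out)


def reserialize(markup, void_tags):
    return TinySerializer(void_tags).render(markup)


print(reserialize("a<br>b", HTML_VOID_TAGS))  # a<br />b
print(reserialize("a<br>b", set()))           # a<br>b</br>
```

With no list of self-closing tags, the `br` is treated as an ordinary element that swallows the text after it, which is the kind of surprise the XML-style default produces on HTML input.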

Posted by Sam Ruby at Fri Jun 02 2006 07:29

Here's a test case for you. If you can get it to pass, I'll look at replacing sanitize with BeautifulSoup in Planet.

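import BeautifulSoup, urllib
from xml.dom import minidom

# Fetch a real-world page, reserialize it with Beautiful Soup, then check
# that the result is well-formed XML (minidom raises an error if it is not).
page = urllib.urlopen('http://www.whump.com/moreLikeThis/').read()
minidom.parseString(str(BeautifulSoup.BeautifulSoup(page)))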
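
The pass condition in the test above is simply that `minidom.parseString` does not raise. A small offline sketch of that check, in Python 3 with only the standard library, follows; `is_well_formed` is a hypothetical helper name, and the two sample strings are made up for illustration rather than taken from the page in question.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(markup):
    """Return True if minidom can parse the markup as XML."""
    try:
        minidom.parseString(markup)
        return True
    except ExpatError:
        # Raised for unclosed tags, mismatched tags, stray ampersands, etc.
        return False


print(is_well_formed("<p>a<br />b</p>"))  # True: the br is explicitly closed
print(is_well_formed("<p>a<br>b</p>"))    # False: br is never closed
```

So for the test to pass, whatever the parser emits has to close every tag it opens, including the ones HTML authors routinely leave unclosed.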

Posted by Leonard at Fri Jun 02 2006 11:17

Got it working. I'll email you with details once I get the new release out.


Unless otherwise noted, all content licensed by Leonard Richardson
under a Creative Commons License.