Filed under: beautifulsoup
Very odd. Didn't expect the vertical bar to be eaten.
Posted by Leonard at Thu Jun 01 2006 09:25
Kick away. The Beautiful Soup default is supposedly to act as an HTML pretty-printer, but it's actually acting like an XML pretty-printer. I don't know why I did that. Change the second-to-last line from BeautifulStoneSoup to BeautifulSoup, and it will start knowing about HTML's self-closing tags. I'll make this change in the next version.
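To make the self-closing point concrete, here is a rough sketch of why it matters. This is not Beautiful Soup's code: it uses Python 3's standard `html.parser`, and the names `NestingTracker`, `text_depths`, and `HTML_SELF_CLOSING` (a deliberately partial list) are made up for illustration. A parser with no knowledge of HTML treats `<br>` as an open container, so the text after it ends up nested one level deeper; a parser that knows `<br>` is self-closing keeps that text at the same level.

```python
from html.parser import HTMLParser

# Assumption: a partial, illustrative list of HTML tags that never take a closing tag.
HTML_SELF_CLOSING = {"br", "hr", "img", "input", "meta", "link"}


class NestingTracker(HTMLParser):
    """Records the nesting depth at which each piece of text appears."""

    def __init__(self, self_closing=()):
        super().__init__()
        self.self_closing = set(self_closing)
        self.stack = []        # tags that are currently open
        self.text_depths = []  # (text, depth) pairs, in document order

    def handle_starttag(self, tag, attrs):
        # A self-closing tag is complete on its own, so it is never pushed.
        if tag not in self.self_closing:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, closing anything left open inside it.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if data.strip():
            self.text_depths.append((data.strip(), len(self.stack)))


def text_depths(markup, self_closing=()):
    """Parse markup and return the (text, depth) pairs that were seen."""
    tracker = NestingTracker(self_closing)
    tracker.feed(markup)
    tracker.close()
    return tracker.text_depths


markup = "<p>one<br>two</p>"
xml_like = text_depths(markup)                      # no knowledge of HTML tags
html_like = text_depths(markup, HTML_SELF_CLOSING)  # knows <br> is self-closing

print(xml_like)   # "two" is reported one level deeper, inside the unclosed <br>
print(html_like)  # "one" and "two" are reported at the same depth
```

The only difference between the two runs is the set of tags treated as self-closing, which is roughly the kind of knowledge the switch from BeautifulStoneSoup to BeautifulSoup is meant to bring in.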
Here's a test case for you. If you can get it to pass, I'll look at replacing sanitize with BeautifulSoup in Planet.

import BeautifulSoup, urllib
from xml.dom import minidom
page = urllib.urlopen('http://www.whump.com/moreLikeThis/').read()
Posted by Leonard at Fri Jun 02 2006 11:17
Got it working. I'll email you with details once I get the new release out.