(5) Wed May 31 2006 23:08 I Have Arrived:
Sam Ruby and Mark Pilgrim endorse Beautiful Soup.
Filed under: beautifulsoup
- Comments:
Posted by Sam Ruby at Thu Jun 01 2006 09:20
Endorse? More like kicking the tires... ;-)
I'm puzzled by the output of the following (note: I replaced angle brackets by curly braces as your weblog does not have a preview function):
echo "a{br}b" python BeautifulSoup.py
Posted by Sam Ruby at Thu Jun 01 2006 09:22
Very odd. Didn't expect the vertical bar to be eaten.
Posted by Leonard at Thu Jun 01 2006 09:25
Kick away. The Beautiful Soup default is supposedly to act as an HTML pretty-printer, but it's actually acting like an XML pretty-printer. I don't know why I did that. Change the second-to-last line from BeautifulStoneSoup to BeautifulSoup, and it will start knowing about HTML's self-closing tags. I'll make this change in the next version.
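The difference between the two modes can be sketched without Beautiful Soup itself. The following is a rough illustration using only Python's standard html.parser module, not Beautiful Soup's actual implementation. The names VOID_TAGS, NestingTracker, and text_depths are hypothetical, and the tag set is a hand-picked subset. An XML-style reading treats br as an opening tag that still needs a close, so the text after it ends up nested inside it; an HTML-aware reading knows br closes itself.

```python
from html.parser import HTMLParser

# Assumption: a small, hand-picked subset of HTML's self-closing tags.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class NestingTracker(HTMLParser):
    """Records the nesting depth at which each piece of text is seen."""

    def __init__(self, void_tags):
        super().__init__()
        self.void_tags = void_tags
        self.depth = 0
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # A tag known to be self-closing never opens a new nesting level.
        if tag not in self.void_tags:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in self.void_tags and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if data.strip():
            self.seen.append((data.strip(), self.depth))


def text_depths(markup, void_tags):
    """Return (text, depth) pairs for the given markup."""
    tracker = NestingTracker(void_tags)
    tracker.feed(markup)
    tracker.close()  # flush any text still buffered at end of input
    return tracker.seen


# XML-style: no tag is assumed to close itself, so "b" lands inside br.
print(text_depths("a<br>b", set()))
# HTML-aware: br closes itself, so "a" and "b" are siblings.
print(text_depths("a<br>b", VOID_TAGS))
```

With an empty set of self-closing tags, the text after br is reported one level deeper than the text before it, which is the kind of surprise the echo example above runs into.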
Posted by Sam Ruby at Fri Jun 02 2006 07:29
Here's a test case for you. If you can get it to pass, I'll look at replacing sanitize with BeautifulSoup in Planet.

import BeautifulSoup, urllib
from xml.dom import minidom
page = urllib.urlopen('http://www.whump.com/moreLikeThis/').read()
minidom.parseString(str(BeautifulSoup.BeautifulSoup(page)))

Posted by Leonard at Fri Jun 02 2006 11:17
Got it working. I'll email you with details once I get the new release out.
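The test case in this thread boils down to whether the serialized output is well-formed XML, since minidom.parseString raises an error otherwise. A minimal sketch of that check, using only the standard library and fixed strings instead of a fetched page (the helper name is_well_formed is hypothetical, and this is Python 3 rather than the Python 2 of the original snippet):

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(markup):
    """Return True if minidom can parse the markup as XML."""
    try:
        minidom.parseString(markup)
    except ExpatError:
        return False
    return True


# An unclosed br is fine in HTML but is not well-formed XML.
print(is_well_formed("<p>a<br>b</p>"))
# Serialized with an explicit self-closing br, the same content parses.
print(is_well_formed("<p>a<br/>b</p>"))
```

Passing the test therefore means that whatever goes into the parser, what comes out as a string has every tag either closed or written in self-closing form.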
