
Beautiful Soup Trickery: Here's a nice bit of trickery I discovered, based on code sent to me by Staffan Malmgren. A common task among the target audience for Beautiful Soup is to strip HTML tags from a document. It turns out this is a one-liner:

''.join([e for e in soup.recursiveChildGenerator() if isinstance(e,unicode)])

Where soup is your soup object. I really like the if conditional in list comprehensions; I don't think Ruby has anything like it. I can think of ways to simulate it but they kill its simplicity, which is what I like about it.
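
To see the filter at work without installing anything, here is a minimal sketch in Python 3, where str plays the role that unicode plays in the one-liner. The Tag class and the recursive_children function are hypothetical stand-ins invented for this example; they are not Beautiful Soup's API, only a rough imitation of a parse tree and of what recursiveChildGenerator() appears to do (walk every descendant in document order).

```python
class Tag:
    """Hypothetical stand-in for a parsed HTML element (not Beautiful Soup's class)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def recursive_children(node):
    """Yield every descendant of node, depth-first, in document order."""
    for child in node.children:
        yield child
        # Only tags have children; plain strings are leaves.
        if isinstance(child, Tag):
            yield from recursive_children(child)


# Hand-built tree for: <p>Hello, <b>world</b>!<br/></p>
tree = Tag("p", ["Hello, ", Tag("b", ["world"]), "!", Tag("br")])

# The walk yields both tags and strings; the "if" keeps only the strings.
stripped = "".join([e for e in recursive_children(tree) if isinstance(e, str)])
print(stripped)
```

The tags are dropped by the isinstance test and the text nodes are joined in the order they appear, which is the whole trick.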


Comments:

Posted by anonymous at Mon May 29 2006 21:15

''.join(e for e in soup.recursiveChildGenerator() if isinstance(e,unicode))

nicer?

Posted by Leonard at Mon May 29 2006 22:16

Is that valid Python? It doesn't work for me.

Posted by anonymous at Mon May 29 2006 23:14

You are right, it requires Python >= 2.4:
http://docs.python.org/whatsnew/node4.html

Posted by Leonard at Mon May 29 2006 23:16

Coooooool, as Manoj would say.



Unless otherwise noted, all content licensed by Leonard Richardson
under a Creative Commons License.