BeautifulSoup: nested/embedded identical tags with different classes

Asked: 2012-09-07 20:40:38

Tags: nested beautifulsoup

I am trying to parse some HTML files that frequently contain fragments like this:

<p class="p5"><p class="s2">In the directory </p>/home/blah/<p class="s2"> there is a file, </p>plotData.dat</p>

It gets parsed as:

>>> [c for c in P.body.children]
[<p class="p5"></p>,
 <p class="s2">In the directory </p>,
 u'/home/blah/',
 <p class="s2"> there is a file, </p>,
 u'plotData.dat']

I expected it to come out as:

>>> [c for c in P.body.children]
[<p class="p5"></p>,
 <p class="s2">In the directory </p>,
 <p class="p5">u'/home/blah/'</p>,
 <p class="s2"> there is a file, </p>,
 <p class="p5">u'plotData.dat'</p>]

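Is the input HTML malformed? Is there anything I can do to get the input HTML parsed into the latter form? (I have no control over what the HTML looks like.)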

Edit: a complete MWE:

>>> from bs4 import BeautifulSoup as BS
>>> P = BS('<p class="p5"><p class="s2">In the directory </p>/home/blah/<p class="s2"> there is a file, </p>plotData.dat</p>')
>>> [c for c in P.body.children]
[<p class="p5"></p>, <p class="s2">In the directory </p>, u'/home/blah/', <p class="s2"> there is a file, </p>, u'plotData.dat']
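For context, HTML does not allow a `<p>` inside another `<p>`: a new `<p>` start tag implicitly closes the one that is still open, which is consistent with the empty `<p class="p5"></p>` in the parse result above. One possible workaround, given here only as a sketch, is to post-process the flat result and wrap each bare text node in a new `<p class="p5">`. The helper name `wrap_bare_text` is hypothetical, and the class to apply is passed in by hand rather than recovered from the input. To stay independent of which parser is installed, the demo does not re-parse the original fragment; it starts from the flat sibling sequence shown above, placed inside an illustrative `<div>` and parsed with the stdlib-backed `html.parser`.

```python
from bs4 import BeautifulSoup, NavigableString


def wrap_bare_text(soup, parent, class_name):
    """Wrap each non-blank text node that is a direct child of `parent`
    in a new <p class=class_name> tag (hypothetical helper)."""
    # Iterate over a copy, because wrapping mutates the child list.
    for child in list(parent.children):
        if isinstance(child, NavigableString) and child.strip():
            wrapper = soup.new_tag("p")
            wrapper["class"] = class_name
            child.wrap(wrapper)


# The flat sibling sequence from the parse result, inside an illustrative <div>.
flat = ('<div><p class="p5"></p><p class="s2">In the directory </p>/home/blah/'
        '<p class="s2"> there is a file, </p>plotData.dat</div>')
soup = BeautifulSoup(flat, "html.parser")
wrap_bare_text(soup, soup.div, "p5")
result = [str(c) for c in soup.div.children]
```

With the MWE itself, the same helper could presumably be called as `wrap_bare_text(P, P.body, "p5")`, assuming the parser in use produced a `<body>` as in the output above. Different parsers may build different trees from invalid markup, so it is worth checking the parsed structure before relying on this.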

0 answers:

No answers yet.