<tr>
<td nowrap> good1 </td>
<td class = "td_left" nowrap=""> 1 </td>
</tr>
<tr0>
<td nowrap> good2 </td>
<td class = "td_left" nowrap=""> </td>
</tr0>
How can I parse this with Python? Please help. I would like the result to be the list ['good1', 1, 'good2', None]
Answer 0 (score: 0)
Find all the tr tags and get all the td tags from each of them:
from bs4 import BeautifulSoup
page = """<tr>
<td nowrap> good1 </td>
<td nowrap class = "td_left"> 1 </td>
</tr>
<tr>
<td nowrap> good2 </td>
<td nowrap class = "td_left"> 2 </td>
</tr>"""
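# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(page, 'html.parser')
# html.parser does not add a <body> wrapper, so search from the document root.
rows = soup.find_all('tr')
print([td.text.strip() for row in rows for td in row.find_all('td')])
Prints:
['good1', '1', 'good2', '2']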
Note that strip() removes the leading and trailing whitespace from each cell's text.
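The list printed above contains only strings, while the question asked for 1 as an integer and None for the empty cell. Here is a minimal sketch of that last step, assuming the cell texts have already been extracted as in the code above. The helper name convert_cell and the sample cells list are made up for illustration and are not part of BeautifulSoup.

```python
def convert_cell(text):
    """Turn one cell's text into None, an int, or a stripped string."""
    text = text.strip()
    if not text:
        return None          # empty cell
    try:
        return int(text)     # numeric cell such as ' 1 '
    except ValueError:
        return text          # anything else stays a string


# Hypothetical cell texts, shaped like the td contents in the question.
cells = [' good1 ', ' 1 ', ' good2 ', ' ']
result = [convert_cell(c) for c in cells]
print(result)  # ['good1', 1, 'good2', None]
```

With the parsing code above, the same helper could be applied to td.text in place of td.text.strip().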
Hope that helps.