Question

Given this sample HTML string:

<table>
<tr>
<td class="td" height="25">Upstream Power</td>
<td class="td">25.2 dBmV</td>
<td class="td">49.2 dBmV</td>
</tr>
</table>

I can find the text with:

soup.find_all(text=re.compile("Power"))

But when I try to find the whole tag, I get nothing back. What am I missing?

soup.find_all("td",text=re.compile("Power"))

Answer 1

In BS3, the method is findAll, not find_all:

>>> markup = '''<table>
... <tr>
... <td class="td" height="25">Upstream Power</td>
... <td class="td">25.2 dBmV</td>
... <td class="td">49.2 dBmV</td>
... </tr>
... </table>'''
>>> from BeautifulSoup import BeautifulSoup as bs
>>> soup = bs(markup)
>>> import re
>>> soup.findAll(text=re.compile('Power'))
[u'Upstream Power']

Edit: I see that the method was renamed in BS4. It seems to work for me:

>>> from bs4 import BeautifulSoup as bs
>>> soup = bs(markup)
>>> soup.find_all(text=re.compile('Power'))
[u'Upstream Power']

Edit 2: To navigate the parse tree more easily, you can use tag names:

>>> soup.td.find_all(text=re.compile('Power'))
[u'Upstream Power']
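
The calls above return the matching text nodes, not the enclosing <td>. As a rough sketch of how to get the tag itself, assuming a reasonably recent bs4 release where the `string=` keyword is accepted (older releases call it `text=`) and using the built-in "html.parser", two approaches seem to work: step up from each matching text node with `.parent`, or pass the tag name and the pattern together. The variable names below are only illustrative.

```python
import re
from bs4 import BeautifulSoup

markup = """<table>
<tr>
<td class="td" height="25">Upstream Power</td>
<td class="td">25.2 dBmV</td>
<td class="td">49.2 dBmV</td>
</tr>
</table>"""

# Naming the parser explicitly avoids the "no parser specified" warning.
soup = BeautifulSoup(markup, "html.parser")
pattern = re.compile("Power")

# Approach 1: match the text nodes, then step up to each enclosing tag.
# The name check keeps only text whose direct parent is a <td>.
cells_via_parent = [
    node.parent
    for node in soup.find_all(string=pattern)
    if node.parent.name == "td"
]

# Approach 2: give the tag name and the pattern in one call.
# This matches <td> tags whose .string matches the pattern.
# (Assumption: on older bs4 releases, spell the keyword text= instead.)
cells_direct = soup.find_all("td", string=pattern)

print(cells_via_parent)
print(cells_direct)
```

Approach 2 relies on the tag's `.string`, so it only matches a <td> that has a single text child. Approach 1 is a little more forgiving when a cell mixes text with nested tags, since it starts from the text node.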

Unable to find a tag with a regular expression in BeautifulSoup

1 answer: