Question

>>> import re
>>> from bs4 import BeautifulSoup
>>>
>>> html_text = '<li>Location:<a href="tweetLocation">tweetLocation</a></li>'
>>> soup = BeautifulSoup(html_text, "html.parser")
>>> print(soup.find('li', text=re.compile(r'^Location.*')))

I get no result from this. Can anyone tell me how to find the li element?

Answer 1

The text argument (now renamed to string) actually checks an element's .string against the given criterion, which in this case is the regular expression ^Location.*.

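Now, the .string attribute has a special property: if a tag has more than one child, its value is None. As the documentation puts it:

"If a tag contains more than one thing, then it's not clear what .string should refer to, so .string is defined to be None"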

And your li element does have multiple children: the text node Location: and the a element. Hence, no result.
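To make the point about .string concrete, here is a minimal sketch that inspects the same markup. The variable names (children, li_string, a_string) are only illustrative, and the "html.parser" backend is an assumption; other parsers should behave the same way for this fragment.

```python
from bs4 import BeautifulSoup

html_text = '<li>Location:<a href="tweetLocation">tweetLocation</a></li>'
soup = BeautifulSoup(html_text, "html.parser")

li = soup.find("li")

# The li tag has two children: the text node "Location:" and the a tag.
children = list(li.children)

# With more than one child, .string is defined to be None,
# so a regex passed as text=/string= has nothing to match against.
li_string = li.string

# The a tag has exactly one child (a text node), so .string is that text.
a_string = li.a.string

print(len(children), li_string, a_string)
```

This is why the search on li comes back empty while a search on the text node itself can succeed.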

Instead, find the text node first, then navigate from it to the element you want:

In [1]: import re

In [2]: from bs4 import BeautifulSoup

In [3]: html_text = '<li>Location:<a href="tweetLocation">tweetLocation</a></li>'

In [4]: soup = BeautifulSoup(html_text, "html.parser")

In [5]: soup.find(text=re.compile(r'^Location.*')).find_parent('li')
Out[5]: <li>Location:<a href="tweetLocation">tweetLocation</a></li>

In [6]: soup.find(text=re.compile(r'^Location.*')).next_sibling.get_text()
Out[6]: 'tweetLocation'
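As an alternative that is not part of the answer above, a function can be passed to find() so the match runs against the tag's combined text rather than its .string. This is only a sketch: the helper name is_location_li is hypothetical, and it assumes the label always sits at the very start of the li text.

```python
from bs4 import BeautifulSoup

html_text = '<li>Location:<a href="tweetLocation">tweetLocation</a></li>'
soup = BeautifulSoup(html_text, "html.parser")


def is_location_li(tag):
    """Hypothetical filter: an li whose combined text starts with 'Location'."""
    return tag.name == "li" and tag.get_text().startswith("Location")


# find() accepts a callable and returns the first tag for which it is true.
li = soup.find(is_location_li)

# From the li, reach the link text directly.
location = li.a.get_text() if li is not None and li.a is not None else None

print(location)
```

The function filter avoids relying on .string at all, at the cost of being slightly more verbose than the text-node lookup.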
