Question

I'm trying to use BeautifulSoup to find where a piece of text sits in some HTML. This is my HTML:

<p><em>code of Drink<br></em>
Budweiser: 4BDB1CD96<br>
price: 10$</p>

And this is my code:

import re
from bs4 import BeautifulSoup

# html holds the markup shown above
soup = BeautifulSoup(html, 'lxml')
result = re.escape('4BDB1CD96')
tag = soup.find(['li', 'div', 'p', 'em'], string=re.compile(result))

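This does not return the tag. But if I change the call to tag = soup.find(string=re.compile(result)), I do get a result: Budweiser: 4BDB1CD96. So I would like to know why this happens, and how to get the result as a tag.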

Answer 1

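The problem here is that the text you are searching for sits inside a tag (here, p) that also contains nested tags. When a tag has more than one child, its .string attribute is None, so a search that combines tag names with string= never matches it.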
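A minimal sketch of that behaviour, using the HTML from the question. The built-in "html.parser" is swapped in for 'lxml' here only so the snippet has no extra dependency; that choice is an assumption, not part of the original question.

```python
import re
from bs4 import BeautifulSoup

html = """<p><em>code of Drink<br></em>
Budweiser: 4BDB1CD96<br>
price: 10$</p>"""

# Assumption: "html.parser" instead of the question's 'lxml'.
soup = BeautifulSoup(html, "html.parser")
pattern = re.compile(re.escape("4BDB1CD96"))

# <p> has several children (an <em>, two text nodes, a <br>), so .string is None.
p_string = soup.find("p").string

# Filtering by tag name AND string= compares the pattern against tag.string,
# which is None for both <p> and <em>, so nothing matches.
by_tag_and_string = soup.find(["li", "div", "p", "em"], string=pattern)

# Searching with string= alone returns the matching text node, not a tag.
text_node = soup.find(string=pattern)
```

Here p_string and by_tag_and_string are both None, while text_node is the text node that contains the code; its .parent is the enclosing p tag.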

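So the simplest approach is to pass a lambda to .find() that checks both the tag name and whether the .text attribute contains your pattern. You don't even need a regular expression here: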

>>> tag = soup.find(lambda t: t.name in ['li','div','p','em'] and '4BDB1CD96' in t.text)
>>> tag
<p><em>code of Drink<br/></em>
Budweiser: 4BDB1CD96<br/>
price: 10$</p>
>>> tag.string
>>> tag.text
'code of Drink\nBudweiser: 4BDB1CD96\nprice: 10$'

Of course, you can use a regular expression for more complex searches:

r = re.compile('4BDB1CD96') # or whatever the pattern is
tag = soup.find(lambda t: t.name in ['li','div','p','em'] and r.search(t.text))
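One thing worth keeping in mind: .text includes the text of all descendants, so the lambda search returns the first tag in document order whose subtree contains the pattern, which may be an outer wrapper. The sketch below contrasts that with walking up from the matching text node via find_parent(). The helper names find_tag_containing and find_innermost_tag are hypothetical, the wrapping div is added to the sample HTML purely for illustration, and "html.parser" is again an assumption made to avoid the lxml dependency.

```python
import re
from bs4 import BeautifulSoup

# Assumption: the question's HTML wrapped in an extra <div> for illustration.
HTML = """<div><p><em>code of Drink<br></em>
Budweiser: 4BDB1CD96<br>
price: 10$</p></div>"""

NAMES = ["li", "div", "p", "em"]


def find_tag_containing(soup, pattern, names):
    """Hypothetical helper: first tag (document order) whose full text matches."""
    regex = re.compile(pattern)
    return soup.find(lambda t: t.name in names and regex.search(t.text) is not None)


def find_innermost_tag(soup, pattern, names):
    """Hypothetical helper: closest ancestor of the matching text node."""
    node = soup.find(string=re.compile(pattern))
    return node.find_parent(names) if node is not None else None


soup = BeautifulSoup(HTML, "html.parser")
needle = re.escape("4BDB1CD96")

outer = find_tag_containing(soup, needle, NAMES)  # the wrapping <div>
inner = find_innermost_tag(soup, needle, NAMES)   # the <p> that holds the text
```

With the unwrapped HTML from the question both helpers would return the same p tag; the difference only shows up once a listed tag encloses another.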
