Question

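I'm trying to scrape a Wikipedia page and have run into a simple problem that I can't find a solution to. A th tag and a td tag sit next to each other, and they are independent of one another (neither is nested inside the other). I want to get the text of one tag based on the value of the other.

Here is an example: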

<th scope="row" style="white-space:nowrap;padding-right:0.65em;">Budget</th>
<td style="line-height:1.3em;">$200 million<sup id="cite_ref-3" class="reference"><a href="#cite_note-3">[3]</a></sup></td>

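I want to get the text of the 'td' tag ($200 million) if the text of the 'th' tag is 'Budget'. Keep in mind that the only connection between them is that they are right next to each other.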

Answer 1

from bs4 import BeautifulSoup

html = '''<th scope="row" style="white-space:nowrap;padding-right:0.65em;">Budget</th>
<td style="line-height:1.3em;">$200 million<sup id="cite_ref-3" class="reference"><a href="#cite_note-3">[3]</a></sup></td>'''

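# The 'lxml' parser requires the lxml package to be installed.
soup = BeautifulSoup(html, 'lxml')
# Find the first <td> whose parent element contains the text 'Budget'.
# In a real table the parent is the <tr>, so this selects the cell in the Budget row.
td_text = soup.find(lambda tag: tag.name == 'td' and 'Budget' in tag.parent.text).text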
print(td_text)

Output:

$200 million[3]
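
A possible alternative, sketched here under a few assumptions, is to locate the th by its exact text and then step to the td that follows it with find_next_sibling. This relies only on the two tags being adjacent siblings, which is the relationship described in the question. The helper name cell_after_header and the "Running time" row are made up for illustration, and html.parser is used only so the sketch does not depend on lxml. Removing the sup element drops the "[3]" citation marker from the result.

```python
from bs4 import BeautifulSoup

# The Budget row comes from the question; the "Running time" row is
# invented here purely to show that the right row is selected.
html = '''<table>
<tr><th scope="row">Running time</th><td>120 minutes</td></tr>
<tr><th scope="row">Budget</th><td>$200 million<sup id="cite_ref-3" class="reference"><a href="#cite_note-3">[3]</a></sup></td></tr>
</table>'''


def cell_after_header(soup, label):
    """Return the text of the <td> that follows the <th> whose text is `label`.

    Hypothetical helper: returns None if no such <th> or <td> exists.
    """
    # Match a <th> whose only content is exactly the label string.
    th = soup.find('th', string=label)
    if th is None:
        return None
    # Step to the next sibling <td>, skipping whitespace text nodes.
    td = th.find_next_sibling('td')
    if td is None:
        return None
    # Drop citation markers such as [3]; note this modifies the soup in place.
    for sup in td.find_all('sup'):
        sup.decompose()
    return td.get_text(strip=True)


soup = BeautifulSoup(html, 'html.parser')
budget = cell_after_header(soup, 'Budget')
print(budget)
```

If the citation marker should be kept, the loop that calls decompose() can simply be left out.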

Finding independent tags next to each other in Python scraping