Question

So I'm trying to get the area of certain locations by scraping their Wikipedia pages. Taking Cumbria as an example (https://en.wikipedia.org/wiki/Cumbria), I can get the infobox with:

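import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/Cumbria'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'lxml')  # the 'lxml' parser requires the lxml package
# find() returns only the FIRST <tr class="mergedrow">, not necessarily the Area row
value = soup.find('table', {"class": "infobox geography vcard"}) \
            .find('tr', {"class": "mergedrow"}).text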

However, the infobox geography vcard table has multiple <tr class='mergedrow'> rows, and each of them contains a <th scope='row'>.

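The <th scope='row'> I want is <th scope="row">Area</th>. Is there a way to get the text of the <th scope="row">Area</th> row inside the infobox geography vcard table by searching for the text "Area" instead of by tag, wherever that row happens to appear?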

Answer 1

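You can search directly for all th elements with scope=row. Then iterate over them, check which one contains Area, and use find_next_sibling to get its next sibling (the td that holds the data you need).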
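If you would rather not filter in a Python loop, bs4's find/find_all also accept a string= argument, which, combined with a tag name, matches tags whose entire .string equals the given value. Below is a minimal sketch of that text-based lookup. It runs against a hypothetical, trimmed-down stand-in for the infobox (SAMPLE_HTML is an assumption, not the real page markup) so it works offline with the built-in html.parser.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Wikipedia infobox; the live page markup may differ.
SAMPLE_HTML = """
<table class="infobox geography vcard">
  <tr class="mergedrow"><th scope="row">Area</th><td>6,768 km2 (2,613 sq mi)</td></tr>
  <tr class="mergedrow"><th scope="row">Population</th><td>(sample value)</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# string= matches on the tag's text, scope="row" filters on the attribute.
# Note: it only matches when "Area" is the th's whole text (no extra whitespace or children).
area_th = soup.find("th", string="Area", scope="row")

# The value lives in the <td> immediately after the header cell.
area = area_th.find_next_sibling("td").get_text(strip=True) if area_th else None
print(area)
```

Because string= requires an exact match of the th's sole string, the explicit loop in this answer is more forgiving when the header cell contains nested tags or padding.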

Note that this table contains two Area entries: one for the 'Ceremonial county' (whatever that means) and one for the 'Non-metropolitan county' ;).

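# soup is the BeautifulSoup object built in the question
ths = soup.find_all('th', {'scope': 'row'})

for th in ths:
    # strip surrounding whitespace so the comparison matches the visible label
    if th.get_text(strip=True) == 'Area':
        # the value sits in the <td> right after the header cell
        area = th.find_next_sibling('td').text
        print(area)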

#  6,768 km2 (2,613 sq mi)
#  6,768 km2 (2,613 sq mi)
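Since the table has two Area rows, printing inside the loop makes it easy to miss that there are several matches. A minimal sketch that collects every match into a list instead, again run against a hypothetical sample of the infobox (SAMPLE_HTML and the helper name row_values are illustrative assumptions, not part of bs4 or the original answer):

```python
from bs4 import BeautifulSoup


def row_values(soup, label):
    """Hypothetical helper: return the <td> text next to every <th scope="row"> labelled `label`."""
    values = []
    for th in soup.find_all("th", scope="row"):
        if th.get_text(strip=True) == label:
            td = th.find_next_sibling("td")
            if td is not None:  # skip header rows that have no value cell
                values.append(td.get_text(strip=True))
    return values


# Hypothetical sample mirroring the two Area rows shown in the output above.
SAMPLE_HTML = """
<table class="infobox geography vcard">
  <tr class="mergedrow"><th scope="row">Area</th><td>6,768 km2 (2,613 sq mi)</td></tr>
  <tr class="mergedrow"><th scope="row">Area</th><td>6,768 km2 (2,613 sq mi)</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
areas = row_values(soup, "Area")
print(areas)

# dict.fromkeys drops duplicate values while preserving their original order.
unique_areas = list(dict.fromkeys(areas))
print(unique_areas)
```

Returning a list leaves it to the caller to decide which Area (ceremonial or non-metropolitan) is wanted, rather than silently keeping whichever row happens to come last.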
