Question

I am scraping company data from the Wikipedia infobox table, and I need to grab some values from inside the td cells, such as Type, Traded as, Services, etc. My code is

    response = requests.get(url,headers=headers)
    html_soup = BeautifulSoup(response.text, 'lxml')
    table_container = html_soup.find('table', class_='infobox')
    hq_name=table_container.find("th", text=['Headquarters']).find_next_sibling("td")

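It returns the Headquarters value and works perfectly.

But when I try to get "Traded as", or any other header that contains a hyperlink, the code above does not work.

So how can I get the next sibling of "Traded as" or "Type"?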

Answer 1

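Based on your comment:

https://en.wikipedia.org/wiki/IBM is the URL, and the expected output would be NYSE: IBM DJIA Component S&P 100 Component S&P 500 Component

Use the a tags to separate the values, and select the required row from the table with nth-of-type. You can then join the first two items of the output list together if needed.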

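    import requests
    from bs4 import BeautifulSoup as bs

    r = requests.get('https://en.wikipedia.org/wiki/IBM')
    soup = bs(r.content, 'lxml')

    # Row 4 of the infobox is the "Traded as" row on this page; the index depends on the page layout.
    # Collect the text of every link in that row, replacing non-breaking spaces with plain spaces.
    items = [item.text.replace('\xa0', ' ') for item in soup.select('.vcard tr:nth-of-type(4) a')]
    print(items)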

To display (if the first and second items really are meant to be joined?):

    # Keep everything after the first two items, then put the joined pair at the front.
    final = items[2:]
    final.insert(0, '-'.join([items[0], items[1]]))
    print(final)
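
If you would rather look the row up by its label than by its position, here is a minimal sketch. It assumes the exact-match lookup fails because the header text is wrapped in a link or contains a non-breaking space, which I have not verified against the live page. `SAMPLE_HTML` is a made-up, simplified stand-in for the infobox, and `normalize` and `infobox_cell` are hypothetical helper names. It uses the built-in `html.parser` so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for a Wikipedia infobox (not the real IBM markup).
SAMPLE_HTML = """
<table class="infobox vcard">
  <tr><th><a href="/wiki/Ticker_symbol">Traded&nbsp;as</a></th>
      <td><a href="#">NYSE</a>: <a href="#">IBM</a> <a href="#">DJIA component</a></td></tr>
  <tr><th>Headquarters</th><td>Armonk, New York</td></tr>
</table>
"""


def normalize(text):
    """Replace non-breaking spaces and collapse runs of whitespace."""
    return " ".join(text.replace("\xa0", " ").split())


def infobox_cell(soup, label):
    """Return the <td> that follows the <th> whose visible text equals label, or None."""
    table = soup.find("table", class_="infobox")
    if table is None:
        return None
    for th in table.find_all("th"):
        # get_text() includes text nested inside child tags such as <a>.
        if normalize(th.get_text()) == label:
            return th.find_next_sibling("td")
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
cell = infobox_cell(soup, "Traded as")
links = [normalize(a.get_text()) for a in cell.find_all("a")]
print(links)
```

Comparing the normalized `get_text()` of each header, rather than relying on an exact string match, means nested tags and non-breaking spaces should not matter. On the real page you would build `soup` from the response as in the code above.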
