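This 'td' section contains many div elements that have no name, and I only want the data from specific divs. How can I do that? I tried the code below, but it returns far too much output.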
import requests
from bs4 import BeautifulSoup

url = "https://www.bloomberg.com/research/stocks/private/person.asp?personId=45794107&privcapId=8032555&previousCapId=12437591&previousTitle=Pawan%20Hans%20Limited"
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
data = []
for table in soup.findAll('table'):
    for row in table.findAll('tr'):
        for col in row.findAll('td'):
            #print(col.findAll('div'))
            data.append(col.get_text())
print(data)
I want the following output:
2017-Present
Independent Director
Air India Limited
Answer 0: (score: 0)
import requests
from bs4 import BeautifulSoup

url = "https://www.bloomberg.com/research/stocks/private/person.asp?personId=45794107&privcapId=8032555&previousCapId=12437591&previousTitle=Pawan%20Hans%20Limited"
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

# Pick the third table with cellpadding="0"; this index is hardcoded
# and depends on the page layout staying the same.
table = soup.find_all('table', cellpadding="0")[2]
# The second through fourth divs hold the period, position and company.
divs = table.find_all('div')[1:4]
for div in divs:
    print(div.get_text())
Answer 1: (score: 0)
Alternatively, you can achieve the same result without using hardcoded indexes:
import requests
from bs4 import BeautifulSoup

url = "https://www.bloomberg.com/research/stocks/private/person.asp?personId=45794107&privcapId=8032555&previousCapId=12437591&previousTitle=Pawan%20Hans%20Limited"
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

for items in soup.find_all(class_="sectionTitle"):
    # Locate the section by its heading text instead of by position.
    if "Board Members" in items.text:
        item = items.find_next_sibling()
        presence = item.text
        position = item.find_next("div")
        company = item.find_next("a")
        print("{}\n{}\n{}".format(presence, position.text, company.text))
Output:
2017-Present
Independent Director
Air India Limited
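
The heading-based lookup above can also be tried offline, without fetching the page. The following is only a minimal sketch: `SAMPLE_HTML` is hypothetical markup written to resemble the layout both answers seem to assume (the real page may well differ), and `extract_board_entry` is an illustrative helper name, not something from the original answers. It uses `find_next_sibling("div")` for the position, a slight variation on the `find_next("div")` call above, and returns `None` when the expected elements are missing.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modeled on the structure the answers assume.
SAMPLE_HTML = """
<table><tr><td>
  <div class="sectionTitle">Background</div>
  <div>Some biography text</div>
  <div class="sectionTitle">Board Members Memberships</div>
  <div>2017-Present</div>
  <div>Independent Director</div>
  <div><a href="#">Air India Limited</a></div>
</td></tr></table>
"""


def extract_board_entry(html, heading="Board Members"):
    """Return (period, position, company) for the first matching section, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for title in soup.find_all(class_="sectionTitle"):
        if heading not in title.get_text():
            continue
        # The div right after the heading is assumed to hold the period.
        period = title.find_next_sibling("div")
        if period is None:
            return None
        # The next sibling div is assumed to hold the position.
        position = period.find_next_sibling("div")
        # The first link after the period is assumed to be the company.
        company = period.find_next("a")
        if position is None or company is None:
            return None
        return tuple(t.get_text(strip=True) for t in (period, position, company))
    return None


entry = extract_board_entry(SAMPLE_HTML)
if entry is not None:
    print("\n".join(entry))
```

Returning `None` instead of raising makes it easier to notice when the page layout has changed and the section can no longer be found.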