我使用漂亮的汤刮数据来创建一个数据框。但是,有两个问题。
将urllib.request导入为req
from bs4 import BeautifulSoup
import bs4
import requests
import pandas as pd
url = "https://finance.yahoo.com/quote/BF-B/profile?p=BF-B"
root = requests.get(url)
soup = BeautifulSoup(root.text, 'html.parser')
records = []
for result in soup:
name = soup.find_all('h1', attrs={'D(ib) Fz(18px)'})
website = soup.find_all('a')[44]
sector = soup.find_all('span')[35]
industry = soup.find_all('span')[37]
records.append((name, website, sector, industry))
df = pd.DataFrame(records, columns=['name', 'website', 'sector', 'industry'])
df.head()
结果如下:
答案 0 :(得分:0)
要获取有关公司的信息,您不必遍历soup
,只需直接提取必要的信息即可。要摆脱[..]
括号,请使用.text
属性:
import requests
from bs4 import BeautifulSoup
url = 'https://finance.yahoo.com/quote/BF-B/profile?p=BF-B'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
all_data = []
all_data.append({
'Name': soup.h1.text,
'Website': soup.select_one('.asset-profile-container a[href^="http"]')['href'],
'Sector': soup.select_one('span:contains("Sector(s)") + span').text,
'Industry': soup.select_one('span:contains("Industry") + span').text
})
df = pd.DataFrame(all_data)
print(df)
打印:
Name Website Sector Industry
0 Brown-Forman Corporation (BF-B) http://www.brown-forman.com Consumer Defensive Beverages—Wineries & Distilleries