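I'm trying to extract data from the Y! Finance website with BeautifulSoup and store everything in a list. In the list, the titles of the expandable rows (Total Revenue, Operating Expense) are missing, but the numbers are still there. Is there a way to include the titles in the output?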
import pandas as pd
from bs4 import BeautifulSoup
import urllib.request as ur

url = 'https://finance.yahoo.com/quote/AAPL/financials?p=AAPL'
read_data = ur.urlopen(url).read()
soup = BeautifulSoup(read_data, 'lxml')

ls = []  # Create empty list
for l in soup.find_all('div'):
    ls.append(l.string)

# Drop the None entries
new_ls = list(filter(None, ls))
Current output:
'Expand All',
'ttm',
'9/30/2019',
'9/30/2018',
'9/30/2017',
'9/30/2016',
'273,857,000',
'260,174,000',
'265,595,000',
'229,234,000',
'215,639,000',
Expected output:
'Expand All',
'ttm',
'9/30/2019',
'9/30/2018',
'9/30/2017',
'9/30/2016',
'Total Revenue',
'273,857,000',
'260,174,000',
'265,595,000',
'229,234,000',
'215,639,000',
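Update: if I extract from the "span" elements instead, the cells with a value of 0 are missing from the output, which causes another problem later when building the DataFrame.
for l in soup.select(r'div.D\(tbr\)'):
    for n in l.select('span'):
        print(n.text)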
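Answer 0 (score: 2)
I know this is a bit off topic, but it looks like you just want the data from Yahoo Finance, right? If so, there is already a Python package for it, and using it may be easier than scraping the web page.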
https://pypi.org/project/yahoo-finance/
You can import Share:
from yahoo_finance import Share
You can then get a ton of data by using the following:
apple = Share('AAPL')
Answer 1 (score: 0)
The following will give you all the data, and you can then filter out what you don't need:
for row in soup.select('div[data-test="fin-row"]'):
    for r in row:
        for l in r:
            print(l.text)
    print('-------\n')
Output:
Total Revenue
273,857,000
260,174,000
265,595,000
-
215,639,000
-------
Cost of Revenue
169,277,000
161,782,000
163,756,000
-
131,376,000
-------
Gross Profit
etc.
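As a rough illustration of collecting each row's title together with its values, here is a minimal sketch. It runs against a small hand-written HTML fragment rather than the live page: `SAMPLE_HTML` and `extract_rows` are hypothetical names, and the fragment only assumes that each row is a `div[data-test="fin-row"]` whose empty cells contain a bare `-` with no `span`. The real markup is more deeply nested and may change.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the financials markup (an assumption,
# not a copy of the real page).
SAMPLE_HTML = """
<div data-test="fin-row">
  <div><div><span>Total Revenue</span></div></div>
  <div><span>273,857,000</span></div>
  <div><span>260,174,000</span></div>
  <div>-</div>
</div>
<div data-test="fin-row">
  <div><div><span>Cost of Revenue</span></div></div>
  <div><span>169,277,000</span></div>
  <div><span>161,782,000</span></div>
  <div>-</div>
</div>
"""


def extract_rows(html):
    """Return one list of strings per financial row: title first, then values."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for row in soup.select('div[data-test="fin-row"]'):
        # stripped_strings yields every non-blank text node in document order,
        # so cells without a <span> (the "-" placeholders) are kept as well.
        rows.append(list(row.stripped_strings))
    return rows


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

Reading text nodes per row instead of selecting `span` elements keeps the placeholder cells, so every row ends up with the same number of columns.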
If you also want to get the column headers programmatically, try:
head_ind = [55, 58, 60, 62, 64, 66]
for i in head_ind:
    heads = f'span[data-reactid="{i}"]:not([class])'
    for head in soup.select(heads):
        print(head.text)
Output:
Breakdown
ttm
9/30/2019
9/30/2018
9/30/2017
9/30/2016
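Since the question mentions building a DataFrame afterwards, here is a minimal sketch of combining the headers and rows printed above into a nested dictionary with numeric values. The names `to_number` and `table` are hypothetical, the data is copied from the output shown in this answer, and treating `-` as a missing value is an assumption about what that placeholder means.

```python
def to_number(cell):
    """Convert '273,857,000' to 273857000; return None for '-' or empty cells."""
    cell = cell.strip()
    if cell in ("", "-"):
        return None
    return int(cell.replace(",", ""))


# Headers and rows copied from the output shown above.
headers = ["Breakdown", "ttm", "9/30/2019", "9/30/2018", "9/30/2017", "9/30/2016"]
rows = [
    ["Total Revenue", "273,857,000", "260,174,000", "265,595,000", "-", "215,639,000"],
    ["Cost of Revenue", "169,277,000", "161,782,000", "163,756,000", "-", "131,376,000"],
]

# Map each row title to {column header: numeric value}.
table = {
    row[0]: dict(zip(headers[1:], (to_number(cell) for cell in row[1:])))
    for row in rows
}

print(table["Total Revenue"]["ttm"])
```

If pandas is available, a dictionary shaped like this can be passed to `pd.DataFrame.from_dict(table, orient="index")` to get one row per line item.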