How to extract a table from Yahoo Finance?

Time: 2020-08-13 17:30:40

Tags: python pandas dataframe web-scraping python-requests

I'm trying to extract the financials table from Yahoo Finance. I'm using this:

import pandas as pd
import requests
from bs4 import BeautifulSoup
url="https://finance.yahoo.com/quote/FB/financials?p=FB"
headers={"User-Agent":"Mozilla/5.0"}
r=requests.get(url,headers=headers)
soup=BeautifulSoup(r.content, "html.parser")
stattable=soup.findAll('div', class_="M(0) Whs(n) BdEnd Bdc($seperatorColor) D(itb)")
stattable=stattable[0]
breakdown=[]

for row in stattable.findAll("div"):
  for cell in row.findAll(class_="D(ib) Va(m) Ell Mt(-3px) W(215px)--mv2 W(200px) undefined"):
    breakdown.append(cell.text)

The extracted data is wrong, and it keeps duplicating itself. Here is a small portion of it:

'Breakdown', 'ttm', '12/31/2019', '12/31/2018', '12/31/2017', '12/31/2016',
'Breakdown', 'ttm', '12/31/2019', '12/31/2018', '12/31/2017', '12/31/2016',
'Breakdown', 'ttm', '12/31/2019', '12/31/2018', '12/31/2017', '12/31/2016',
'Total Revenue75,157,00070,697,00055,838,00040,653,000-', '', '75,157,000', '70,697,000', '55,838,000', '40,653,000', '',
'Cost of Revenue13,935,00012,770,0009,355,0005,454,000-', '13,935,000', '12,770,000', '9,355,000', '5,454,000', '',
'Gross Profit61,222,00057,927,00046,483,00035,199,000-', '61,222,000', '57,927,000', '46,483,000', '35,199,000', '',
'Operating Expense33,323,00033,941,00021,570,00014,996,000-', '', '33,323,000', '33,941,000', '21,570,000', '14,996,000', '',
'Operating Income27,899,00023,986,00024,913,00020,203,000-', '27,899,000', '23,986,000', '24,913,000', '20,203,000', '',
'Net Non Operating Interest Income Expense877,000904,000652,000392,000-', '', '877,000', '904,000', '652,000', '392,000', '',
'Other Income Expense-286,000-78,000-204,000-1,000-', '', '-286,000', '-78,000', '-204,000', '-1,000', '',
'Pretax Income28,490,00024,812,00025,361,00020,594,000-', '28,490,000', '24,812,000', '25,361,000', '20,594,000', '',
'Tax Provision4,969,0006,327,0003,249,0004,660,000-', '4,969,000', '6,327,000', '3,249,000', '4,660,000', '',
'Net Income Common Stockholders23,521,00018,485,00022,111,00015,920,000-', '', '23,521,000', '18,485,000', '22,111,000', '15,920,000', '',
'Average Dilution Earnings-01,00014,000-', '0', '1,000', '14,000', ''
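The repetition most likely comes from the nested loop. `stattable.findAll("div")` returns every descendant `div` at every depth, and `row.findAll(...)` then searches all descendants of each of those, so a cell is collected once for every `div` that encloses it. The sketch below reproduces the effect on hypothetical markup (the class names `table`, `body`, `row` and `cell` are made up and are not Yahoo's real classes) and shows the fix of searching once from the table root:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup (NOT Yahoo's real HTML): a table div that
# wraps a body div, which wraps row divs, each holding one cell div.
HTML = """
<div class="table">
  <div class="body">
    <div class="row"><div class="cell">Total Revenue</div></div>
    <div class="row"><div class="cell">Cost of Revenue</div></div>
  </div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup.find("div", class_="table")

# Pattern from the question: visit every descendant div, then search that
# div's descendants for cells. Both the body div and each row div contain
# the cells, so every cell is collected twice.
duplicated = []
for div in table.find_all("div"):
    for cell in div.find_all(class_="cell"):
        duplicated.append(cell.text)

# A single search from the table root finds each cell exactly once.
unique = [cell.text for cell in table.find_all(class_="cell")]

print(duplicated)  # ['Total Revenue', 'Cost of Revenue', 'Total Revenue', 'Cost of Revenue']
print(unique)      # ['Total Revenue', 'Cost of Revenue']
```

Each extra level of wrapping `div`s adds another copy, which fits the three repeated `'Breakdown'` header groups in the output above. `find_all` is BeautifulSoup 4's preferred spelling of `findAll`; both behave the same here.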

My goal is to get this into a pandas DataFrame. Can anyone help? Thanks.
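Once each row's label and values are extracted exactly once, building the DataFrame is the easy part. The minimal sketch below uses two rows copied by hand from the output above. It assumes the trailing `-` in each concatenated string is the placeholder for the 12/31/2016 column. The arithmetic supports this alignment: Total Revenue minus Cost of Revenue reproduces the Gross Profit row (75,157,000 - 13,935,000 = 61,222,000 for ttm).

```python
import pandas as pd

# Period headers and two rows copied from the scraped output in the question.
# Mapping the trailing "-" to the 12/31/2016 column is an assumption.
columns = ["ttm", "12/31/2019", "12/31/2018", "12/31/2017", "12/31/2016"]
rows = {
    "Total Revenue": ["75,157,000", "70,697,000", "55,838,000", "40,653,000", "-"],
    "Cost of Revenue": ["13,935,000", "12,770,000", "9,355,000", "5,454,000", "-"],
}

# One row per line item, one column per period, like the table on the page.
df = pd.DataFrame.from_dict(rows, orient="index", columns=columns)
df.index.name = "Breakdown"

# Drop thousands separators; "-" cannot be parsed, so errors="coerce" makes it NaN.
df = df.apply(lambda col: pd.to_numeric(col.str.replace(",", "", regex=False), errors="coerce"))

# Sanity check against the "Gross Profit" row of the scraped output.
gross_profit = df.loc["Total Revenue"] - df.loc["Cost of Revenue"]
print(df)
print(gross_profit)
```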

1 answer:

Answer 0 (score: 0)

Here is a solution using yahooquery:

from yahooquery import Ticker

fb = Ticker('fb')
df = fb.income_statement()  # returns the income statement as a pandas DataFrame

Disclosure: I am the author of yahooquery.
