Merging financial data

Time: 2019-05-19 22:16:46

Tags: python pandas

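I am trying to work out how to pull financial statements (income statement, balance sheet and cash flow) from Yahoo Finance. I have a list called symbols that holds all the ticker symbols (see the code below). The end result I want is a CSV with one row for each of four consecutive years (2018, 2017, 2016, 2015) per ticker. I can do this by hand, but I want to automate it so that it returns a .csv file with all the relevant information (77 columns and 4 × the number of ticker symbols rows).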

I worked out how to scrape the data from Yahoo with a scraper:

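import io

import lxml.etree
import numpy as np
import pandas as pd
import requests
from lxml import html


def scrape_table(url):
    # Download the page and locate its single <table> element
    page = requests.get(url)
    tree = html.fromstring(page.content)
    table = tree.xpath('//table')
    assert len(table) == 1

    # Serialize the table back to HTML text so pandas can parse it
    table_html = lxml.etree.tostring(table[0], method='html', encoding='unicode')
    df = pd.read_html(io.StringIO(table_html))[0]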

    df = df.set_index(0)
    df = df.dropna()
    df = df.transpose()
    df = df.replace('-', '0')

    # The first column should be a date
    df[df.columns[0]] = pd.to_datetime(df[df.columns[0]])
    cols = list(df.columns)
    cols[0] = 'Date'
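    # set_axis returns a new frame; its inplace argument was removed in pandas 2.0
    df = df.set_axis(cols, axis='columns')

    # Every remaining column holds numeric values
    numeric_columns = list(df.columns)[1:]
    df[numeric_columns] = df[numeric_columns].astype(np.float64)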

    return df



def merge_IS_BS_CF(df_IS, df_BS, df_CF):
    """Merge the income statement, balance sheet and cash flow on 'Date'.

    Returns a single DataFrame with the columns of all three statements.
    """
    df_merge_IS_BS = pd.merge(df_IS, df_BS, on='Date')
    df_merge_IS_BS_CF = pd.merge(df_merge_IS_BS, df_CF, on='Date')
    return df_merge_IS_BS_CF

symbols = ['AAPL', 'MFT.NZ']

financials = {}
# Create a dictionary mapping each ticker to its statement URLs
for symbol in symbols:
    base = 'https://finance.yahoo.com/quote/' + symbol
    financials[symbol] = [
        base + '/financials?p=' + symbol,
        base + '/balance-sheet?p=' + symbol,
        base + '/cash-flow?p=' + symbol,
    ]
print(financials['AAPL'][0])
data = pd.DataFrame([])
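
The cleaning steps inside `scrape_table` can be tried without any network access. The sketch below applies the same sequence (set the index, drop empty rows, transpose, convert types) to a hand-built frame that stands in for what `pd.read_html` might return. The row labels and numbers are invented placeholders, not real Yahoo Finance data, and the actual page layout is an assumption.

```python
import numpy as np
import pandas as pd

# Stand-in for one parsed statement page: line items down the rows,
# one column per fiscal year. All labels and values are placeholders.
raw = pd.DataFrame([
    ['Period Ending', '12/31/2018', '12/31/2017'],
    ['Section Header', None, None],
    ['Total Revenue', '100', '90'],
    ['Other Items', '-', '5'],
])

df = raw.set_index(0)       # line-item names become the index
df = df.dropna()            # drop section-header rows that carry no values
df = df.transpose()         # now one row per fiscal year
df = df.replace('-', '0')   # treat '-' as zero, as scrape_table does

# The first column holds the period dates; the rest are numeric.
df.columns = ['Date'] + list(df.columns[1:])
df['Date'] = pd.to_datetime(df['Date'])
numeric_columns = list(df.columns[1:])
df[numeric_columns] = df[numeric_columns].astype(np.float64)

print(df)
```

After the transpose each fiscal year is a row, which is the shape needed to merge the three statements on `Date`.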

The result I get is that the next ticker's data is not concatenated onto the pandas DataFrame. Thanks for your help.
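
To make the target shape concrete, here is a minimal sketch of the merge that `merge_IS_BS_CF` performs, run on small in-memory frames instead of scraped pages. The column names and numbers are made-up placeholders; only the `Date` key is taken from the code above.

```python
import pandas as pd

# Placeholder statements: one row per fiscal year, keyed by 'Date'.
# All values are invented for illustration only.
dates = pd.to_datetime(['2018-09-30', '2017-09-30'])
df_IS = pd.DataFrame({'Date': dates, 'Total Revenue': [100.0, 90.0]})
df_BS = pd.DataFrame({'Date': dates, 'Total Assets': [500.0, 450.0]})
df_CF = pd.DataFrame({'Date': dates, 'Operating Cash Flow': [30.0, 25.0]})

# Inner-join the three statements on the shared 'Date' column.
merged = pd.merge(pd.merge(df_IS, df_BS, on='Date'), df_CF, on='Date')

print(merged.shape)  # (2, 4)
print(list(merged.columns))
```

Because `pd.merge` defaults to an inner join, a date that is missing from any one statement drops out of the result. With four fiscal years present in all three statements, each ticker contributes four rows.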

1 answer:

Answer 0 (score: 0)

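Sorry, I figured it out myself. For the next person: my mistake was not realizing that I had to save the appended DataFrame. Appending returns a new DataFrame rather than modifying data in place, so the result has to be assigned back to data on every iteration.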

symbols = ['AAPL', 'MFT.NZ']
financials = {}
# Create a dictionary mapping each ticker to its statement URLs
for symbol in symbols:
    base = 'https://finance.yahoo.com/quote/' + symbol
    financials[symbol] = [
        base + '/financials?p=' + symbol,
        base + '/balance-sheet?p=' + symbol,
        base + '/cash-flow?p=' + symbol,
    ]
print(financials['AAPL'][0])
data = pd.DataFrame()

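for f in financials:
    print(f)
    df_income_statement = scrape_table(financials[f][0])
    df_balance_sheet = scrape_table(financials[f][1])
    df_cash_flow = scrape_table(financials[f][2])
    oldmerge = merge_IS_BS_CF(df_income_statement, df_balance_sheet, df_cash_flow)
    # Assign the result back: appending returns a new DataFrame.
    # DataFrame.append was removed in pandas 2.0; before that this line
    # read data = data.append(oldmerge).
    data = pd.concat([data, oldmerge])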
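
As a possible variation on the loop above, the per-ticker frames can be collected in a list and concatenated once at the end, which avoids rebuilding `data` on every iteration. This is only a sketch: `build_dataset`, the `Symbol` column, the stub `fake_fetch` and the output file name are assumptions made for illustration and are not part of the original code. In real use, `fetch` would wrap `scrape_table` and `merge_IS_BS_CF`.

```python
import pandas as pd


def build_dataset(symbols, fetch):
    """Stack one merged frame per symbol into a single DataFrame.

    `fetch` is any callable mapping a ticker symbol to a DataFrame
    (a hypothetical hook; in the code above it would scrape and merge).
    """
    frames = []
    for symbol in symbols:
        df = fetch(symbol).copy()
        # Tag each row so tickers stay distinguishable after stacking.
        df.insert(0, 'Symbol', symbol)
        frames.append(df)
    # pd.concat returns a new DataFrame; nothing is modified in place.
    return pd.concat(frames, ignore_index=True)


def fake_fetch(symbol):
    # Stand-in for the scraper, with invented placeholder values.
    return pd.DataFrame({
        'Date': pd.to_datetime(['2018-12-31', '2017-12-31']),
        'Total Revenue': [1.0, 2.0],
    })


data = build_dataset(['AAPL', 'MFT.NZ'], fake_fetch)
print(data.shape)  # (4, 3)
# data.to_csv('financials.csv', index=False)  # hypothetical output path
```

The added `Symbol` column is what makes the "4 rows per ticker" layout readable once several tickers share one CSV, since the dates alone do not identify the company.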