Extracting income statement data with Python

Date: 2016-08-18 07:09:16

Tags: python yahoo-finance

I am trying to extract data from Yahoo Finance with the code below, but it raises an error.

I tried debugging it on my end, and it looks like title comes back empty from the find call:

title = soup.find("strong", text=pattern)  # returning blank

Code:

# -*- coding: utf-8 -*-
"""

"""
#Read income statement to calculate ratios

from bs4 import BeautifulSoup
import requests
import re,sys

myurl = "https://finance.yahoo.com/q/is?s=AAPL&annual"
html = requests.get(myurl).content
soup = BeautifulSoup(html, "html.parser")    # name the parser explicitly, as financials_soup() does

def periodic_figure_values(soup, yahoo_figure):

    values = []
    pattern = re.compile(yahoo_figure)

    title = soup.find("strong", text=pattern)    # works for the figures printed in bold
    if title:
        row = title.parent.parent
    else:
        title = soup.find("td", text=pattern)    # works for any other available figure
        if title:
            row = title.parent
        else:
            sys.exit("Invalid figure '" + yahoo_figure + "' passed.")

    cells = row.find_all("td")[1:]    # exclude the <td> with figure name
    for cell in cells:
        if cell.text.strip() != yahoo_figure:    # needed because some figures are indented
            str_value = cell.text.strip().replace(",", "").replace("(", "-").replace(")", "")
            if str_value == "-":
                str_value = 0
            value = int(str_value) * 1000
            values.append(value)

    return values


def financials_soup(ticker_symbol, statement="is", quarterly=False):

    if statement == "is" or statement == "bs" or statement == "cf":
        url = "https://finance.yahoo.com/q/" + statement + "?s=" + ticker_symbol
        if not quarterly:
            url += "&annual"
        return BeautifulSoup(requests.get(url).text, "html.parser")

    return sys.exit("Invalid financial statement code '" + statement + "' passed.")

print(periodic_figure_values(financials_soup("AAPL", "is"), "Income Tax Expense")) 

"""throws error: An exception has occurred, use %tb to see the full traceback.

SystemExit: Invalid figure 'Income Tax Expense' passed."""

1 Answer:

Answer 0: (score: 0)

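As far as I know (I'm just starting out, a complete newbie), Yahoo Finance builds its pages with JavaScript, so Beautiful Soup on its own won't work: the table is not in the HTML that requests downloads. At least that's what I was told. For sites like this, Selenium should be the right tool.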
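A quick way to check this before reaching for Selenium is to look for the figure name in the raw HTML that the server returns. The snippet below is only a minimal sketch: `label_in_static_html` is a hypothetical helper name, both sample pages are made up for illustration, and the tag stripping is a crude regex that is fine for a diagnostic but not for real parsing.

```python
import re


def label_in_static_html(html, label):
    """Return True if `label` occurs in the visible text of raw HTML.

    Hypothetical helper for diagnosis only: tags are removed with a
    simple regex, and the comparison ignores case.
    """
    text = re.sub(r"<[^>]+>", " ", html)   # drop tags, keep the text between them
    text = re.sub(r"\s+", " ", text)       # collapse runs of whitespace
    return label.lower() in text.lower()


# Made-up static page: the label is in the markup, so soup.find() could match it.
static_page = (
    "<table><tr><td><strong>Income Tax Expense</strong></td>"
    "<td>1,234</td></tr></table>"
)

# Made-up script-rendered page: the server sends only an empty container.
dynamic_page = "<div id='app'></div><script>render()</script>"

print(label_in_static_html(static_page, "Income Tax Expense"))   # True
print(label_in_static_html(dynamic_page, "Income Tax Expense"))  # False
```

With the real page you would pass `requests.get(url).text` instead of the sample strings. If the result is False, then `soup.find(...)` returning nothing is expected, and the "Invalid figure" exit follows from that. One caveat: text embedded inside a script block also counts as a hit here, so a True result does not guarantee that the table itself is in the markup.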

Below is the link to a similar question I asked; the code in that thread works. Well, almost. I had to change 2-3 small, trivial things:

How should I properly use Selenium

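Here is the comment I left there describing what I changed to get it working. (I'm not sure whether the code was later edited to reflect this. See whether it works for you unchanged, and if not, apply the change I made.)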

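I had to change the code slightly. wait was not being used, so I searched the internet and changed it to WebDriverWait. It still did not click on the balance sheet until I changed it to: balanceSheet = WebDriverWait(browser, 5) Now it is working. I have learned a lot, and now I can start working on my project and learn to code in the process. Thanks a lot! - Al_ Aug 3 at 13:07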
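For reference, this is roughly what the WebDriverWait pattern from that comment looks like. It is a sketch under assumptions, not the code from the linked thread: `fetch_rendered_html` is a hypothetical name, the Firefox driver and the idea of finding the statement by its visible link text are my guesses, and the page layout may differ from what it expects.

```python
WAIT_SECONDS = 5  # same timeout as in the comment above


def fetch_rendered_html(url, link_text, timeout=WAIT_SECONDS):
    """Open `url` in a real browser, click the link labelled `link_text`
    once it becomes clickable, and return the rendered page source.

    Assumptions: Selenium and a Firefox driver are installed, and the
    page exposes the statement as a link with that visible text.
    """
    # Imported here so that merely defining the function does not
    # require Selenium to be installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    browser = webdriver.Firefox()
    try:
        browser.get(url)
        # Poll for up to `timeout` seconds until the link can be clicked;
        # a TimeoutException is raised if it never becomes clickable.
        link = WebDriverWait(browser, timeout).until(
            EC.element_to_be_clickable((By.LINK_TEXT, link_text))
        )
        link.click()
        return browser.page_source
    finally:
        browser.quit()  # always close the browser, even after a timeout
```

The returned string can then be handed to `BeautifulSoup(html, "html.parser")` and on to a function like `periodic_figure_values`. Depending on how the page loads, a second wait for a known element of the table may be needed after the click, before reading `page_source`.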

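Edit: my example only downloads one item from the balance sheet, but it should help you figure out how to do the same for your case (the financial statements). If you can't get the code to work, I'll send you my working code once I'm back at that computer.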