Question

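I am trying to extract data from Yahoo Finance with the code below, but it gives me an error.

I tried debugging it on my end, and it looks like title comes back empty when I use the find command:

title = soup.find("strong", text=pattern)  # returning blank

Code: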

# -*- coding: utf-8 -*-
"""

"""
#Read income statement to calculate ratios

from bs4 import BeautifulSoup
import requests
import re
import sys

myurl = "https://finance.yahoo.com/q/is?s=AAPL&annual"
html = requests.get(myurl).content
soup = BeautifulSoup(html, "html.parser")    # name the parser explicitly to avoid the bs4 warning

def periodic_figure_values(soup, yahoo_figure):

    values = []
    pattern = re.compile(yahoo_figure)

    title = soup.find("strong", text=pattern)    # works for the figures printed in bold
    if title:
        row = title.parent.parent
    else:
        title = soup.find("td", text=pattern)    # works for any other available figure
        if title:
            row = title.parent
        else:
            sys.exit("Invalid figure '" + yahoo_figure + "' passed.")

    cells = row.find_all("td")[1:]    # exclude the <td> with figure name
    for cell in cells:
        if cell.text.strip() != yahoo_figure:    # needed because some figures are indented
            str_value = cell.text.strip().replace(",", "").replace("(", "-").replace(")", "")
            if str_value == "-":
                str_value = "0"    # keep it a string so the variable's type stays consistent
            value = int(str_value) * 1000
            values.append(value)

    return values


def financials_soup(ticker_symbol, statement="is", quarterly=False):

    if statement == "is" or statement == "bs" or statement == "cf":
        url = "https://finance.yahoo.com/q/" + statement + "?s=" + ticker_symbol
        if not quarterly:
            url += "&annual"
        return BeautifulSoup(requests.get(url).text, "html.parser")

    sys.exit("Invalid financial statement code '" + statement + "' passed.")

print(periodic_figure_values(financials_soup("AAPL", "is"), "Income Tax Expense")) 

"""throws error: An exception has occurred, use %tb to see the full traceback.

SystemExit: Invalid figure 'Income Tax Expense' passed."""

Answer 1

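As far as I know (I'm just starting out, a complete newbie), Yahoo Finance builds its pages with JavaScript, so the figures are not in the HTML that requests downloads and Beautiful Soup on its own won't find them. At least that is what I was told. For sites like this, Selenium should be the right tool.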
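If you want to check this for yourself, one rough approach is to see whether the figure label shows up as visible text in the raw HTML at all. The sketch below is only an illustration: the helper name figure_in_static_html and both sample pages are made up here, not taken from Yahoo Finance, and it uses simple regular expressions rather than a real HTML parser.

```python
import re


def figure_in_static_html(html: str, figure_name: str) -> bool:
    """Return True if figure_name appears as visible text in the raw HTML."""
    # Drop <script>/<style> blocks so a label that only exists inside
    # JavaScript data does not count as visible text.
    without_scripts = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", html)
    # Strip the remaining tags and collapse whitespace.
    text = re.sub(r"(?s)<[^>]+>", " ", without_scripts)
    text = re.sub(r"\s+", " ", text)
    return figure_name.lower() in text.lower()


# Hypothetical sample documents, NOT real Yahoo Finance markup.
static_page = (
    "<table><tr><td><strong>Income Tax Expense</strong></td>"
    "<td>1,234</td></tr></table>"
)
scripted_page = (
    '<div id="app"></div>'
    '<script>render({"label": "Income Tax Expense"})</script>'
)

print(figure_in_static_html(static_page, "Income Tax Expense"))    # True
print(figure_in_static_html(scripted_page, "Income Tax Expense"))  # False
```

With the real page you would pass requests.get(url).text as html. If the helper returns False, soup.find(...) has nothing to match, which would be consistent with the "Invalid figure" exit in the question.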

Below is the link to a similar question I asked; the code in that link works. Well, almost. I had to change 2-3 small, trivial things:

How should I properly use Selenium

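Here is the comment I left there describing what worked after my changes. I'm not sure whether the code was later updated to reflect them, so try it unchanged first and, if it doesn't work, apply the changes I made: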

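I had to change the code a little. wait was not being used, so I searched the internet and switched to WebDriverWait. It still didn't click Balance Sheet until I changed it to: balanceSheet = WebDriverWait(browser, 5). Now it's working, I've learned a lot, and I can start working on my project and learn to code along the way. Thanks a lot! - Al_ Aug 3 at 13:07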
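For anyone wondering why the wait matters: an explicit wait keeps re-checking a condition until it succeeds or a timeout expires, which gives the JavaScript time to put the element on the page. In Selenium this is normally written as WebDriverWait(driver, timeout).until(condition). The sketch below is not Selenium code; it is a plain-Python illustration of that polling idea, and wait_until and balance_sheet_link_ready are hypothetical names invented for this example.

```python
import time


def wait_until(condition, timeout=5.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if no truthy value is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Hypothetical stand-in for a page element that only becomes
# available on the third check (as if JavaScript were still loading it).
attempts = {"count": 0}


def balance_sheet_link_ready():
    attempts["count"] += 1
    return "balance-sheet-link" if attempts["count"] >= 3 else None


link = wait_until(balance_sheet_link_ready, timeout=2.0, poll_interval=0.01)
print(link)               # balance-sheet-link
print(attempts["count"])  # 3
```

Clicking right after the page loads can fail simply because the link does not exist yet; waiting for the condition first avoids that race.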

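Edit: my example only downloads one item from the balance sheet, but it should help you figure out how to do the same for your case (the income statement). If you can't get the code to work, I'll send you my working code once I'm back at that computer.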
