刮掉雅虎财务财务比率

时间:2016-10-19 19:02:16

标签: python beautifulsoup

我一直试图使用Beautiful Soup从Yahoo Finance中删除当前比率(如下所示)的值,但它会一直返回空值。

enter image description here

有趣的是,当我查看URL的页面来源时,当前比率的值未列在那里。

到目前为止我的代码是:

import urllib
from bs4 import BeautifulSoup

url = ("http://finance.yahoo.com/quote/GSB/key-statistics?p=GSB")
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")
script = soup.find("td", {"class": "Fz(s) Fw(500) Ta(end)",
                          "data-reactid": ".1ujetg16lcg.0.$0.0.0.3.1.$main-0-Quote-Proxy.$main-0-Quote.2.0.0.0.1.0.1:$FINANCIAL_HIGHLIGHTS.$BALANCE_SHEET.1.0.$CURRENT_RATIO.1"
                         })

有谁知道如何解决这个问题?

4 个答案:

答案 0 :(得分:7)

你实际上可以获得数据是json格式,有一个api调用返回很多数据,包括当前比率:

enter image description here

import requests

params = {"formatted": "true",
        "crumb": "AKV/cl0TOgz", # works without so not sure of significance
        "lang": "en-US",
        "region": "US",
        "modules": "defaultKeyStatistics,financialData,calendarEvents",
        "corsDomain": "finance.yahoo.com"}

r = requests.get("https://query1.finance.yahoo.com/v10/finance/quoteSummary/GSB", params=params)
data = r.json()[u'quoteSummary']["result"][0]

这会给你一个包含大量数据的词典:

from pprint import pprint as pp
pp(data)
{u'calendarEvents': {u'dividendDate': {u'fmt': u'2016-09-08',
                                        u'raw': 1473292800},
                      u'earnings': {u'earningsAverage': {},
                                    u'earningsDate': [{u'fmt': u'2016-10-27',
                                                       u'raw': 1477526400}],
                                    u'earningsHigh': {},
                                    u'earningsLow': {},
                                    u'revenueAverage': {u'fmt': u'8.72M',
                                                        u'longFmt': u'8,720,000',
                                                        u'raw': 8720000},
                                    u'revenueHigh': {u'fmt': u'8.72M',
                                                     u'longFmt': u'8,720,000',
                                                     u'raw': 8720000},
                                    u'revenueLow': {u'fmt': u'8.72M',
                                                    u'longFmt': u'8,720,000',
                                                    u'raw': 8720000}},
                      u'exDividendDate': {u'fmt': u'2016-05-19',
                                          u'raw': 1463616000},
                      u'maxAge': 1},
  u'defaultKeyStatistics': {u'52WeekChange': {u'fmt': u'3.35%',
                                              u'raw': 0.033536673},
                            u'SandP52WeekChange': {u'fmt': u'5.21%',
                                                   u'raw': 0.052093267},
                            u'annualHoldingsTurnover': {},
                            u'annualReportExpenseRatio': {},
                            u'beta': {u'fmt': u'0.23', u'raw': 0.234153},
                            u'beta3Year': {},
                            u'bookValue': {u'fmt': u'1.29', u'raw': 1.295},
                            u'category': None,
                            u'earningsQuarterlyGrowth': {u'fmt': u'-28.00%',
                                                         u'raw': -0.28},
                            u'enterpriseToEbitda': {u'fmt': u'9.22',
                                                    u'raw': 9.215},
                            u'enterpriseToRevenue': {u'fmt': u'1.60',
                                                     u'raw': 1.596},
                            u'enterpriseValue': {u'fmt': u'50.69M',
                                                 u'longFmt': u'50,690,408',
                                                 u'raw': 50690408},
                            u'fiveYearAverageReturn': {},
                            u'floatShares': {u'fmt': u'11.63M',
                                             u'longFmt': u'11,628,487',
                                             u'raw': 11628487},
                            u'forwardEps': {u'fmt': u'0.29', u'raw': 0.29},
                            u'forwardPE': {},
                            u'fundFamily': None,
                            u'fundInceptionDate': {},
                            u'heldPercentInsiders': {u'fmt': u'36.12%',
                                                     u'raw': 0.36116},
                            u'heldPercentInstitutions': {u'fmt': u'21.70%',
                                                         u'raw': 0.21700001},
                            u'lastCapGain': {},
                            u'lastDividendValue': {},
                            u'lastFiscalYearEnd': {u'fmt': u'2015-12-31',
                                                   u'raw': 1451520000},
                            u'lastSplitDate': {},
                            u'lastSplitFactor': None,
                            u'legalType': None,
                            u'maxAge': 1,
                            u'morningStarOverallRating': {},
                            u'morningStarRiskRating': {},
                            u'mostRecentQuarter': {u'fmt': u'2016-06-30',
                                                   u'raw': 1467244800},
                            u'netIncomeToCommon': {u'fmt': u'3.82M',
                                                   u'longFmt': u'3,819,000',
                                                   u'raw': 3819000},
                            u'nextFiscalYearEnd': {u'fmt': u'2017-12-31',
                                                   u'raw': 1514678400},
                            u'pegRatio': {},
                            u'priceToBook': {u'fmt': u'2.64',
                                             u'raw': 2.6358302},
                            u'priceToSalesTrailing12Months': {},
                            u'profitMargins': {u'fmt': u'12.02%',
                                               u'raw': 0.12023},
                            u'revenueQuarterlyGrowth': {},
                            u'sharesOutstanding': {u'fmt': u'21.18M',
                                                   u'longFmt': u'21,184,300',
                                                   u'raw': 21184300},
                            u'sharesShort': {u'fmt': u'27.06k',
                                             u'longFmt': u'27,057',
                                             u'raw': 27057},
                            u'sharesShortPriorMonth': {u'fmt': u'36.35k',
                                                       u'longFmt': u'36,352',
                                                       u'raw': 36352},
                            u'shortPercentOfFloat': {u'fmt': u'0.20%',
                                                     u'raw': 0.001977},
                            u'shortRatio': {u'fmt': u'0.81', u'raw': 0.81},
                            u'threeYearAverageReturn': {},
                            u'totalAssets': {},
                            u'trailingEps': {u'fmt': u'0.18', u'raw': 0.18},
                            u'yield': {},
                            u'ytdReturn': {}},
  u'financialData': {u'currentPrice': {u'fmt': u'3.41', u'raw': 3.4134},
                     u'currentRatio': {u'fmt': u'1.97', u'raw': 1.974},
                     u'debtToEquity': {},
                     u'earningsGrowth': {u'fmt': u'-33.30%', u'raw': -0.333},
                     u'ebitda': {u'fmt': u'5.5M',
                                 u'longFmt': u'5,501,000',
                                 u'raw': 5501000},
                     u'ebitdaMargins': {u'fmt': u'17.32%',
                                        u'raw': 0.17318001},
                     u'freeCashflow': {u'fmt': u'4.06M',
                                       u'longFmt': u'4,062,250',
                                       u'raw': 4062250},
                     u'grossMargins': {u'fmt': u'79.29%', u'raw': 0.79288},
                     u'grossProfits': {u'fmt': u'25.17M',
                                       u'longFmt': u'25,172,000',
                                       u'raw': 25172000},
                     u'maxAge': 86400,
                     u'numberOfAnalystOpinions': {},
                     u'operatingCashflow': {u'fmt': u'6.85M',
                                            u'longFmt': u'6,853,000',
                                            u'raw': 6853000},
                     u'operatingMargins': {u'fmt': u'16.47%',
                                           u'raw': 0.16465001},
                     u'profitMargins': {u'fmt': u'12.02%', u'raw': 0.12023},
                     u'quickRatio': {u'fmt': u'1.92', u'raw': 1.917},
                     u'recommendationKey': u'strong_buy',
                     u'recommendationMean': {u'fmt': u'1.00', u'raw': 1.0},
                     u'returnOnAssets': {u'fmt': u'7.79%', u'raw': 0.07793},
                     u'returnOnEquity': {u'fmt': u'15.05%', u'raw': 0.15054},
                     u'revenueGrowth': {u'fmt': u'5.00%', u'raw': 0.05},
                     u'revenuePerShare': {u'fmt': u'1.51', u'raw': 1.513},
                     u'targetHighPrice': {},
                     u'targetLowPrice': {},
                     u'targetMeanPrice': {},
                     u'targetMedianPrice': {},
                     u'totalCash': {u'fmt': u'20.28M',
                                    u'longFmt': u'20,277,000',
                                    u'raw': 20277000},
                     u'totalCashPerShare': {u'fmt': u'0.96', u'raw': 0.957},
                     u'totalDebt': {u'fmt': None,
                                    u'longFmt': u'0',
                                    u'raw': 0},
                     u'totalRevenue': {u'fmt': u'31.76M',
                                       u'longFmt': u'31,764,000',
                                       u'raw': 31764000}}}

你想要的是data[u'financialData']

 pp(data[u'financialData'])

 {u'currentPrice': {u'fmt': u'3.41', u'raw': 3.4134},
 u'currentRatio': {u'fmt': u'1.97', u'raw': 1.974},
 u'debtToEquity': {},
 u'earningsGrowth': {u'fmt': u'-33.30%', u'raw': -0.333},
 u'ebitda': {u'fmt': u'5.5M', u'longFmt': u'5,501,000', u'raw': 5501000},
 u'ebitdaMargins': {u'fmt': u'17.32%', u'raw': 0.17318001},
 u'freeCashflow': {u'fmt': u'4.06M',
                   u'longFmt': u'4,062,250',
                   u'raw': 4062250},
 u'grossMargins': {u'fmt': u'79.29%', u'raw': 0.79288},
 u'grossProfits': {u'fmt': u'25.17M',
                   u'longFmt': u'25,172,000',
                   u'raw': 25172000},
 u'maxAge': 86400,
 u'numberOfAnalystOpinions': {},
 u'operatingCashflow': {u'fmt': u'6.85M',
                        u'longFmt': u'6,853,000',
                        u'raw': 6853000},
 u'operatingMargins': {u'fmt': u'16.47%', u'raw': 0.16465001},
 u'profitMargins': {u'fmt': u'12.02%', u'raw': 0.12023},
 u'quickRatio': {u'fmt': u'1.92', u'raw': 1.917},
 u'recommendationKey': u'strong_buy',
 u'recommendationMean': {u'fmt': u'1.00', u'raw': 1.0},
 u'returnOnAssets': {u'fmt': u'7.79%', u'raw': 0.07793},
 u'returnOnEquity': {u'fmt': u'15.05%', u'raw': 0.15054},
 u'revenueGrowth': {u'fmt': u'5.00%', u'raw': 0.05},
 u'revenuePerShare': {u'fmt': u'1.51', u'raw': 1.513},
 u'targetHighPrice': {},
 u'targetLowPrice': {},
 u'targetMeanPrice': {},
 u'targetMedianPrice': {},
 u'totalCash': {u'fmt': u'20.28M',
                u'longFmt': u'20,277,000',
                u'raw': 20277000},
 u'totalCashPerShare': {u'fmt': u'0.96', u'raw': 0.957},
 u'totalDebt': {u'fmt': None, u'longFmt': u'0', u'raw': 0},
 u'totalRevenue': {u'fmt': u'31.76M',
                   u'longFmt': u'31,764,000',
                   u'raw': 31764000}}

您可以在其中看到u'currentRatio',fmt是您在网站上看到的格式化输出,格式化为两位小数。所以要获得1.97:

In [5]: import requests
   ...: data = {"formatted": "true",
   ...:         "crumb": "AKV/cl0TOgz",
   ...:         "lang": "en-US",
   ...:         "region": "US",
   ...:         "modules": "defaultKeyStatistics,financialData,calendarEvents",
   ...:         "corsDomain": "finance.yahoo.com"}
   ...: r = requests.get("https://query1.finance.yahoo.com/v10/finance/quoteSumm
   ...: ary/GSB", params=data)
   ...: data = r.json()[u'quoteSummary']["result"][0][u'financialData']
   ...: ratio = data[u'currentRatio']
   ...: print(ratio)
   ...: print(ratio["fmt"])
   ...: 
{'raw': 1.974, 'fmt': '1.97'}
1.97

使用 urllib 的等效代码:

In [1]: import urllib
   ...: from urllib import urlencode
   ...: from json import load
   ...: 
   ...: 
   ...: data = {"formatted": "true",
   ...:         "crumb": "AKV/cl0TOgz",
   ...:         "lang": "en-US",
   ...:         "region": "US",
   ...:         "modules": "defaultKeyStatistics,financialData,calendarEvents",
   ...:         "corsDomain": "finance.yahoo.com"}
   ...: url = "https://query1.finance.yahoo.com/v10/finance/quoteSummary/GSB"
   ...: r = urllib.urlopen(url, data=urlencode(data))
   ...: data = load(r)[u'quoteSummary']["result"][0][u'financialData']
   ...: ratio = data[u'currentRatio']
   ...: print(ratio)
   ...: print(ratio["fmt"])
   ...: 
{u'raw': 1.974, u'fmt': u'1.97'}
1.97

它也适用于APPL:

In [1]: import urllib
   ...: from urllib import urlencode
   ...: from json import load
   ...: data = {"formatted": "true",
   ...:         "lang": "en-US",
   ...:         "region": "US",
   ...:         "modules": "defaultKeyStatistics,financialData,calendarEvents",
   ...:         "corsDomain": "finance.yahoo.com"}
   ...: url = "https://query1.finance.yahoo.com/v10/finance/quoteSummary/AAPL"
   ...: r = urllib.urlopen(url, data=urlencode(data))
   ...: data = load(r)[u'quoteSummary']["result"][0][u'financialData']
   ...: ratio = data[u'currentRatio']
   ...: print(ratio)
   ...: print(ratio["fmt"])
   ...: 
{u'raw': 1.312, u'fmt': u'1.31'}
1.31

添加crumb参数似乎没有任何效果,如果您需要在以后获取它:

soup = BeautifulSoup(urllib.urlopen("http://finance.yahoo.com/quote/GSB/key-statistics?p=GSB").read())
script = soup.find("script", text=re.compile("root.App.main")).text
data = loads(re.search("root.App.main\s+=\s+(\{.*\})", script).group(1))
print(data["context"]["dispatcher"]["stores"]["CrumbStore"]["crumb"])

对于市值,您需要添加 summaryDetail 模块:

In [1]: import requests
   ...: 
   ...: params = {"formatted": "true",
   ...:           "crumb": "AKV/cl0TOgz",  # works without so not sure of signif
   ...: icance
   ...:           "lang": "en-US",
   ...:           "region": "US",
   ...:           "modules": "summaryDetail",
   ...:           "corsDomain": "finance.yahoo.com"}
   ...: 
   ...: r = requests.get("https://query1.finance.yahoo.com/v10/finance/quoteSumm
   ...: ary/GOOG", params=params)
   ...: data = r.json()[u'quoteSummary']["result"][0]
   ...: print(data["summaryDetail"]["marketCap"])
   ...: 
{'raw': 769972436992, 'fmt': '769.97B', 'longFmt': '769,972,436,992'}

我所知道的可用模块是:

defaultKeyStatistics
financialData
calendarEvents
assetProfile
summaryDetail
upgradeDowngradeHistory
recommendationTrend
earnings
price

答案 1 :(得分:3)

我添加到Padriac的答案中的一件事是除了KeyErrors之外,因为你可能会抓不止一个自动收报机。

import requests
a = requests.get('https://query2.finance.yahoo.com/v10/finance/quoteSummary/GSB?formatted=true&crumb=A7e5%2FXKKAFa&lang=en-US&region=US&modules=defaultKeyStatistics%2CfinancialData%2CcalendarEvents&corsDomain=finance.yahoo.com')
b = a.json()
try:
    ratio = b['quoteSummary']['result'][0]['financialData']['currentRatio']['raw']
    print(ratio) #prints 1.974
except (IndexError, KeyError):
    pass

这样做很酷的一点是,您可以轻松更改所需信息的密钥。一种很好的方法来查看字典嵌套在Yahoo!上的方式!财务页面将使用pprint。此外,对于具有季度信息的页面,只需将[0]更改为[1]即可获取第二季度的信息而不是第一季度......依此类推。

答案 2 :(得分:0)

这是另一个使用Excel的解决方案。

http://www.financialwisdomforum.org/gummy-stuff/Yahoo-data.htm

enter image description here

enter image description here

从该网站上的众多链接之一下载示例工作簿。这将做你想要的一切,还有更多。

答案 3 :(得分:-1)

也许这不是您正在寻找的答案,但R可以非常轻松而快速地完成此任务。请参阅以下链接。

http://allthingsr.blogspot.com/2012/10/pull-yahoo-finance-key-statistics.html

#######################################################################
# Script to download key metrics for a set of stock tickers using the quantmod package
#######################################################################
require(quantmod)
require("plyr")
what_metrics <- yahooQF(c("Price/Sales", 
                          "P/E Ratio",
                          "Price/EPS Estimate Next Year",
                          "PEG Ratio",
                          "Dividend Yield", 
                          "Market Capitalization"))

tickers <- c("AAPL", "FB", "GOOG", "HPQ", "IBM", "MSFT", "ORCL", "SAP")
# Not all the metrics are returned by Yahoo.
metrics <- getQuote(paste(tickers, sep="", collapse=";"), what=what_metrics)

#Add tickers as the first column and remove the first column which had date stamps
metrics <- data.frame(Symbol=tickers, metrics[,2:length(metrics)]) 

#Change colnames
colnames(metrics) <- c("Symbol", "Revenue Multiple", "Earnings Multiple", 
                       "Earnings Multiple (Forward)", "Price-to-Earnings-Growth", "Div Yield", "Market Cap")

#Persist this to the csv file
write.csv(metrics, "FinancialMetrics.csv", row.names=FALSE)

#######################################################################

#######################################################################
##Alternate method to download all key stats using XML and x_path - PREFERRED WAY
#######################################################################

setwd("C:/Users/i827456/Pictures/Blog/Oct-25")
require(XML)
require(plyr)
getKeyStats_xpath <- function(symbol) {
  yahoo.URL <- "http://finance.yahoo.com/q/ks?s="
  html_text <- htmlParse(paste(yahoo.URL, symbol, sep = ""), encoding="UTF-8")

  #search for <td> nodes anywhere that have class 'yfnc_tablehead1'
  nodes <- getNodeSet(html_text, "/*//td[@class='yfnc_tablehead1']")

  if(length(nodes) > 0 ) {
   measures <- sapply(nodes, xmlValue)

   #Clean up the column name
   measures <- gsub(" *[0-9]*:", "", gsub(" \\(.*?\\)[0-9]*:","", measures))   

   #Remove dups
   dups <- which(duplicated(measures))
   #print(dups) 
   for(i in 1:length(dups)) 
     measures[dups[i]] = paste(measures[dups[i]], i, sep=" ")

   #use siblings function to get value
   values <- sapply(nodes, function(x)  xmlValue(getSibling(x)))

   df <- data.frame(t(values))
   colnames(df) <- measures
   return(df)
  } else {
    break
  }
}

tickers <- c("AAPL")
stats <- ldply(tickers, getKeyStats_xpath)
rownames(stats) <- tickers
write.csv(t(stats), "FinancialStats_updated.csv",row.names=TRUE)  

#######################################################################

enter image description here