Scraping financial data from Yahoo Finance

Time: 2019-10-27 19:36:15

Tags: r web-scraping

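I have been trying to scrape financial data from Yahoo Finance with R, without success. My current code is below. The main problem seems to be that the table holding the financial data on Yahoo Finance is not marked up as a table in the HTML. How can I work around this?

I have already tried copying the XPath, with no luck.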

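library(XML)

symbol <- "HD"
# Build the URL from the symbol instead of hard-coding the ticker in the path
url <- paste0("https://finance.yahoo.com/quote/", symbol, "/financials?p=", symbol)
webpage <- readLines(url)
html <- htmlTreeParse(webpage, useInternalNodes = TRUE, asText = TRUE)
# Select every <table> element in the parsed document
tableNodes <- getNodeSet(html, "//table")

# Attempt to convert the selected nodes into data frames
data <- readHTMLTable(tableNodes)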
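
The situation described in the question can be reproduced offline. The sketch below is a minimal illustration only: the markup, the class names `row` and `cell`, and the column names are all hypothetical stand-ins and are not Yahoo Finance's actual HTML. It shows that `//table` matches nothing when rows are built from `div` elements, and that an XPath aimed at the row containers can still recover the values.

```r
library(XML)

# Hypothetical markup: rows built from <div> elements instead of <table>/<tr>/<td>.
# This is NOT Yahoo Finance's real markup, only an offline stand-in.
page <- '
<html><body>
  <div class="row"><div class="cell">Total Revenue</div><div class="cell">100</div><div class="cell">90</div></div>
  <div class="row"><div class="cell">Net Income</div><div class="cell">12</div><div class="cell">10</div></div>
</body></html>'

html <- htmlParse(page, asText = TRUE)

# No <table> elements exist, so the original XPath matches nothing.
tableNodes <- getNodeSet(html, "//table")
length(tableNodes)

# Select the row containers instead and read the text of each cell.
rowNodes <- getNodeSet(html, "//div[@class='row']")
cells <- lapply(rowNodes, function(row) {
  xpathSApply(row, "./div[@class='cell']", xmlValue)
})

# Bind the rows into a data frame; every row is assumed to have the same number of cells.
financials <- as.data.frame(do.call(rbind, cells), stringsAsFactors = FALSE)
names(financials) <- c("item", "period1", "period2")  # hypothetical column names
print(financials)
```

On a real page the XPath would have to be adapted to whatever element and attribute names the page actually uses at the time.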

1 Answer:

Answer 0 (score: 0)

I have worked with Yahoo Finance before. You are making a small mistake: tableNodes can contain several tables, so read each one separately, as follows:

library(XML)

symbol <- "HD"
# Build the URL from the symbol instead of hard-coding the ticker in the path
url <- paste0("https://finance.yahoo.com/quote/", symbol, "/analysts?p=", symbol)
webpage <- readLines(url)
html <- htmlTreeParse(webpage, useInternalNodes = TRUE, asText = TRUE)
tableNodes <- getNodeSet(html, "//table")

# Convert one node at a time; the indices follow the order of the tables on the page
earningsEstimates <- readHTMLTable(tableNodes[[1]])
revenueEstimates <- readHTMLTable(tableNodes[[2]])
earningsHistory <- readHTMLTable(tableNodes[[3]])
earningPerShareTrend <- readHTMLTable(tableNodes[[4]])
earningPerShareRevision <- readHTMLTable(tableNodes[[5]])
growthEstimates <- readHTMLTable(tableNodes[[6]])

print(earningsEstimates) # printing one table

Output

  Earnings Estimate Current Qtr. (Oct 2019) Next Qtr. (Jan 2020) Current Year (2020) Next Year (2021)
1   No. of Analysts                      28                   28                  35               35
2     Avg. Estimate                    2.52                 2.17               10.13            10.96
3      Low Estimate                    2.47                 2.07               10.03             10.7
4     High Estimate                    2.58                 2.24               10.27             11.2
5      Year Ago EPS                    2.51                 2.25                9.89            10.13
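
The per-table approach in this answer can also be tried without network access. The following is a minimal sketch, assuming an inline HTML string with two small tables; the values and the names `earnings` and `revenue` are made up for illustration and are not Yahoo data. It uses `lapply` to convert every node in the set, which avoids hard-coding one index per table.

```r
library(XML)

# Offline stand-in for a page with two tables (made-up values, not Yahoo data).
page <- '
<html><body>
  <table><tr><th>Metric</th><th>Q1</th></tr><tr><td>Avg. Estimate</td><td>1.5</td></tr></table>
  <table><tr><th>Metric</th><th>Q1</th></tr><tr><td>Sales Estimate</td><td>20</td></tr></table>
</body></html>'

html <- htmlParse(page, asText = TRUE)
tableNodes <- getNodeSet(html, "//table")

# Convert each node separately; lapply returns one data frame per table.
tables <- lapply(tableNodes, readHTMLTable, stringsAsFactors = FALSE)

# Hypothetical names, chosen only so the tables can be accessed by label.
names(tables) <- c("earnings", "revenue")
print(tables$earnings)
```

Hard-coded indices such as `tableNodes[[3]]` depend on the order of the tables on the page, so checking `length(tableNodes)` before indexing is a cheap safeguard.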