How to scrape key statistics from Yahoo! Finance with R?

Posted: 2018-12-30 18:35:07

Tags: r web-scraping rvest quantmod quandl

Unfortunately, I'm not an experienced scraper yet. However, I need to use R to scrape the key statistics of multiple stocks from Yahoo Finance.

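I'm somewhat familiar with scraping data directly from HTML using read_html(), html_nodes() and html_text() from the rvest package. However, the MSFT key statistics page is more complicated, and I'm not sure whether the statistics are loaded via XHR, JS or the Doc itself. My guess is that the data is stored as JSON.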

If anyone knows a good way to extract and parse the data on this page with R, please answer. Thanks in advance!

Alternatively, if there is a more convenient way to pull these metrics through quantmod or Quandl, please let me know; that would be a great solution too!

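The goal is to have the tickers/symbols as row names (row labels) and the statistics as columns. The Finviz screener at the following link illustrates what I need: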

https://finviz.com/screener.ashx

I want to scrape Yahoo Finance rather than Finviz because Yahoo's key statistics also include Enterprise Value/EBITDA.

Edit: I mean the Key Statistics page, for example https://finance.yahoo.com/quote/MSFT/key-statistics/. The code should produce a data frame with ticker symbols as rows and key statistics as columns.
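A minimal sketch of that target shape using only base R. It assumes the statistics for each ticker have already been scraped into a named character vector (statistic name → displayed value); both `stats_to_frame()` and that input shape are hypothetical, not part of rvest or quantmod. Statistics missing for a ticker become NA.

```r
# Hypothetical helper: combine per-ticker statistics into one data frame
# with tickers as row names and statistics as columns.
# Input: a named list (names = tickers) of named character vectors.
stats_to_frame <- function(stats_by_ticker) {
  # Union of all statistic names, in first-seen order
  all_stats <- unique(unlist(lapply(stats_by_ticker, names)))
  # One row per ticker; indexing by a missing name yields NA
  rows <- lapply(stats_by_ticker, function(v) unname(v[all_stats]))
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  colnames(out) <- all_stats
  rownames(out) <- names(stats_by_ticker)
  out
}

# Usage with made-up placeholder values (not real quotes)
stats <- list(
  AAA = c("Market Cap" = "100B", "Enterprise Value/EBITDA" = "12.5"),
  BBB = c("Market Cap" = "50B")
)
stats_to_frame(stats)
```

Values are kept as character here; converting them to numbers is a separate step.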

3 answers:

Answer 0 (score: 1)

I hope this is what you're looking for:

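library(quantmod)

# Build the field specification for getQuote() from Yahoo quote field names
what_metrics <- yahooQF(c("Price/Sales", 
                          "P/E Ratio",
                          "Price/EPS Estimate Next Year",
                          "PEG Ratio",
                          "Dividend Yield", 
                          "Market Capitalization"))

Symbols <- c("XOM","MSFT","JNJ","GE","CVX","WFC","PG","JPM","VZ","PFE","T","IBM","MRK","BAC","DIS","ORCL","PM","INTC","SLB")

# Request all symbols at once (separated by ";"); the result is a data frame
# with one row per symbol and one column per requested field
metrics <- getQuote(paste(Symbols, sep = "", collapse = ";"), what = what_metrics)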

To see the list of available metrics:

yahooQF()

Answer 1 (score: 0)

Code

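library(rvest)
library(tidyverse)

# Define stock name
stock <- "MSFT"

# Extract and transform data
df <- paste0("https://finance.yahoo.com/quote/", stock, "/financials?p=", stock) %>% 
    read_html() %>% 
    # Parse every <table> on the page into a list of data frames
    html_table() %>% 
    # Stack the tables row-wise into a single data frame
    map_df(bind_cols) %>% 
    # Transpose so that line items become columns (t() returns a character matrix)
    t() %>%
    # The matrix has no column names yet, so as.data.frame() assigns V1, V2, ...
    as.data.frame(stringsAsFactors = FALSE) %>%
    as_tibble()

# Set first row as column names (unlist() turns the one-row tibble into a character vector)
colnames(df) <- unlist(df[1, ])
# Remove first row
df <- df[-1, ]
# Add stock name column
df$Stock_Name <- stock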

Result

  Revenue `Total Revenue` `Cost of Revenu… `Gross Profit`
  <chr>   <chr>           <chr>            <chr>         
1 6/30/2… 110,360,000     38,353,000       72,007,000    
2 6/30/2… 96,571,000      33,850,000       62,721,000    
3 6/30/2… 91,154,000      32,780,000       58,374,000    
4 6/30/2… 93,580,000      33,038,000       60,542,000    
# ... with 25 more variables: ...
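Every column in the result above is character (`<chr>`), so the values still need converting before any arithmetic. A minimal sketch of a converter, assuming displayed values use thousands separators, optional K/M/B/T magnitude suffixes, trailing `%` signs, and placeholders such as "N/A"; `parse_yahoo_number()` is a hypothetical helper, and the suffix and percent handling only matter for pages (such as key statistics) that actually display values in those formats.

```r
# Hypothetical helper: convert scraped display strings to numbers.
# "110,360,000" -> 110360000, "2.1T" -> 2.1e12, "12.5%" -> 0.125,
# "N/A" / "-" / NA -> NA
parse_yahoo_number <- function(x) {
  x <- trimws(gsub(",", "", x, fixed = TRUE))
  # Multiplier for a trailing magnitude suffix; 1 when there is none
  suffix <- toupper(substring(x, nchar(x)))
  mult <- c(K = 1e3, M = 1e6, B = 1e9, T = 1e12)[suffix]
  mult[is.na(mult)] <- 1
  has_suffix <- suffix %in% c("K", "M", "B", "T")
  x[has_suffix] <- substring(x[has_suffix], 1, nchar(x[has_suffix]) - 1)
  # Percentages become fractions
  is_pct <- !is.na(x) & endsWith(x, "%")
  x[is_pct] <- substring(x[is_pct], 1, nchar(x[is_pct]) - 1)
  # Placeholders such as "N/A" become NA without a coercion warning
  num <- suppressWarnings(as.numeric(x))
  num * unname(mult) / ifelse(is_pct, 100, 1)
}

parse_yahoo_number(c("110,360,000", "2.1T", "12.5%", "N/A"))
```

Apply it only to the value columns (for example with lapply()); the date-like first column and Stock_Name should be left as they are.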

Edit:
Alternatively, for convenience, as a function:

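get_yahoo <- function(stock){
  # Extract and transform data
  x <- paste0("https://finance.yahoo.com/quote/", stock, "/financials?p=", stock) %>% 
    read_html() %>% 
    # Parse every <table> on the page and stack them row-wise
    html_table() %>% 
    map_df(bind_cols) %>% 
    # Transpose so that line items become columns
    t() %>%
    # Give the character matrix placeholder names (V1, V2, ...) before converting
    as.data.frame(stringsAsFactors = FALSE) %>%
    as_tibble()

  # Set first row as column names
  colnames(x) <- unlist(x[1, ])
  # Remove first row
  x <- x[-1, ]
  # Add stock name column
  x$Stock_Name <- stock

  return(x)
}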

Usage: get_yahoo(stock)
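To cover several tickers, the function can be called once per symbol and the results stacked. A minimal base-R sketch; `get_many()` is a hypothetical wrapper whose `fetch` argument defaults to get_yahoo() from this answer. A ticker whose fetch fails is skipped with a warning instead of aborting the whole run, and columns that exist for only some tickers are filled with NA so rbind() can line them up.

```r
# Hypothetical wrapper: apply a per-ticker scraper to many symbols and stack
# the results into one data frame (one block of rows per symbol).
get_many <- function(symbols, fetch = get_yahoo) {
  results <- lapply(symbols, function(sym) {
    tryCatch(fetch(sym), error = function(e) {
      warning("skipping ", sym, ": ", conditionMessage(e), call. = FALSE)
      NULL
    })
  })
  results <- Filter(Negate(is.null), results)
  if (length(results) == 0) return(data.frame())
  # Align columns: add any column a result lacks, filled with NA
  all_cols <- unique(unlist(lapply(results, names)))
  results <- lapply(results, function(df) {
    df <- as.data.frame(df, stringsAsFactors = FALSE)
    missing_cols <- setdiff(all_cols, names(df))
    if (length(missing_cols) > 0) df[missing_cols] <- NA
    df[all_cols]
  })
  do.call(rbind, results)
}

# Usage (requires network access): get_many(c("MSFT", "XOM", "JNJ"))
```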

Answer 2 (score: 0)

You can use lapply to get more than one stock:

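library(quantmod) 

Symbols <- c("XOM","MSFT","JNJ","GE","CVX","WFC","PG","JPM","VZ","PFE","T","IBM","MRK","BAC","DIS","ORCL","PM","INTC","SLB")

StartDate <- as.Date('2015-01-01')

# Download each symbol's daily price series, drop rows with NAs
# and keep only the close column
Stocks <- lapply(Symbols, function(sym) {
  Cl(na.omit(getSymbols(sym, from = StartDate, auto.assign = FALSE)))
})

# Merge the per-symbol xts series into one object, aligned by date
Stocks <- do.call(merge, Stocks)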

In this case, I get the closing prices with the Cl() function.
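The same lapply() + merge pattern also works with plain data frames for readers not using xts. A minimal sketch, assuming each symbol's closes are already in a data frame with a `Date` column and one price column named after the symbol; `merge_closes()` and that layout are hypothetical. Reduce() folds merge() over the list, and `all = TRUE` gives an outer join, so a date missing for one symbol appears as NA (the outer join is also the default for merge() on xts objects).

```r
# Hypothetical helper: outer-join a list of per-symbol close-price data
# frames on their shared Date column (rows sorted by Date).
merge_closes <- function(prices) {
  Reduce(function(a, b) merge(a, b, by = "Date", all = TRUE), prices)
}

# Usage with toy data (not real prices)
a <- data.frame(Date = as.Date(c("2015-01-02", "2015-01-05")), AAA = c(1, 2))
b <- data.frame(Date = as.Date(c("2015-01-05", "2015-01-06")), BBB = c(3, 4))
merge_closes(list(a, b))
```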