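Unfortunately, I am not an experienced scraper yet. However, I need to scrape key statistics for multiple stocks from Yahoo Finance using R.
I am somewhat familiar with scraping data directly from HTML using read_html(), html_nodes(), and html_text() from the rvest package. However, the MSFT key statistics page is a bit complicated, and I am not sure whether all the statistics are kept in XHR, JS, or Doc. My guess is that the data is stored as JSON.
If anyone knows a good way to extract and parse the data on this page with R, please answer my question. Thanks in advance!
Alternatively, if there is a more convenient way to pull these metrics via quantmod or Quandl, please let me know. That would be a great solution!
The goal is to have the tickers/symbols as row names/row labels and the statistics as columns. An illustration of what I need can be found at the following Finviz link: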
https://finviz.com/screener.ashx
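The reason I want to scrape Yahoo Finance is that Yahoo's key statistics also cover enterprise and EBITDA figures.
Edit: I meant to refer to the key statistics page, for example: https://finance.yahoo.com/quote/MSFT/key-statistics/. The code should produce a data frame with stock symbols as rows and key statistics as columns.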
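To make the target shape concrete, here is a minimal sketch of the layout described above: tickers as row names, statistics as columns. The list stats_by_ticker, the metric names, and the numbers are all placeholders invented for illustration; they are not real Yahoo data.

```r
# Hypothetical per-ticker results. The metric names and values are
# placeholders for illustration only, not real market data.
stats_by_ticker <- list(
  MSFT = c(metric_a = 1.5, metric_b = 10),
  XOM  = c(metric_a = 2.5, metric_b = 20)
)

# Stack the named vectors row-wise: list names become row names,
# vector names become column names.
stats_df <- as.data.frame(do.call(rbind, stats_by_ticker))
print(stats_df)
```

Whatever source ends up supplying the numbers, getting each ticker into a named vector (or one-row data frame) makes this final step straightforward.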
Answer 0 (score: 1)
I hope this is what you are looking for:
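library(quantmod)
library(plyr)  # loaded in the original answer; not used by the code shown here

# Choose which quote fields to request from Yahoo
what_metrics <- yahooQF(c("Price/Sales",
                          "P/E Ratio",
                          "Price/EPS Estimate Next Year",
                          "PEG Ratio",
                          "Dividend Yield",
                          "Market Capitalization"))

Symbols <- c("XOM", "MSFT", "JNJ", "GE", "CVX", "WFC", "PG", "JPM", "VZ", "PFE",
             "T", "IBM", "MRK", "BAC", "DIS", "ORCL", "PM", "INTC", "SLB")

# Pass the symbols as one semicolon-separated string
metrics <- getQuote(paste(Symbols, sep="", collapse=";"), what=what_metrics)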
To see the list of available metrics, call:
yahooQF()
Answer 1 (score: 0)
library(rvest)
library(tidyverse)

# Define stock name
stock <- "MSFT"

# Extract and transform data
# (note: this URL is the financials page, not the key-statistics page)
df <- paste0("https://finance.yahoo.com/quote/", stock, "/financials?p=", stock) %>%
  read_html() %>%
  html_table() %>%
  map_df(bind_cols) %>%
  # Transpose
  t() %>%
  as_tibble()

# Set first row as column names (flatten the row to a character vector first)
colnames(df) <- as.character(unlist(df[1, ]))

# Remove first row
df <- df[-1, ]

# Add stock name column
df$Stock_Name <- stock
Output:
  Revenue `Total Revenue` `Cost of Revenu… `Gross Profit`
  <chr>   <chr>           <chr>            <chr>
1 6/30/2… 110,360,000     38,353,000       72,007,000
2 6/30/2… 96,571,000      33,850,000       62,721,000
3 6/30/2… 91,154,000      32,780,000       58,374,000
4 6/30/2… 93,580,000      33,038,000       60,542,000
# ... with 25 more variables: ...
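Note that every column in this output is of type <chr>, and the figures carry thousands separators (for example "110,360,000"). Before doing any arithmetic they need to be converted. A minimal sketch, where to_number is a hypothetical helper name that is not part of the original answer:

```r
# Hypothetical helper: strip thousands separators, then convert to numeric.
to_number <- function(x) {
  as.numeric(gsub(",", "", x, fixed = TRUE))
}

# Values copied from the `Total Revenue` column in the output above.
total_revenue <- c("110,360,000", "96,571,000", "91,154,000", "93,580,000")
revenue_num <- to_number(total_revenue)
print(revenue_num)
```

The same helper could be applied to several columns at once with something like df[cols] <- lapply(df[cols], to_number), where cols is a vector of column names you choose. Cells that do not contain a number (such as a "-" placeholder) would become NA with a warning.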
Edit:
Or, for convenience, as a function:
get_yahoo <- function(stock){
  # Extract and transform data
  x <- paste0("https://finance.yahoo.com/quote/", stock, "/financials?p=", stock) %>%
    read_html() %>%
    html_table() %>%
    map_df(bind_cols) %>%
    # Transpose
    t() %>%
    as_tibble()

  # Set first row as column names (flatten the row to a character vector first)
  colnames(x) <- as.character(unlist(x[1, ]))

  # Remove first row
  x <- x[-1, ]

  # Add stock name column
  x$Stock_Name <- stock

  return(x)
}
Usage: get_yahoo(stock)
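Since the question asks for several stocks in one data frame, a function like get_yahoo can be applied over a vector of symbols and the results stacked. The sketch below uses a stand-in function, fetch_stub, instead of the real scraper so that it runs offline; fetch_stub, safe_fetch, and the "BAD" symbol are all made up for illustration.

```r
# Stand-in for a real fetcher such as get_yahoo(); returns a one-row data
# frame, and fails for the made-up symbol "BAD" to simulate a broken page.
fetch_stub <- function(stock) {
  if (stock == "BAD") stop("no data for ", stock)
  data.frame(Stock_Name = stock, Value = nchar(stock), stringsAsFactors = FALSE)
}

# Wrap any fetcher so that a failure yields NULL instead of aborting the loop.
safe_fetch <- function(stock, fetch) {
  tryCatch(fetch(stock), error = function(e) NULL)
}

symbols <- c("MSFT", "BAD", "XOM")
results <- lapply(symbols, safe_fetch, fetch = fetch_stub)

# Drop the failed symbols and stack the remaining rows.
combined <- do.call(rbind, Filter(Negate(is.null), results))
print(combined)
```

With the real function you would pass fetch = get_yahoo. Base rbind expects every result to have the same columns; if the scraped tables differ between stocks, dplyr::bind_rows() is more forgiving because it fills missing columns with NA.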
Answer 2 (score: 0)
You can use lapply to fetch more than one stock:
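library(quantmod)

Symbols <- c("XOM", "MSFT", "JNJ", "GE", "CVX", "WFC", "PG", "JPM", "VZ", "PFE",
             "T", "IBM", "MRK", "BAC", "DIS", "ORCL", "PM", "INTC", "SLB")
StartDate <- as.Date('2015-01-01')

# Download each symbol, drop rows with missing values, keep the Close column
Stocks <- lapply(Symbols, function(sym) {
  Cl(na.omit(getSymbols(sym, from=StartDate, auto.assign=FALSE)))
})

# Merge the per-symbol series into one object, aligned by date
Stocks <- do.call(merge, Stocks)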
In this case, I extract the closing prices with the function Cl().
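The merged object keeps quantmod's SYMBOL.Close column names (XOM.Close, MSFT.Close, and so on). If you would rather have plain tickers as column names, here is a minimal sketch. It operates on a plain character vector (close_cols, a made-up stand-in for the real column names) so that it runs without downloading anything.

```r
# Stand-in for colnames(Stocks); assumes the SYMBOL.Close naming pattern.
close_cols <- c("XOM.Close", "MSFT.Close", "JNJ.Close")

# Remove the trailing ".Close" suffix, leaving only the ticker.
tickers <- sub("\\.Close$", "", close_cols)
print(tickers)
```

With the real object this would be colnames(Stocks) <- sub("\\.Close$", "", colnames(Stocks)).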