quantmod getFinancials() does not pull financials

Time: 2017-08-01 11:16:26

Tags: r quantmod


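For JPM: on the Yahoo Finance website I can see that the financials are populated, but the following call seems to pull "google" as the src instead of "yahoo", and the Google financials are only sparsely populated.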

Google - https://www.google.com/finance?q=NYSE%3AJPM&fstype=ii&ei=9kh-WejLE5e_etbzmpgP

Yahoo - https://finance.yahoo.com/quote/JPM/financials?p=JPM

JPM <- getFinancials("JPM", src = "yahoo", auto.assign = FALSE)
viewFin(JPM, type = "IS", period = "A")


2 answers:

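It is currently not possible to specify Yahoo Finance as a source. Doing so would require someone to write a method that scrapes and parses the HTML from Yahoo Finance, because, unlike price data, there is no way to download the financials as a file.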

# assumes codes are known beforehand
codes <- c("MSFT","SBUX","S","AAPL","ADT")
urls <- paste0("https://www.google.com/finance/historical?q=", codes, "&output=csv")
paths <- paste0(codes, ".csv")
# only download files that are not already present
missing <- !file.exists(paths)

# simple error handling in case a file doesn't exist
downloadFile <- function(url, path, ...) {
  # remove file if exists already
  if (file.exists(path)) file.remove(path)
  # download file; on error, clean up and report instead of stopping
  tryCatch(
    download.file(url, path, ...),
    error = function(c) {
      # remove partial file if error
      if (file.exists(path)) file.remove(path)
      # create and show error message (code = file name without .csv)
      c$message <- paste(sub("\\.csv$", "", basename(path)), "failed")
      message(c$message)
    }
  )
}

# Map is a wrapper of mapply
Map(downloadFile, urls[missing], paths[missing])


## downloads historic prices for all constituents of the S&P 500
## requires tseries (get.hist.quote) and zoo (merge, write.zoo)
library(tseries)
library(zoo)

## read in list of constituents, with company name in first column and
## ticker symbol in second column
spComp <- read.csv("C:/Users/Excel/Desktop/stocks.csv", stringsAsFactors = FALSE)

## specify time period
dateStart <- "2013-01-01"
dateEnd <- "2015-05-08"

## extract ticker symbols (second column, see above) and number of iterations
symbols <- as.character(spComp[, 2])
nAss <- length(symbols)

## download data on first stock as zoo object
z <- get.hist.quote(instrument = symbols[1], start = dateStart,
                    end = dateEnd, quote = "AdjClose",
                    retclass = "zoo", quiet = TRUE)

## use ticker symbol as column name
dimnames(z)[[2]] <- as.character(symbols[1])

## download remaining assets in for loop
for (i in seq_len(nAss)[-1]) {
  ## display progress by showing the current iteration step
  cat("Downloading ", i, " out of ", nAss, "\n")

  x <- try(get.hist.quote(instrument = symbols[i],
                          start = dateStart,
                          end = dateEnd, quote = "AdjClose",
                          retclass = "zoo", quiet = TRUE),
           silent = TRUE)
  ## skip symbols whose download failed
  if (inherits(x, "try-error")) next

  dimnames(x)[[2]] <- as.character(symbols[i])

  ## merge with already downloaded data to get assets on same dates
  z <- merge(z, x)
}

## save data as comma-separated values
write.zoo(z, file = "C:/Users/Excel/Desktop/all_sp500_price_data.csv",
          index.name = "time", sep = ",")
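The loop above relies on `try()` to skip symbols whose download fails and on zoo's `merge()` to align each new series with those already downloaded on their shared dates. The same two ideas can be tried offline with a minimal base-R sketch. Assumptions: `fetch()` is a hypothetical stand-in for `get.hist.quote()` that serves two small in-memory series (closing prices copied from the MSFT/TCHC tables in this post, with rows chosen so the dates only partly overlap), `tryCatch()` replaces `try()`, and `merge(..., all = TRUE)` plays the role of zoo's outer join.

```r
# Hypothetical in-memory series standing in for get.hist.quote() results;
# values copied from the MSFT/TCHC tables, rows chosen so dates partly overlap.
series <- list(
  MSFT = data.frame(Date = as.Date(c("2013-11-29", "2013-12-02", "2013-12-03")),
                    MSFT = c(38.13, 38.45, 38.31)),
  TCHC = data.frame(Date = as.Date(c("2013-12-02", "2013-12-03", "2013-12-04")),
                    TCHC = c(13.81, 13.48, 13.71))
)

# hypothetical fetcher: errors for unknown symbols, like a failed download
fetch <- function(sym) {
  if (is.null(series[[sym]])) stop("download failed for ", sym)
  series[[sym]]
}

symbols <- c("MSFT", "BAD", "TCHC")
pieces <- list()
for (sym in symbols) {
  x <- tryCatch(fetch(sym), error = function(e) NULL)
  if (is.null(x)) {
    message("skipping ", sym)
    next
  }
  pieces[[sym]] <- x
}

# outer join on Date: a date missing for one symbol becomes NA in its column
z <- Reduce(function(a, b) merge(a, b, by = "Date", all = TRUE), pieces)
print(z)
```

The failed symbol simply never enters `pieces`, so the merge only ever sees valid series.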


Method #1:
This article illustrates how to download stock price data files from Google, save them to a local drive and merge them into a single data frame.

First of all, the following three packages are used.

The script begins by creating a folder to save the data files.

# create data folder (delete and recreate it if it already exists)
dataDir <- paste0("data","_","2014-11-20-Download-Stock-Data-1")
if (file.exists(dataDir)) {
      unlink(dataDir, recursive = TRUE)
}
dir.create(dataDir)
After creating the URLs and file paths, the files are downloaded using the `Map` function, which is a wrapper of `mapply`. Note that `download.file` is wrapped by another function that includes an error handler (`tryCatch`), so that the loop does not stop when an error occurs (e.g. when a file doesn't exist).
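The `tryCatch` wrapper can be exercised without a network connection. In this minimal sketch, `fakeDownload()` is a hypothetical stand-in for `download.file()` that fails for any code outside a fixed set (mirroring the `"1234"` code suggested for testing), and `safeDownload()` follows the same pattern as `downloadFile()`: delete any stale file, try the download, and on error clean up, report and return instead of stopping `Map`.

```r
# Hypothetical stand-in for download.file(): fails for unknown codes,
# otherwise writes a placeholder CSV header to `path`.
fakeDownload <- function(code, path) {
  available <- c("MSFT", "TCHC")
  if (!code %in% available) stop("no data for ", code)
  writeLines("Date,Close", path)  # placeholder content
  invisible(0L)
}

# Same shape as downloadFile(): remove stale file, try, clean up on error.
safeDownload <- function(code, path) {
  if (file.exists(path)) file.remove(path)
  tryCatch({
    fakeDownload(code, path)
    TRUE
  }, error = function(e) {
    # remove any partial file and report which code failed
    if (file.exists(path)) file.remove(path)
    message(code, " failed: ", conditionMessage(e))
    FALSE
  })
}

outDir <- tempfile("prices_")
dir.create(outDir)
codes <- c("MSFT", "1234", "TCHC")
paths <- file.path(outDir, paste0(codes, ".csv"))

# Map keeps going after the "1234" failure
ok <- unlist(Map(safeDownload, codes, paths))
print(ok)
```

Returning a logical from both branches (an addition not in the original handler) makes it easy to see afterwards which downloads succeeded.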

# assumes codes are known beforehand
codes <- c("MSFT", "TCHC") # codes <- c("MSFT", "1234") for testing
urls <- paste0("http://www.google.com/finance/historical?q=NASDAQ:",
               codes, "&output=csv")
paths <- file.path(dataDir, paste0(codes, ".csv")) # forward slashes also work on Windows

# simple error handling in case a file doesn't exist
downloadFile <- function(url, path, ...) {
      # remove file if exists already
      if (file.exists(path)) file.remove(path)
      # download file; on error, clean up and report instead of stopping
      tryCatch(
            download.file(url, path, ...), error = function(c) {
                  # remove file if error
                  if (file.exists(path)) file.remove(path)
                  # create and show error message (code = file name without .csv)
                  c$message <- paste(sub("\\.csv$", "", basename(path)), "failed")
                  message(c$message)
            }
      )
}
# wrapper of mapply
Map(downloadFile, urls, paths)
Finally, the files are read back using `llply` and combined using `rbind_all` (replaced by `bind_rows` in current versions of dplyr). As the merged data contains records for several stocks, a `Code` column is added.
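A base-R sketch of the read-back step, assuming two small hypothetical CSV files written to a temporary folder (closing prices copied from the wide table later in this post, with ISO dates rather than Google's `26-Nov-14` style so no locale-dependent parsing is needed). `lapply()` and `do.call(rbind, ...)` stand in for `llply()` and `rbind_all()`.

```r
# Hypothetical folder and files standing in for the downloaded Google CSVs.
outDir <- tempfile("prices_")
dir.create(outDir)
writeLines(c("Date,Close", "2013-11-29,38.13", "2013-12-02,38.45"),
           file.path(outDir, "MSFT.csv"))
writeLines(c("Date,Close", "2013-11-29,13.52", "2013-12-02,13.81"),
           file.path(outDir, "TCHC.csv"))

readOne <- function(file) {
  data <- read.csv(file, stringsAsFactors = FALSE)
  data$Date <- as.Date(data$Date)
  # the ticker is the file name without its .csv extension
  data$Code <- sub("\\.csv$", "", basename(file))
  data
}

files <- list.files(outDir, pattern = "\\.csv$", full.names = TRUE)
combined <- do.call(rbind, lapply(files, readOne))
print(combined)
```

Deriving the code from `basename()` works for tickers of any length, whereas a fixed four-letter pattern would miss symbols such as `S` or `ADT`.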

library(plyr)       # llply
library(dplyr)      # bind_rows (formerly rbind_all)
library(lubridate)  # dmy

# read all csv files and merge
files <- dir(dataDir, full.names = TRUE)
dataList <- llply(files, function(file){
      data <- read.csv(file, stringsAsFactors = FALSE)
      # get code from file path (file name without the .csv extension)
      code <- sub("\\.csv$", "", basename(file))
      # first column's name is malformed in the raw files, so rename all columns
      names(data) <- c("Date","Open","High","Low","Close","Volume")
      data$Date <- dmy(data$Date)
      data$Open <- as.numeric(data$Open)
      data$High <- as.numeric(data$High)
      data$Low <- as.numeric(data$Low)
      data$Close <- as.numeric(data$Close)
      data$Volume <- as.integer(data$Volume)
      data$Code <- code
      # return the cleaned data frame
      data
}, .progress = "text")

# bind_rows() replaces the deprecated rbind_all()
data <- bind_rows(dataList)
Some of the values are shown below.

|Date       |  Open|  High|   Low| Close|   Volume|Code |
|:----------|-----:|-----:|-----:|-----:|--------:|:----|
|2014-11-26 | 47.49| 47.99| 47.28| 47.75| 27164877|MSFT |
|2014-11-25 | 47.66| 47.97| 47.45| 47.47| 28007993|MSFT |
|2014-11-24 | 47.99| 48.00| 47.39| 47.59| 35434245|MSFT |
|2014-11-21 | 49.02| 49.05| 47.57| 47.98| 42884795|MSFT |
|2014-11-20 | 48.00| 48.70| 47.87| 48.70| 21510587|MSFT |
|2014-11-19 | 48.66| 48.75| 47.93| 48.22| 26177450|MSFT |

This approach is less efficient than reading the files directly without saving them to a local drive. It may be useful, however, when the files are large and the API server drops the connection abruptly.

I hope this article is useful. I'm going to show the second way in another article.

Method #2:
In an earlier article, I showed a way to download stock price data files from Google, save them to a local drive and merge them into a single data frame. If the files are not large, however, saving them first is unnecessary, so in this article the files are downloaded and merged in memory.

The following packages are used.

Taking the URLs as file locations, the files are read directly using `llply` and combined using `rbind_all` (replaced by `bind_rows` in current versions of dplyr). As the merged data contains records for several stocks, a `Code` column is created. Note that, when an error occurs, the function returns a dummy data frame so that the loop does not break; the dummy rows are filtered out at the end.
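The dummy-data-frame approach can be tried offline. In this sketch, a hypothetical named list `sources` stands in for the Google Finance URLs (prices copied from the wide table later in this post), and reading an unknown code raises an error the way a bad URL would. Base `lapply()` and `do.call(rbind, ...)` replace `llply()` and `rbind_all()`.

```r
# Hypothetical in-memory "URLs": CSV text per code; "1234" has no entry.
sources <- list(
  MSFT = "Date,Close\n2013-11-29,38.13\n2013-12-02,38.45",
  TCHC = "Date,Close\n2013-11-29,13.52\n2013-12-02,13.81"
)

readSource <- function(code) {
  if (is.null(sources[[code]])) stop("no data for ", code)
  read.csv(text = sources[[code]], stringsAsFactors = FALSE)
}

readOrDummy <- function(code) {
  tryCatch({
    data <- readSource(code)
    data$Date <- as.Date(data$Date)
    data$Code <- code
    data
  }, error = function(e) {
    message(code, " failed")
    # dummy row keeps every list element a data frame with the same columns
    data.frame(Date = Sys.Date(), Close = 0, Code = "NA",
               stringsAsFactors = FALSE)
  })
}

codes <- c("MSFT", "1234", "TCHC")
combined <- do.call(rbind, lapply(codes, readOrDummy))
# dummy rows are filtered out at the end
combined <- combined[combined$Code != "NA", ]
print(combined)
```

Returning `NULL` from the handler would also work with `dplyr::bind_rows()`, which ignores `NULL` list elements; the dummy row keeps the article's filter-at-the-end design.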

library(plyr)       # llply
library(dplyr)      # bind_rows (formerly rbind_all), filter
library(lubridate)  # dmy

# assumes codes are known beforehand
codes <- c("MSFT", "TCHC") # codes <- c("MSFT", "1234") for testing
files <- paste0("http://www.google.com/finance/historical?q=NASDAQ:",
                codes, "&output=csv")

dataList <- llply(files, function(file, ...) {
      # get code from file url (the symbol after "NASDAQ:")
      code <- sub(".*NASDAQ:([0-9A-Za-z]+).*", "\\1", file)

      # read data directly from a URL with only simple error handling
      # for further error handling: http://adv-r.had.co.nz/Exceptions-Debugging.html
      tryCatch({
            data <- read.csv(file, stringsAsFactors = FALSE)
            # first column's name is malformed, so rename all columns
            names(data) <- c("Date","Open","High","Low","Close","Volume")
            data$Date <- dmy(data$Date)
            data$Open <- as.numeric(data$Open)
            data$High <- as.numeric(data$High)
            data$Low <- as.numeric(data$Low)
            data$Close <- as.numeric(data$Close)
            data$Volume <- as.integer(data$Volume)
            data$Code <- code
            data
      }, error = function(c) {
            c$message <- paste(code, "failed")
            message(c$message)
            # return a dummy data frame so the loop does not break
            data.frame(Date = Sys.Date(), Open = 0, High = 0, Low = 0,
                       Close = 0, Volume = 0L, Code = "NA",
                       stringsAsFactors = FALSE)
      })
}, .progress = "text")

# dummy data frame values are filtered out
# bind_rows() replaces the deprecated rbind_all()
data <- filter(bind_rows(dataList), Code != "NA")
Some of the values are shown below.

|Date       |  Open|  High|   Low| Close|   Volume|Code |
|:----------|-----:|-----:|-----:|-----:|--------:|:----|
|2014-11-26 | 47.49| 47.99| 47.28| 47.75| 27164877|MSFT |
|2014-11-25 | 47.66| 47.97| 47.45| 47.47| 28007993|MSFT |
|2014-11-24 | 47.99| 48.00| 47.39| 47.59| 35434245|MSFT |
|2014-11-21 | 49.02| 49.05| 47.57| 47.98| 42884795|MSFT |
|2014-11-20 | 48.00| 48.70| 47.87| 48.70| 21510587|MSFT |
|2014-11-19 | 48.66| 48.75| 47.93| 48.22| 26177450|MSFT |

The script took a bit longer to complete because I had to teach myself how to handle errors in R, which is also why I started writing articles on this blog.

I hope this article is useful.

Summarize Stock returns From Multiple Files:
This is a slight extension of the previous two articles. It aims to compute the cumulative (log) returns, standard deviations and correlations of multiple shares.

The following packages are used.

The script begins by creating a data folder in the format of *data_YYYY-MM-DD*.

# create data folder (delete and recreate it if it already exists)
dataDir <- paste0("data","_",format(Sys.Date(),"%Y-%m-%d"))
if (file.exists(dataDir)) {
  unlink(dataDir, recursive = TRUE)
}
dir.create(dataDir)
Given company codes, URLs and file paths are created. The data files are then downloaded by `Map`, which is a wrapper of `mapply`. Note that R's `download.file` function is wrapped by `downloadFile` so that a failed download does not stop the remaining ones.
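`Map()` is defined in base R as a thin wrapper that calls `mapply()` with `SIMPLIFY = FALSE`, so it always returns a list, named after the first argument when that is a character vector. A small check of that equivalence, using a hypothetical `label()` function in place of `downloadFile()` so nothing is downloaded:

```r
# hypothetical stand-in for downloadFile(): just describes the pairing
label <- function(code, path) paste(code, "->", path)

codes <- c("MSFT", "TCHC")
paths <- c("data/MSFT.csv", "data/TCHC.csv")

viaMap <- Map(label, codes, paths)
viaMapply <- mapply(label, codes, paths, SIMPLIFY = FALSE)

# both calls pair codes[i] with paths[i] and return the same named list
print(identical(viaMap, viaMapply))
print(viaMap[["MSFT"]])
```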

# assumes codes are known beforehand
codes <- c("MSFT", "TCHC")
urls <- paste0("http://www.google.com/finance/historical?q=NASDAQ:",
               codes, "&output=csv")
paths <- file.path(dataDir, paste0(codes, ".csv")) # forward slashes also work on Windows

# simple error handling in case a file doesn't exist
downloadFile <- function(url, path, ...) {
  # remove file if exists already
  if (file.exists(path)) file.remove(path)
  # download file; on error, clean up and report instead of stopping
  tryCatch(
    download.file(url, path, ...), error = function(c) {
      # remove file if error
      if (file.exists(path)) file.remove(path)
      # create and show error message (code = file name without .csv)
      c$message <- paste(sub("\\.csv$", "", basename(path)), "failed")
      message(c$message)
    }
  )
}
# wrapper of mapply
Map(downloadFile, urls, paths)
Once the files are downloaded, they are read back and combined using `rbind_all` (replaced by `bind_rows` in current versions of dplyr). More details about this step are listed below.

* only the Date, Close and Code columns are kept
* codes are extracted from file paths by matching a regular expression
* data is arranged in ascending order of date, as the raw files are sorted in descending order
* errors are handled by returning a dummy data frame whose Code value is 'NA'
* individual data frames are combined into a long format
    * rows with Code 'NA' are filtered out
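The per-file steps listed above can be sketched in base R. This is an illustrative sketch, not the author's code: `raw` holds three MSFT rows copied from the Method #1 table in their original descending order, the file path is hypothetical, and base `order()` and logical subsetting stand in for `dplyr::arrange()` and `filter()`.

```r
# three MSFT rows copied from the Method #1 table, newest first as in the raw files
raw <- data.frame(
  Date   = as.Date(c("2014-11-26", "2014-11-25", "2014-11-24")),
  Open   = c(47.49, 47.66, 47.99),
  High   = c(47.99, 47.97, 48.00),
  Low    = c(47.28, 47.45, 47.39),
  Close  = c(47.75, 47.47, 47.59),
  Volume = c(27164877, 28007993, 35434245)
)
file <- "data_2014-11-26/MSFT.csv"  # hypothetical path

# code = file name without directory and .csv extension (regular expression)
code <- sub("^.*/([^/]+)\\.csv$", "\\1", file)

# keep Date and Close, sort ascending by Date, then add Code
tidy <- raw[order(raw$Date), c("Date", "Close")]
tidy$Code <- code

# a dummy row like the one returned on error, then filtered out
dummy <- data.frame(Date = Sys.Date(), Close = 0, Code = "NA",
                    stringsAsFactors = FALSE)
long <- rbind(tidy, dummy)
long <- long[long$Code != "NA", ]
rownames(long) <- NULL
print(long)
```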

library(plyr)       # llply
library(dplyr)      # bind_rows (formerly rbind_all), arrange, filter
library(lubridate)  # dmy

# read all csv files and merge
files <- dir(dataDir, full.names = TRUE)
dataList <- llply(files, function(file){
  # get code from file path: the file name without the .csv extension
  code <- sub("^.*/([^/]+)\\.csv$", "\\1", file)
  tryCatch({
    data <- read.csv(file, stringsAsFactors = FALSE)
    # first column's name is malformed, so rename all columns
    names(data) <- c("Date","Open","High","Low","Close","Volume")
    data$Date <- dmy(data$Date)
    data$Close <- as.numeric(data$Close)
    data$Code <- code
    # optional
    data$Open <- as.numeric(data$Open)
    data$High <- as.numeric(data$High)
    data$Low <- as.numeric(data$Low)
    data$Volume <- as.integer(data$Volume)
    # select only 'Date', 'Close' and 'Code'
    # raw data is sorted in descending order, so arrange it in ascending order
    arrange(subset(data, select = c(Date, Close, Code)), Date)
  }, error = function(c){
    c$message <- paste(code, "failed")
    message(c$message)
    # return a dummy data frame so the function does not break
    data.frame(Date = Sys.Date(), Close = 0, Code = "NA",
               stringsAsFactors = FALSE)
  })
}, .progress = "text")

# data is combined to create a long format
# dummy data frame values are filtered out
# bind_rows() replaces the deprecated rbind_all()
data <- filter(bind_rows(dataList), Code != "NA")
Some values of the long format data are shown below.

|Date       | Close|Code |
|:----------|-----:|:----|
|2013-11-29 | 38.13|MSFT |
|2013-12-02 | 38.45|MSFT |
|2013-12-03 | 38.31|MSFT |
|2013-12-04 | 38.94|MSFT |
|2013-12-05 | 38.00|MSFT |
|2013-12-06 | 38.36|MSFT |

The data is then converted into a wide format, with one row per Date and one column per Code (`Date ~ Code`), where the cell values come from Close (`value.var="Close"`). Some values of the wide format data are shown below.

library(reshape2)  # dcast

# data is converted into a wide format
data <- dcast(data, Date ~ Code, value.var="Close")

|Date       |  MSFT|  TCHC|
|:----------|-----:|-----:|
|2013-11-29 | 38.13| 13.52|
|2013-12-02 | 38.45| 13.81|
|2013-12-03 | 38.31| 13.48|
|2013-12-04 | 38.94| 13.71|
|2013-12-05 | 38.00| 13.55|
|2013-12-06 | 38.36| 13.95|
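For readers without reshape2, base R's `reshape()` can produce the same wide layout. A minimal sketch using the first three dates from the tables above; `reshape()` names the new columns `Close.MSFT` and `Close.TCHC`, so a `sub()` call strips the prefix to match the `dcast()` output. Rows appear in the order their dates first occur in the long data, so sort by Date beforehand if order matters.

```r
# long format: one row per (Date, Code) pair, values from the tables above
long <- data.frame(
  Date  = as.Date(rep(c("2013-11-29", "2013-12-02", "2013-12-03"), times = 2)),
  Close = c(38.13, 38.45, 38.31, 13.52, 13.81, 13.48),
  Code  = rep(c("MSFT", "TCHC"), each = 3),
  stringsAsFactors = FALSE
)

# Date identifies rows, Code becomes columns, Close fills the cells
wide <- reshape(long, idvar = "Date", timevar = "Code", direction = "wide")
names(wide) <- sub("^Close\\.", "", names(wide))
rownames(wide) <- NULL
print(wide)
```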

The remaining steps take the log of the close prices, difference them to obtain daily log returns, and apply `sum`, `sd` and `cor` to the result.

# select every column except Date (dplyr::select)
data <- select(data, -Date)

# apply log difference column wise: daily log returns
dailyRet <- apply(log(data), 2, diff, lag=1)

# obtain cumulative log return, standard deviation of daily log returns and correlation
returns <- apply(dailyRet, 2, sum, na.rm = TRUE)
std <- apply(dailyRet, 2, sd, na.rm = TRUE)
correlation <- cor(dailyRet, use = "pairwise.complete.obs")

returns
##      MSFT      TCHC
## 0.2249777 0.6293973

std
##       MSFT       TCHC
## 0.01167381 0.03203031

correlation
##           MSFT      TCHC
## MSFT 1.0000000 0.1481043
## TCHC 0.1481043 1.0000000
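Because each daily log return is log(P[t]) - log(P[t-1]), summing them telescopes to log(P[last] / P[first]): the `returns` value is the log return over the whole period, and exponentiating it gives the gross return P[last] / P[first]. A quick check of that identity, using only the six closing prices per stock from the wide table above (so the numbers differ from the full-period output):

```r
# six closes per stock copied from the wide table above
prices <- data.frame(
  MSFT = c(38.13, 38.45, 38.31, 38.94, 38.00, 38.36),
  TCHC = c(13.52, 13.81, 13.48, 13.71, 13.55, 13.95)
)

# daily log returns, column by column (5 rows x 2 columns)
dailyRet <- apply(log(prices), 2, diff, lag = 1)

# sum of daily log returns vs. log of last / first price
totalLogRet <- colSums(dailyRet)
direct <- unlist(log(prices[nrow(prices), ] / prices[1, ]))
print(all.equal(totalLogRet, direct))

grossRet <- exp(totalLogRet)  # P[last] / P[first]
std <- apply(dailyRet, 2, sd)
corr <- cor(dailyRet)
print(grossRet)
```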

Finally the data folder is deleted.

# delete data folder
if(file.exists(dataDir)) { unlink(dataDir, recursive = TRUE) }