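I want to build a stock-data matrix for n companies from a list of tickers, but I am having trouble appending the series horizontally: my code only appends them vertically. I have tried other functions such as merge or rbind, but they cannot take variables whose names are parsed from strings, so the difficulty is that I want to append n variables retrieved from a stock list with n tickers. Other suggestions that achieve the same result are welcome. Stock list data: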
> dput(stockslist)
structure(list(V1 = c("AMD", "MSFT", "SBUX", "IBM", "AAPL", "GSPC",
"AMZN")), .Names = "V1", class = "data.frame", row.names = c(NA,
-7L))
Code:
library(quantmod)
library(tseries)
library(plyr)
library(PortfolioAnalytics)
library(PerformanceAnalytics)
library(zoo)
library(plotly)
tickerlist <- "sp500.csv" #CSV containing tickers on rows
stockslist <- read.csv("sp500.csv", header = FALSE, stringsAsFactors = F)
nrstocks = length(stockslist[,1]) #The number of stocks to download
maxretryattempts <- 5 #If there is an error downloading a price how many times to retry
startDate = as.Date("2010-01-13")
for (i in 1:nrstocks) {
stockdata <- getSymbols(c(stockslist[i,1]), src = "yahoo", from =
startDate)
# pick 6th column of the ith stock
write.table((eval(parse(text=paste(stockslist[i,1]))))[,6], file =
"test.csv", append = TRUE, row.names=F)
}
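Answer 0 (score: 0)
This is a great opportunity to talk about lists of dataframes. Having said that ...
Sidebar: I really don't like side effects. By default, getSymbols uses a side effect to save the data into the parent frame/environment. That may be fine for most uses, but I prefer a functional approach. Fortunately, auto.assign=FALSE brings its behavior back within my comfort zone.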
library(quantmod)
stocklist <- c("AMD", "MSFT")
startDate <- as.Date("2010-01-13")
dat <- sapply(stocklist, getSymbols, src = "google", from = startDate, auto.assign = FALSE,
simplify = FALSE)
str(dat)
# List of 2
# $ AMD :An 'xts' object on 2010-01-13/2017-05-16 containing:
# Data: num [1:1846, 1:5] 8.71 9.18 9.13 8.84 8.98 9.01 8.55 8.01 8.03 8.03 ...
# - attr(*, "dimnames")=List of 2
# ..$ : NULL
# ..$ : chr [1:5] "AMD.Open" "AMD.High" "AMD.Low" "AMD.Close" ...
# Indexed by objects of class: [Date] TZ: UTC
# xts Attributes:
# List of 2
# ..$ src : chr "google"
# ..$ updated: POSIXct[1:1], format: "2017-05-16 21:01:37"
# $ MSFT:An 'xts' object on 2010-01-13/2017-05-16 containing:
# Data: num [1:1847, 1:5] 30.3 30.3 31.1 30.8 30.8 ...
# - attr(*, "dimnames")=List of 2
# ..$ : NULL
# ..$ : chr [1:5] "MSFT.Open" "MSFT.High" "MSFT.Low" "MSFT.Close" ...
# Indexed by objects of class: [Date] TZ: UTC
# xts Attributes:
# List of 2
# ..$ src : chr "google"
# ..$ updated: POSIXct[1:1], format: "2017-05-16 21:01:37"
Though I only used two symbols, this should work with many more. Also, I switched to Google because Yahoo was asking for authentication.
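To make the "list of data frames" idea concrete without a network connection, here is a minimal sketch. `fake_fetch` is a hypothetical stand-in for `getSymbols(..., auto.assign = FALSE)` and its numbers are made up; the point is only that every result lives in one named list, so nothing has to be looked up with `eval(parse(...))`.

```r
# Hypothetical offline stand-in for getSymbols(..., auto.assign = FALSE):
# returns a small data frame whose columns are prefixed with the ticker.
fake_fetch <- function(ticker, n = 3) {
  out <- data.frame(Open = seq_len(n), Close = seq_len(n) + 0.5)
  names(out) <- paste(ticker, names(out), sep = ".")
  rownames(out) <- as.character(as.Date("2010-01-13") + seq_len(n) - 1)
  out
}

tickers <- c("AMD", "MSFT", "SBUX")

# simplify = FALSE keeps the result as a list; the list is named by ticker.
dat_demo <- sapply(tickers, fake_fetch, simplify = FALSE)

# Access one element by name instead of eval(parse(text = ...)).
dat_demo[["MSFT"]]

# Pick the same column position from every element in one pass.
closes <- lapply(dat_demo, function(x) x[, 2, drop = FALSE])
```

With the real `dat` from above, the same `lapply()` pattern applies to each xts object in the list.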
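As for the way you write the data out (write.table(..., row.names=F) in your code), be aware that you will lose the timestamp of every observation, because the CSV will look something like: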
"AMD.Open","AMD.High","AMD.Low","AMD.Close","AMD.Volume"
8.71,9.2,8.55,9.15,32741845
9.18,9.26,8.92,9,22658744
9.13,9.19,8.8,8.84,34344763
8.84,9.21,8.84,9.01,24875646
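Using "AMD" as an example, consider:
# dat$AMD is the list element created above (auto.assign = FALSE creates no global AMD object)
write.csv(as.data.frame(dat$AMD), file = "AMD.csv", row.names = TRUE)
head(read.csv("AMD.csv", row.names = 1))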
# AMD.Open AMD.High AMD.Low AMD.Close AMD.Volume
# 2010-01-13 8.71 9.20 8.55 9.15 32741845
# 2010-01-14 9.18 9.26 8.92 9.00 22658744
# 2010-01-15 9.13 9.19 8.80 8.84 34344763
# 2010-01-19 8.84 9.21 8.84 9.01 24875646
# 2010-01-20 8.98 9.00 8.76 8.87 22813520
# 2010-01-21 9.01 9.10 8.77 8.99 37888647
To save them all at once:
# one file per symbol, e.g. "AMD.csv" and "MSFT.csv"
ign <- mapply(function(x, fn) write.csv(as.data.frame(x), file = fn, row.names = TRUE),
dat, paste0(names(dat), ".csv"))
There are other ways to store your data, such as Rdata files (save()).
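As a rough illustration of the binary route, the sketch below round-trips a whole list through a single file. It uses `saveRDS()`/`readRDS()`, a close relative of `save()`/`load()` that returns the object instead of restoring it under its original name. The toy list and the temporary file path are assumptions made for the example.

```r
# Toy stand-in for the list of per-symbol data (values are made up).
prices <- list(
  AMD  = data.frame(Close = c(9.15, 9.00),
                    row.names = c("2010-01-13", "2010-01-14")),
  MSFT = data.frame(Close = c(30.3, 30.9),
                    row.names = c("2010-01-13", "2010-01-14"))
)

# Write the whole list to one file in a temporary directory.
path <- file.path(tempdir(), "prices.rds")
saveRDS(prices, path)

# Read it back; row names (the dates) and list names survive the round trip.
restored <- readRDS(path)
rownames(restored$AMD)
```

Unlike the CSV route, this needs no `row.names` bookkeeping, at the cost of a file that only R can read.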
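It isn't clear to me whether you intend to append them as additional columns (i.e., cbind behavior) or as rows (rbind). Of the two, I lean toward "rows", but I'll start with "columns".
Appending columns may be appropriate if you want day-by-day comparisons (though there are better ways to prepare for that). You will run into problems, because the series have (and most likely will keep having) different numbers of rows: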
sapply(dat, nrow)
# AMD MSFT
# 1846 1847
In that case you probably want to join on the date (the row names). To do that, convert the row names (dates) into a column and merge on that column:
dat2 <- lapply(dat, function(x) {
x <- as.data.frame(x)
x$date <- rownames(x)
rownames(x) <- NULL
x
})
datwide <- Reduce(function(a, b) merge(a, b, by = "date", all = TRUE), dat2)
As a quick demonstration, recalling that "MSFT" has one more row than "AMD", we can find that row and show that everything still lines up:
which(! complete.cases(datwide))
# [1] 1251
datwide[1251 + -2:2,]
# date AMD.Open AMD.High AMD.Low AMD.Close AMD.Volume MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume
# 1249 2014-12-30 2.64 2.70 2.63 2.63 7783709 47.44 47.62 46.84 47.02 16384692
# 1250 2014-12-31 2.64 2.70 2.64 2.67 11177917 46.73 47.44 46.45 46.45 21552450
# 1251 2015-01-02 NA NA NA NA NA 46.66 47.42 46.54 46.76 27913852
# 1252 2015-01-05 2.67 2.70 2.64 2.66 8878176 46.37 46.73 46.25 46.32 39673865
# 1253 2015-01-06 2.65 2.66 2.55 2.63 13916645 46.38 46.75 45.54 45.65 36447854
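A shorter route to the same wide layout, offered as a sketch rather than as the approach used above: xts objects can be merged directly on their date index, and `merge()` performs an outer join by default, so missing days become NA. The two toy series below are built by hand (illustrative values only) so the example runs offline; it assumes the xts package is installed.

```r
library(xts)

days <- as.Date("2014-12-30") + 0:3

# Toy close-price series; AMD deliberately lacks the third day.
amd <- xts(matrix(c(2.63, 2.67, 2.66), ncol = 1,
                  dimnames = list(NULL, "AMD.Close")),
           order.by = days[c(1, 2, 4)])
msft <- xts(matrix(c(47.02, 46.45, 46.76, 46.32), ncol = 1,
                   dimnames = list(NULL, "MSFT.Close")),
            order.by = days)

closes <- list(AMD = amd, MSFT = msft)

# Outer join of every list element on the date index, one column per symbol.
wide <- do.call(merge, unname(closes))
wide
```

With the real `dat`, something like `do.call(merge, unname(lapply(dat, function(x) x[, 4])))` should give one close-price column per symbol, since the close is the fourth column in the output shown earlier.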
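Now for appending by rows. It is a slight frustration that getSymbols gives each symbol's columns unique names (prefixed with the ticker), because rbind needs identical column names. Also, since we are about to discard the ticker from the column names, we should keep the symbol name in the data itself.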
dat3 <- lapply(dat, function(x) {
ticker <- gsub("\\..*", "", colnames(x)[1])
colnames(x) <- gsub(".*\\.", "", colnames(x))
x <- as.data.frame(x)
x$date <- rownames(x)
x$symbol <- ticker
rownames(x) <- NULL
x
}) # can also be accomplished with mapply(..., dat, names(dat))
datlong <- Reduce(function(a, b) rbind(a, b, make.row.names = FALSE), dat3)
head(datlong)
# Open High Low Close Volume date symbol
# 1 8.71 9.20 8.55 9.15 32741845 2010-01-13 AMD
# 2 9.18 9.26 8.92 9.00 22658744 2010-01-14 AMD
# 3 9.13 9.19 8.80 8.84 34344763 2010-01-15 AMD
# 4 8.84 9.21 8.84 9.01 24875646 2010-01-19 AMD
# 5 8.98 9.00 8.76 8.87 22813520 2010-01-20 AMD
# 6 9.01 9.10 8.77 8.99 37888647 2010-01-21 AMD
nrow(datlong)
# [1] 3693
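The same stacking can also be written with `do.call(rbind, ...)` instead of `Reduce()`. The sketch below uses two hand-made frames (made-up values, hypothetical names ending in `_demo`) so it runs without downloading anything, and then checks the row count per symbol.

```r
# Toy per-symbol frames shaped like the elements of dat3 (values are made up).
dat3_demo <- list(
  AMD = data.frame(Close = c(9.15, 9.00),
                   date = c("2010-01-13", "2010-01-14"),
                   symbol = "AMD"),
  MSFT = data.frame(Close = c(30.3, 30.9, 31.1),
                    date = c("2010-01-13", "2010-01-14", "2010-01-15"),
                    symbol = "MSFT")
)

# Stack all frames at once, then reset the row names to 1..n.
long_demo <- do.call(rbind, dat3_demo)
rownames(long_demo) <- NULL

# Rows per symbol: the total equals the sum of the individual row counts.
table(long_demo$symbol)
nrow(long_demo) == sum(sapply(dat3_demo, nrow))
```

In this long layout, a per-symbol subset is just `long_demo[long_demo$symbol == "AMD", ]`.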