How do I append columns horizontally in a loop in R?

Asked: 2017-05-17 00:01:09

Tags: r

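I want to build a matrix of stock data for n companies taken from a stock list, but I am struggling to append the series horizontally: my code only appends them vertically. I have tried other functions such as merge and rbind, but they cannot work with variables that are parsed from strings. The difficulty is that I want to append n variables, which are retrieved from a stock list containing n stocks. Other suggestions that achieve the same result are welcome. Stock list data: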

> dput(stockslist)
structure(list(V1 = c("AMD", "MSFT", "SBUX", "IBM", "AAPL", "GSPC", 
"AMZN")), .Names = "V1", class = "data.frame", row.names = c(NA, 
-7L))

Code:

library(quantmod)
library(tseries)
library(plyr)
library(PortfolioAnalytics)
library(PerformanceAnalytics)
library(zoo)
library(plotly)

tickerlist <- "sp500.csv"  #CSV containing tickers on rows
stockslist <- read.csv("sp500.csv", header = FALSE, stringsAsFactors = F)
nrstocks = length(stockslist[,1]) #The number of stocks to download
maxretryattempts <- 5 #If there is an error downloading a price how many times to retry
startDate = as.Date("2010-01-13")

for (i in 1:nrstocks) {
  stockdata <- getSymbols(c(stockslist[i,1]), src = "yahoo", from = startDate)
  # pick 6th column of the ith stock
  write.table((eval(parse(text=paste(stockslist[i,1]))))[,6], file = "test.csv", append = TRUE, row.names=F)

}

1 Answer:

Answer 0 (score: 0)

This is a great opportunity to talk about lists of dataframes. Having said that ...

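Sidebar: I really don't like side effects. By default, getSymbols uses a side effect to save the data into the parent frame/environment. That may be fine for most uses, but I prefer a functional approach. Fortunately, auto.assign=FALSE brings its behavior back into my comfort zone.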
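To make the distinction concrete without touching the network, here is a minimal sketch in base R. The function fetch_prices and the tickers "AAA" and "BBB" are made-up stand-ins, not part of quantmod; they only imitate the two styles of collecting results.

```r
# Hypothetical stand-in for a downloader; it performs no network access.
fetch_prices <- function(symbol) {
  data.frame(symbol = symbol, close = c(1, 2, 3))
}

symbols <- c("AAA", "BBB")  # made-up tickers, for illustration only

# Side-effect style: every result becomes a separate variable in an
# environment and has to be looked up again by name later.
env <- new.env()
for (s in symbols) assign(s, fetch_prices(s), envir = env)
from_env <- get("AAA", envir = env)

# Functional style: results are returned and collected in one named list.
results <- sapply(symbols, fetch_prices, simplify = FALSE)
names(results)
```

With the list version there is no need for eval(parse(text = ...)) to get at a data set whose name is stored in a string; results[["AAA"]] is enough.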

library(quantmod)
stocklist <- c("AMD", "MSFT")
startDate <- as.Date("2010-01-13")

dat <- sapply(stocklist, getSymbols, src = "google", from = startDate, auto.assign = FALSE,
              simplify = FALSE)
str(dat)
# List of 2
#  $ AMD :An 'xts' object on 2010-01-13/2017-05-16 containing:
#   Data: num [1:1846, 1:5] 8.71 9.18 9.13 8.84 8.98 9.01 8.55 8.01 8.03 8.03 ...
#  - attr(*, "dimnames")=List of 2
#   ..$ : NULL
#   ..$ : chr [1:5] "AMD.Open" "AMD.High" "AMD.Low" "AMD.Close" ...
#   Indexed by objects of class: [Date] TZ: UTC
#   xts Attributes:  
# List of 2
#   ..$ src    : chr "google"
#   ..$ updated: POSIXct[1:1], format: "2017-05-16 21:01:37"
#  $ MSFT:An 'xts' object on 2010-01-13/2017-05-16 containing:
#   Data: num [1:1847, 1:5] 30.3 30.3 31.1 30.8 30.8 ...
#  - attr(*, "dimnames")=List of 2
#   ..$ : NULL
#   ..$ : chr [1:5] "MSFT.Open" "MSFT.High" "MSFT.Low" "MSFT.Close" ...
#   Indexed by objects of class: [Date] TZ: UTC
#   xts Attributes:  
# List of 2
#   ..$ src    : chr "google"
#   ..$ updated: POSIXct[1:1], format: "2017-05-16 21:01:37"

Though I only used two symbols here, it should work with many more. Also, I switched to Google because Yahoo was asking for authentication.

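Your code calls write.table(..., row.names = F). Be aware that this drops the timestamp of every observation, because the resulting CSV will look something like this: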

"AMD.Open","AMD.High","AMD.Low","AMD.Close","AMD.Volume"
8.71,9.2,8.55,9.15,32741845
9.18,9.26,8.92,9,22658744
9.13,9.19,8.8,8.84,34344763
8.84,9.21,8.84,9.01,24875646

Using "AMD" as an example, consider:

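# with auto.assign = FALSE the data lives in the list, so use dat$AMD
write.csv(as.data.frame(dat$AMD), file = "AMD.csv", row.names = TRUE)
# read back the same file that was just written
head(read.csv("AMD.csv", row.names = 1))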
#            AMD.Open AMD.High AMD.Low AMD.Close AMD.Volume
# 2010-01-13     8.71     9.20    8.55      9.15   32741845
# 2010-01-14     9.18     9.26    8.92      9.00   22658744
# 2010-01-15     9.13     9.19    8.80      8.84   34344763
# 2010-01-19     8.84     9.21    8.84      9.01   24875646
# 2010-01-20     8.98     9.00    8.76      8.87   22813520
# 2010-01-21     9.01     9.10    8.77      8.99   37888647

To save all of them at once:

# one file per symbol, e.g. "AMD.csv"; the return value is not needed
ign <- mapply(function(x, fn) write.csv(as.data.frame(x), file = fn, row.names = TRUE),
              dat, paste0(names(dat), ".csv"))

There are other ways to store your data, such as Rdata files (save()).
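As a rough sketch of that alternative, the snippet below stores a small list of data frames with save() and, as a closely related base R option, saveRDS(). The object dat_demo and its values are invented for illustration; with real data you would pass dat instead.

```r
# Made-up stand-in for the downloaded list; values are illustrative only.
dat_demo <- list(
  AAA = data.frame(date = c("2010-01-13", "2010-01-14"), Close = c(9.15, 9.00)),
  BBB = data.frame(date = c("2010-01-13", "2010-01-14"), Close = c(30.3, 30.9))
)

# saveRDS()/readRDS() store a single object; you choose its name on reading.
rds_file <- tempfile(fileext = ".rds")
saveRDS(dat_demo, rds_file)
restored <- readRDS(rds_file)

# save()/load() store objects together with their names; loading into a
# fresh environment avoids overwriting anything in the workspace.
rdata_file <- tempfile(fileext = ".RData")
save(dat_demo, file = rdata_file)
load_env <- new.env()
load(rdata_file, envir = load_env)
ls(load_env)
```

Unlike CSV, both formats keep the whole list in one file and preserve column types, so nothing has to be re-parsed when the data is read back.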

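It is not clear to me whether you intend to append them as columns (i.e., cbind behavior) or as rows (rbind). Between the two I lean towards "rows", but I will start with "columns".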

"Appending" by column

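This may be appropriate if you want day-by-day comparisons (though there are better ways to prepare for that). You will run into a problem, because the data sets have (and most likely will continue to have) different numbers of rows: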

sapply(dat, nrow)
#  AMD MSFT 
# 1846 1847 

In this case you probably want to join on the date (the row names). To do that properly, convert the row names (dates) into a column and merge on that column:

dat2 <- lapply(dat, function(x) {
  x <- as.data.frame(x)
  x$date <- rownames(x)
  rownames(x) <- NULL
  x
})
datwide <- Reduce(function(a, b) merge(a, b, by = "date", all = TRUE), dat2)

As a quick demonstration, recall that "MSFT" has one more row than "AMD". We can find that row and show that everything still lines up:

which(! complete.cases(datwide))
# [1] 1251
datwide[1251 + -2:2,]
#            date AMD.Open AMD.High AMD.Low AMD.Close AMD.Volume MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume
# 1249 2014-12-30     2.64     2.70    2.63      2.63    7783709     47.44     47.62    46.84      47.02    16384692
# 1250 2014-12-31     2.64     2.70    2.64      2.67   11177917     46.73     47.44    46.45      46.45    21552450
# 1251 2015-01-02       NA       NA      NA        NA         NA     46.66     47.42    46.54      46.76    27913852
# 1252 2015-01-05     2.67     2.70    2.64      2.66    8878176     46.37     46.73    46.25      46.32    39673865
# 1253 2015-01-06     2.65     2.66    2.55      2.63   13916645     46.38     46.75    45.54      45.65    36447854
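The question only asks for one column per stock, so the same merge idea can be narrowed down. The following is a minimal sketch on made-up data (the tickers "AAA" and "BBB" and all prices are invented): keep the date plus a single value column, rename that column after the ticker, then merge everything on date.

```r
# Made-up price tables; BBB deliberately lacks the first date.
prices <- list(
  AAA = data.frame(date = c("2015-01-02", "2015-01-05", "2015-01-06"),
                   Close = c(10, 11, 12)),
  BBB = data.frame(date = c("2015-01-05", "2015-01-06"),
                   Close = c(20, 21))
)

# Keep date plus one value column, and name that column after the ticker.
one_col <- lapply(names(prices), function(nm) {
  x <- prices[[nm]][, c("date", "Close")]
  names(x)[2] <- nm
  x
})

# Full outer join on date: dates missing from a ticker become NA.
wide <- Reduce(function(a, b) merge(a, b, by = "date", all = TRUE), one_col)
wide
```

The result has one row per date and one column per ticker, which is the horizontal layout asked for, and it does not depend on all tickers having the same number of rows.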

"Appending" by row

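It is a slight frustration that getSymbols names the columns uniquely for each ticker. Also, since we are about to discard those column names, we should preserve the symbol name in the data.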

dat3 <- lapply(dat, function(x) {
  ticker <- gsub("\\..*", "", colnames(x)[1])
  colnames(x) <- gsub(".*\\.", "", colnames(x))
  x <- as.data.frame(x)
  x$date <- rownames(x)
  x$symbol <- ticker
  rownames(x) <- NULL
  x
}) # can also be accomplished with mapply(..., dat, names(dat))
datlong <- Reduce(function(a, b) rbind(a, b, make.row.names = FALSE), dat3)

head(datlong)
#   Open High  Low Close   Volume       date symbol
# 1 8.71 9.20 8.55  9.15 32741845 2010-01-13    AMD
# 2 9.18 9.26 8.92  9.00 22658744 2010-01-14    AMD
# 3 9.13 9.19 8.80  8.84 34344763 2010-01-15    AMD
# 4 8.84 9.21 8.84  9.01 24875646 2010-01-19    AMD
# 5 8.98 9.00 8.76  8.87 22813520 2010-01-20    AMD
# 6 9.01 9.10 8.77  8.99 37888647 2010-01-21    AMD
nrow(datlong)
# [1] 3693
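For comparison, here is a small sketch of the same row-wise stacking on made-up data, using Map() to tag each piece with its list name and do.call(rbind, ...) to stack them. The tickers and prices are invented, and this is only one of several equivalent ways to write it.

```r
# Made-up per-ticker tables that already share the same column names.
pieces <- list(
  AAA = data.frame(date = c("2015-01-05", "2015-01-06"), Close = c(11, 12)),
  BBB = data.frame(date = c("2015-01-05", "2015-01-06"), Close = c(20, 21))
)

# Tag each table with its list name so the ticker survives the stacking.
tagged <- Map(function(x, nm) { x$symbol <- nm; x }, pieces, names(pieces))

# Stack all tables row-wise, then drop the generated row names.
long <- do.call(rbind, tagged)
rownames(long) <- NULL

# Per-symbol work becomes a grouping operation on the symbol column.
aggregate(Close ~ symbol, data = long, FUN = mean)
```

In this long layout, adding another ticker only adds rows, so code written against the date, Close, and symbol columns does not need to change.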