Replacing a loop with vectorized operations

Time: 2016-06-01 12:39:11

Tags: r

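I am using this code to draw candlesticks. However, it contains a very inefficient loop (looping over 10K observations takes 38 seconds). It also uses the rbind function, which means the dates have to be converted to numeric and then back again, and that does not seem straightforward given how dates and times are stored.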

The loop I am trying to replace with something more efficient:

for (i in 1:nrow(prices)) {
  x <- prices[i, ]

  # For high / low
  mat <- rbind(c(x[1], x[3]),
               c(x[1], x[4]),
               c(NA, NA))

  plot.base <- rbind(plot.base, mat)
}

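For every row of the input, the output contains three rows: the first holds the first (date) and third columns of the input, the second holds the first and fourth columns, and the third holds two NAs. These NAs are important for plotting.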

What is the most efficient way to achieve this?

Minimal reproducible example:

library(quantmod)

prices <- getSymbols("MSFT", auto.assign = FALSE)

# Convert to dataframe
prices <- data.frame(time = index(prices),
                     open = as.numeric(prices[, 1]),
                     high = as.numeric(prices[, 2]),
                     low = as.numeric(prices[, 3]),
                     close = as.numeric(prices[, 4]),
                     volume = as.numeric(prices[, 5]))

# Create line segments for high and low prices
plot.base <- data.frame()

for (i in 1:nrow(prices)) {
  x <- prices[i, ]

  # For high / low
  mat <- rbind(c(x[1], x[3]),
               c(x[1], x[4]),
               c(NA, NA))

  plot.base <- rbind(plot.base, mat)
}

Edit:

dput(head(prices))
structure(list(time = structure(c(13516, 13517, 13518, 13521, 
13522, 13523), class = "Date"), open = c(29.91, 29.700001, 29.629999, 
29.65, 30, 29.799999), high = c(30.25, 29.969999, 29.75, 30.1, 
30.18, 29.889999), low = c(29.4, 29.440001, 29.450001, 29.530001, 
29.73, 29.43), close = c(29.860001, 29.809999, 29.639999, 29.93, 
29.959999, 29.66), volume = c(76935100, 45774500, 44607200, 50220200, 
44636600, 55017400)), .Names = c("time", "open", "high", "low", 
"close", "volume"), row.names = c(NA, 6L), class = "data.frame")

1 answer:

Answer 0 (score: 4)

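I would be wary of any tutorial that grows an object inside a loop. It is one of the slowest operations you can do in programming. (It is like buying a shelf with exactly as much room as your books need, and then replacing the whole shelf every time you buy a new book.)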
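To make the cost of growing an object concrete, here is a minimal sketch that builds the same small data frame three ways. The function names (`grow`, `prealloc`, `vectorized`) and the size `n` are made up for illustration, and the timings will vary by machine, so no expected timings are shown.

```r
# Illustrative size only; larger n makes the gap more visible.
n <- 2000

# Growing: each rbind() call copies everything accumulated so far.
grow <- function(n) {
  out <- data.frame()
  for (i in seq_len(n)) {
    out <- rbind(out, data.frame(id = i, value = i^2))
  }
  out
}

# Preallocating: reserve the full length once, then fill it in place.
prealloc <- function(n) {
  value <- numeric(n)
  for (i in seq_len(n)) {
    value[i] <- i^2
  }
  data.frame(id = seq_len(n), value = value)
}

# Vectorized: no explicit loop at all.
vectorized <- function(n) {
  id <- seq_len(n)
  data.frame(id = id, value = id^2)
}

system.time(a <- grow(n))
system.time(b <- prealloc(n))
system.time(d <- vectorized(n))
```

All three return the same values; only the amount of copying differs.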

Use subsetting like this:

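# Repeat each date 3 times: high, low, and an NA separator row
res <- data.frame(date = rep(prices[, 1], each = 3),
                  # transpose high/low to 2 x n, add an NA row, flatten column-wise
                  y = c(t(prices[, 3:4])[c(1:2, NA), ]))
# Blank out every third date (the logical index is recycled)
res[c(FALSE, FALSE, TRUE), 1] <- NA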
#         date     y
#1  2007-01-03 30.25
#2  2007-01-03 29.40
#3        <NA>  <NA>
#4  2007-01-04 29.97
#5  2007-01-04 29.44
#6        <NA>  <NA>
#7  2007-01-05 29.75
#8  2007-01-05 29.45
#9        <NA>  <NA>
#10 2007-01-08 30.10
#11 2007-01-08 29.53
#12       <NA>  <NA>
#13 2007-01-09 30.18
#14 2007-01-09 29.73
#15       <NA>  <NA>
#16 2007-01-10 29.89
#17 2007-01-10 29.43
#18       <NA>  <NA>
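For anyone who wants to try this without downloading data, here is a self-contained sketch that uses the six rows from the question's dput() output. The helper `high_low_segments` is a hypothetical name, not something from quantmod. It is also a variation on the subsetting above: it assumes the columns are named `time`, `high` and `low`, and it interleaves the values with `rbind()` instead of `t()`.

```r
# First six rows from the question's dput() output (time, high, low only).
prices <- data.frame(
  time = as.Date(c(13516, 13517, 13518, 13521, 13522, 13523),
                 origin = "1970-01-01"),
  high = c(30.25, 29.969999, 29.75, 30.1, 30.18, 29.889999),
  low  = c(29.4, 29.440001, 29.450001, 29.530001, 29.73, 29.43)
)

# Hypothetical helper: three output rows per input row (high, low, NA).
high_low_segments <- function(prices) {
  n <- nrow(prices)
  # rep() keeps the Date class, so no numeric round trip is needed.
  date <- rep(prices$time, each = 3)
  date[seq(3, 3 * n, by = 3)] <- NA
  # rbind() stacks high, low and a recycled NA into a 3 x n matrix;
  # c() then flattens it column by column.
  y <- c(rbind(prices$high, prices$low, NA_real_))
  data.frame(date = date, y = y)
}

res <- high_low_segments(prices)
class(res$date)

# The NA rows break the line, so each day is drawn as its own
# vertical high-to-low segment.
plot(res$date, res$y, type = "l", xlab = "date", ylab = "price")
```

Because the date column is never passed through a matrix, it stays a Date throughout, which sidesteps the conversion problem mentioned in the question.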