Matching vector values to list elements in R

Time: 2017-05-25 15:52:11

Tags: r sapply

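I'm trying to match a vector of months to their corresponding quarters in R. Unfortunately, the code I inherited stores the quarters as a list, with the months belonging to each quarter held as a vector in each list element (this should at least be adaptable, so you could do quarters, trimesters, or semesters as needed). Currently I'm using sapply to loop over the vector and match each month to its quarter, like so: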

month.vec <- sample(1:12, 100, replace=T)
quarters.list <- list(`1` = 1:3, `2` = 4:6, `3` = 7:9, `4` = 10:12)

month.to.quarter <- function(months, quarters) {
    sapply(months, FUN=function(x) {
        as.numeric(substr(names(which(x == unlist(quarters))),0,1))
    })
}
month.to.quarter(month.vec, quarters.list)

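This works fine for vectors up to roughly length(month.vec) < 1e5, but beyond that it gets a bit time-consuming (see the code below). Does anyone have an elegant solution for this kind of matching on longer vectors?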

A script showing how processing time grows with vector length. Note: this takes a few seconds (<10) to run.

times <- NULL
for (i in c(10 %o% 10^(2:5))) {
    month.vec <- sample(1:12, i, replace=T)
    quarters.list <- list(`1` = 1:3, `2` = 4:6, `3` = 7:9, `4` = 10:12)
    t <- system.time(a <- month.to.quarter(month.vec, quarters.list))[3]
    time <- data.frame(n = i, time = t)
    times <- rbind(times, time)
}
plot(time ~ n, times) 

3 answers:

Answer 0 (score: 2)

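I wonder whether it would be faster to transform the quarters list so that the month can be used as an index to look up the quarter. Something like the following...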

quarters <- as.numeric(substr(names(sort(unlist(quarters.list))),1,1))

This only needs to be done once, and then you can simply do

quarters.vec <- quarters[month.vec]

It is about 2000 times faster...

microbenchmark::microbenchmark(quarters[month.vec],month.to.quarter(month.vec, quarters.list))
Unit: microseconds
                                       expr        min         lq        mean     median          uq        max neval
                        quarters[month.vec]    199.836    202.629    235.3968    227.763    233.9695    554.823   100
 month.to.quarter(month.vec, quarters.list) 439466.006 456649.059 495957.5722 469543.098 499346.5020 935046.664   100
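
The one-time lookup above takes the first character of each name, which assumes single-character group names. As a minimal sketch of the same idea without that assumption, the lookup can be built from the position of each list element instead. The helper name build.lookup and the semesters.list example are hypothetical, not part of the original code, and the result is the position of the group in the list rather than its name.

```r
# Hypothetical helper (not from the original code): build a
# month -> group lookup vector from any grouping list.
build.lookup <- function(groups) {
  members <- unlist(groups)
  # Position of each list element, repeated once per month it contains
  ids <- rep(seq_along(groups), lengths(groups))
  lookup <- integer(max(members))
  lookup[members] <- ids
  lookup
}

quarters.list  <- list(`1` = 1:3, `2` = 4:6, `3` = 7:9, `4` = 10:12)
semesters.list <- list(`1` = 1:6, `2` = 7:12)  # illustrative assumption

lookup.q <- build.lookup(quarters.list)
lookup.s <- build.lookup(semesters.list)

month.vec <- c(1, 4, 12, 7)
lookup.q[month.vec]  # 1 2 4 3
lookup.s[month.vec]  # 1 1 2 2
```

As in the answer, the lookup vector is built once and every later match is a single vectorised index operation.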

Answer 1 (score: 1)

Try this:

(month.vec - 1) %/% 3 + 1
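
To see why the integer-division expression works, here is a small sketch applied to all twelve months. It shifts months to start at 0, divides into blocks of 3, then shifts the result back to start at 1. Note that this only applies when the groups are contiguous, equally sized, and start at month 1; it does not read quarters.list at all. The semester line is an illustrative assumption, not something from the question's code.

```r
months <- 1:12

# Months 1-3 -> 0, 4-6 -> 1, ... after integer division; +1 gives 1..4
quarters <- (months - 1) %/% 3 + 1
quarters   # 1 1 1 2 2 2 3 3 3 4 4 4

# Same idea with a block size of 6 (illustrative assumption: two semesters)
semesters <- (months - 1) %/% 6 + 1
semesters  # 1 1 1 1 1 1 2 2 2 2 2 2
```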

Answer 2 (score: 0)

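Here is the first approach I came up with. I think I saw it in Hadley's book. It uses a named lookup vector (note that with numeric indices, as here, the elements are actually selected by position).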

month.vec <- sample(1:12, 10000, replace=T)
quarters.list <- list(`1` = 1:3, `2` = 4:6, `3` = 7:9, `4` = 10:12)
# your method
month.to.quarter <- function(months, quarters) {
  sapply(months, FUN=function(x) {
    as.numeric(substr(names(which(x == unlist(quarters))),0,1))
  })
}
out1 <-month.to.quarter(month.vec, quarters.list)

# my method
vec <- rep(1:4, each = 3)
names(vec) <- 1:12
out2 <- vec[month.vec]
names(out2) <- NULL
all.equal(out1, out2) # this will return TRUE

The benchmark shows a dramatic difference.

month.vec <- sample(1:12, 10000, replace=T)
microbenchmark::microbenchmark(vec[month.vec],
month.to.quarter(month.vec, quarters.list))

## Unit: microseconds
##                                       expr       min        lq       mean    median        uq        max neval
##                             vec[month.vec]   108.503   112.433   119.3982   116.916   119.983    183.467   100
## month.to.quarter(month.vec, quarters.list) 78859.160 84036.995 87956.6532 86960.269 89975.668 140797.487   100

The new method is roughly 750 times faster, going by the medians.

If you want to wrap it in a function, it looks like this, and it is still fast:

month.to.quarter2 <- function(months) {
  vec <- rep(1:4, each = 3)
  names(vec) <- 1:12
  out <- vec[months]
  names(out) <- NULL
  return(out)
}

microbenchmark::microbenchmark(vec[month.vec],
                               month.to.quarter(month.vec, quarters.list),
                               month.to.quarter2(month.vec))

## Unit: microseconds
##                                        expr       min         lq       mean    median        uq        max neval
##                              vec[month.vec]   109.222   111.6345   121.3035   115.604   117.916    706.034   100
##  month.to.quarter(month.vec, quarters.list) 77292.742 83032.7425 85770.6963 84690.500 87243.327 138531.309   100
##                month.to.quarter2(month.vec)   117.264   120.3555   127.6535   127.021   133.474    153.556   100
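
As a quick sanity check, assuming the quarters.list layout from the question, the sketch below confirms that the faster approaches on this page return the same quarters as the original sapply version. The seed and the by.* variable names are chosen here purely for illustration.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
month.vec <- sample(1:12, 1000, replace = TRUE)
quarters.list <- list(`1` = 1:3, `2` = 4:6, `3` = 7:9, `4` = 10:12)

# Original sapply-based version from the question
month.to.quarter <- function(months, quarters) {
  sapply(months, FUN = function(x) {
    as.numeric(substr(names(which(x == unlist(quarters))), 1, 1))
  })
}

by.sapply <- month.to.quarter(month.vec, quarters.list)

# Lookup vector derived from the list names (answer 0)
quarters <- as.numeric(substr(names(sort(unlist(quarters.list))), 1, 1))
by.lookup <- quarters[month.vec]

# Integer division (answer 1)
by.arith <- (month.vec - 1) %/% 3 + 1

# Hard-coded lookup vector (answer 2)
by.rep <- rep(1:4, each = 3)[month.vec]

all(by.sapply == by.lookup, by.sapply == by.arith, by.sapply == by.rep)
```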