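I am trying to match a vector of months to the corresponding quarters in R. Unfortunately, the code I inherited stores the quarters in a list, with the appropriate months as a vector in each list element (this is at least meant to keep it adaptable, so you could do quarters, trimesters or semesters if needed). Currently, I am using sapply to iterate over the vector and match each month to its quarter, like this: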
month.vec <- sample(1:12, 100, replace = TRUE)
quarters.list <- list(`1` = 1:3, `2` = 4:6, `3` = 7:9, `4` = 10:12)
month.to.quarter <- function(months, quarters) {
  sapply(months, FUN = function(x) {
    # unlist() names entries "11", "12", ..., "43"; the first character of
    # the matching entry's name is the quarter
    as.numeric(substr(names(which(x == unlist(quarters))), 0, 1))
  })
}
month.to.quarter(month.vec, quarters.list)
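This works for vectors up to roughly length(month.vec) < 1e5, but beyond that it becomes rather slow (see the code below). Does anyone have an elegant solution for doing this matching on longer vectors?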
Script showing how processing time grows with vector length. Note: it takes a few seconds (<10) to run.
times <- NULL
# c(10 %o% 10^(2:5)) gives n = 1e3, 1e4, 1e5, 1e6
for (i in c(10 %o% 10^(2:5))) {
  month.vec <- sample(1:12, i, replace = TRUE)
  quarters.list <- list(`1` = 1:3, `2` = 4:6, `3` = 7:9, `4` = 10:12)
  # [3] is the elapsed (wall-clock) time
  t <- system.time(a <- month.to.quarter(month.vec, quarters.list))[3]
  time <- data.frame(n = i, time = t)
  times <- rbind(times, time)
}
plot(time ~ n, times)
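Answer 0 (score: 2)
I wonder whether it would be faster to transform the quarters list so that the month can be used as an index to look up the quarter. Something like this: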
quarters <- as.numeric(substr(names(sort(unlist(quarters.list))),1,1))
This only needs to be done once; after that you can simply write
quarters.vec <- quarters[month.vec]
It is about 2000 times faster (median of roughly 228 vs. 469,543 microseconds):
microbenchmark::microbenchmark(quarters[month.vec],month.to.quarter(month.vec, quarters.list))
Unit: microseconds
expr min lq mean median uq max neval
quarters[month.vec] 199.836 202.629 235.3968 227.763 233.9695 554.823 100
month.to.quarter(month.vec, quarters.list) 439466.006 456649.059 495957.5722 469543.098 499346.5020 935046.664 100
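A variant of the same lookup idea that does not depend on substr() picking the first character of the unlisted names (that trick breaks once a period label has more than one character, e.g. with ten or more periods) is to fill the lookup vector directly from the list. This is a sketch assuming every month 1 to 12 appears exactly once across the list elements; period.lookup is a hypothetical name, and the period number is taken from the element's position in the list rather than from its name.

```r
quarters.list <- list(`1` = 1:3, `2` = 4:6, `3` = 7:9, `4` = 10:12)

# Build a month -> period lookup: position m holds the period of month m.
# Assumes each month 1..12 appears exactly once across the list elements.
period.lookup <- integer(12)
period.lookup[unlist(quarters.list)] <- rep(seq_along(quarters.list),
                                            lengths(quarters.list))

# Same one-step indexing as above, now with no names on the result
month.vec <- sample(1:12, 100, replace = TRUE)
quarters.vec <- period.lookup[month.vec]
```

If the list names carry the period label, substitute as.numeric(names(quarters.list)) for seq_along(quarters.list).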
Answer 1 (score: 1)
Try this:
(month.vec - 1) %/% 3 + 1
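The same integer-division arithmetic works for any split of the year into equal, contiguous periods starting in January: with k months per period, the period index is (month - 1) %/% k + 1. A small sketch; month.to.block and its argument k are hypothetical names, and k must divide 12 for the periods to tile the calendar year. It does not cover arbitrary, non-contiguous groupings, which still need a lookup built from the list.

```r
# Period index for equal-length, contiguous periods starting in January.
# k = months per period: 3 -> quarters, 4 -> trimesters, 6 -> semesters.
month.to.block <- function(months, k = 3) {
  stopifnot(12 %% k == 0)  # periods must tile the year exactly
  (months - 1) %/% k + 1
}

month.to.block(1:12, k = 3)  # 1 1 1 2 2 2 3 3 3 4 4 4
month.to.block(1:12, k = 4)  # 1 1 1 1 2 2 2 2 3 3 3 3
month.to.block(1:12, k = 6)  # 1 1 1 1 1 1 2 2 2 2 2 2
```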
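Answer 2 (score: 0)
This is the first approach I came up with; I think I saw it in Hadley's book. It builds a lookup vector of quarters whose names are the months. (Because month.vec is numeric, vec[month.vec] actually indexes by position, so the names are not used for the lookup itself; they would be if you indexed with as.character(month.vec).)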
month.vec <- sample(1:12, 10000, replace = TRUE)
quarters.list <- list(`1` = 1:3, `2` = 4:6, `3` = 7:9, `4` = 10:12)
# your method
month.to.quarter <- function(months, quarters) {
  sapply(months, FUN = function(x) {
    as.numeric(substr(names(which(x == unlist(quarters))), 0, 1))
  })
}
out1 <- month.to.quarter(month.vec, quarters.list)
# my method
vec <- rep(1:4, each = 3)
names(vec) <- 1:12
out2 <- vec[month.vec]
names(out2) <- NULL
all.equal(out1, out2) # this will return TRUE
The benchmark shows a dramatic difference:
month.vec <- sample(1:12, 10000, replace = TRUE)
microbenchmark::microbenchmark(vec[month.vec],
month.to.quarter(month.vec, quarters.list))
## Unit: microseconds
## expr min lq mean median uq max neval
## vec[month.vec] 108.503 112.433 119.3982 116.916 119.983 183.467 100
## month.to.quarter(month.vec, quarters.list) 78859.160 84036.995 87956.6532 86960.269 89975.668 140797.487 100
The new method is roughly 740 times faster (comparing medians).
If you want to wrap it in a function, it looks like this and is still fast:
month.to.quarter2 <- function(months) {
  vec <- rep(1:4, each = 3)
  names(vec) <- 1:12
  out <- vec[months]
  names(out) <- NULL
  return(out)
}
microbenchmark::microbenchmark(vec[month.vec],
month.to.quarter(month.vec, quarters.list),
month.to.quarter2(month.vec))
## Unit: microseconds
## expr min lq mean median uq max neval
## vec[month.vec] 109.222 111.6345 121.3035 115.604 117.916 706.034 100
## month.to.quarter(month.vec, quarters.list) 77292.742 83032.7425 85770.6963 84690.500 87243.327 138531.309 100
## month.to.quarter2(month.vec) 117.264 120.3555 127.6535 127.021 133.474 153.556 100
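To keep the list-based interface from the question (so the same function handles quarters, trimesters or semesters), one option is a different technique from the named-vector lookup: match() finds each month's position in the unlisted period list in a single vectorized call, avoiding the per-element sapply() loop. This is a sketch, not benchmarked here; month.to.period and semesters.list are hypothetical names, and months that appear in no period come back as NA.

```r
# Hypothetical generalization: map months to periods defined by any list.
month.to.period <- function(months, periods) {
  # Period number for each entry of unlist(periods), in list order
  period.of.entry <- rep(seq_along(periods), lengths(periods))
  # match() gives each month's position in unlist(periods) (NA if absent)
  period.of.entry[match(months, unlist(periods))]
}

quarters.list <- list(`1` = 1:3, `2` = 4:6, `3` = 7:9, `4` = 10:12)
semesters.list <- list(`1` = 1:6, `2` = 7:12)

month.vec <- sample(1:12, 1e6, replace = TRUE)
q <- month.to.period(month.vec, quarters.list)
s <- month.to.period(month.vec, semesters.list)
```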