Suppose a data.table contains many simulation columns (along with other variables):
data <- setDT(data.frame(sim1=c(1,1,1), sim2= c(2,2,2), sim3=c(3,3,3),
sim4=c(4,4,4), sim5=c(5,5,5), index=c(2,2,2)))
   sim1 sim2 sim3 sim4 sim5 index
1:    1    2    3    4    5     2
2:    1    2    3    4    5     2
3:    1    2    3    4    5     2
For each row, I want to compute the mean of the simulation values that are higher than the value in the index column:
data[, higher.than.index.ave := rowMeans(.SD[.SD > index]),
.SDcols = names(data[, grepl(paste(paste("sim", 1:5, sep=""),
collapse = "|") , names(data)), with=FALSE])]
I have tried other solutions as well, with no luck. Any suggestions on how I can perform this task?
Answer 0 (score: 3)
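library(data.table)

data <- data.table(sim1 = c(1, 1, 1), sim2 = c(2, 2, 2), sim3 = c(3, 3, 3),
                   sim4 = c(4, 4, 4), sim5 = c(5, 5, 5), index = c(2, 2, 2))

# Set values below index to NA, drop the index column, then average each row.
# Note: x < index keeps values equal to index; use x <= index for strictly higher.
data[, means :=
       rowMeans(data[, lapply(.SD, function(x) ifelse(x < index, NA, x))
                     ][, -'index'],
                na.rm = TRUE)]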
Alternatively, use .SDcols to select only the sim columns:
# Restrict .SD to the sim columns, so there is no index column to drop afterwards.
data[, means :=
       rowMeans(data[, lapply(.SD, function(x) ifelse(x < index, NA, x)),
                     .SDcols = intersect(paste0('sim', 1:5), names(data))],
                na.rm = TRUE)]
Output:
data
   sim1 sim2 sim3 sim4 sim5 index means
1:    1    2    3    4    5     2   3.5
2:    1    2    3    4    5     2   3.5
3:    1    2    3    4    5     2   3.5
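The result above is 3.5 because `x < index` leaves values equal to index in the average (2, 3, 4, 5). If "higher than index" is meant strictly, one possible variant is sketched below. It is not part of the original answer: the names `dt`, `sim_cols` and `mean_above_index` are made up for illustration, and the index values are changed so that they differ by row. The sketch masks a matrix copy of the sim columns instead of calling `ifelse` per column, relying on R recycling `dt$index` down each column so that every row is compared with its own index.

```r
library(data.table)

# Hypothetical data: same sim columns as above, but index varies by row
dt <- data.table(sim1 = c(1, 1, 1), sim2 = c(2, 2, 2), sim3 = c(3, 3, 3),
                 sim4 = c(4, 4, 4), sim5 = c(5, 5, 5), index = c(2, 3, 6))

# Pick the simulation columns by name pattern (assumes they are named sim<number>)
sim_cols <- grep("^sim[0-9]+$", names(dt), value = TRUE)

# Work on a plain numeric matrix copy of the sim columns
m <- as.matrix(dt[, sim_cols, with = FALSE])

# Matrices are stored column by column, so the length-nrow vector dt$index is
# recycled down each column and element (i, j) is compared with index[i]
m[m <= dt$index] <- NA

# Row-wise mean of the remaining values; a row with no value above its index gives NaN
dt[, mean_above_index := rowMeans(m, na.rm = TRUE)]
```

With these made-up values, the first row averages 3, 4 and 5, the second row averages 4 and 5, and the third row has no value above its index, so its mean is NaN rather than an error.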
Answer 1 (score: 2)