Suppose I have two data.tables:
library(data.table)
indexDT <- data.table(id = rep(c(1,2,3),c(3,2,1)), V1 = c(1,3,5,2,4,4) , V3= c(3,4,5, 4, 5,5))
DT <- data.table(id = rep(1:3,(rep(5,3))), data.table(sapply(1:3, function(i){rnorm(5*3)})))
setkey(indexDT,"id")
setkey(DT,"id")
That is:
#> indexDT
# id V1 V3
#1: 1 1 3
#2: 1 3 4
#3: 1 5 5
#4: 2 2 4
#5: 2 4 5
#6: 3 4 5
#> DT
# id V1 V2 V3
#1: 1 0.30093680 2.0481465 0.7207622
#2: 1 -0.79176664 -1.0024393 -1.5915616
#3: 1 0.57746018 -1.1214380 -0.6158101
#4: 1 -1.61781064 0.3569482 -1.2155334
#5: 1 -0.14585645 -2.0758002 -0.6914313
#6: 2 1.16340667 0.7991301 0.1155552
#7: 2 0.08072223 -1.2330383 1.3123562
#8: 2 -1.07706321 0.1705363 -0.6569734
#9: 2 -0.98598985 -0.5853677 -1.2507563
#10: 2 -0.16048051 -1.9341206 0.1300098
#11: 3 -0.39287015 0.2486458 -0.2215037
#12: 3 0.84511312 0.2084681 1.3388653
#13: 3 -0.09892791 -2.3361669 1.6006061
#14: 3 -0.01676263 -1.7047148 -0.2918755
#15: 3 -0.43500633 -0.8481987 0.3053506
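The values in indexDT act as row indices, per id, for the column of the same name. Now I want to do the following: for each column in indexDT (here: V1 and V3) and each id (here: 1, 2 and 3), select the values in DT from the same column and id, at the row positions given in indexDT. Row positions are counted within each id group, so for id 1 and column V1 the indices 1, 3, 5 pick the 1st, 3rd and 5th rows of that group.

One solution is shown below, but it is not very elegant, it is hard to read, and I would like something faster. Both indexDT and DT are very large (DT has nrow = 500k * 26 and indexDT has nrow = +/- 10k).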
Ind_p <- grep("V",names(indexDT),value=TRUE)
selectionDT <- DT[, lapply(Ind_p,function(p,k){.SD[indexDT[id == k, ][[p]], ][[p]]},id), by = id, .SDcols = Ind_p]
This gives:
#> selectionDT
# id V1 V2
#1: 1 0.30093680 -0.6158101
#2: 1 0.57746018 -1.2155334
#3: 1 -0.14585645 -0.6914313
#4: 2 0.08072223 -1.2507563
#5: 2 -0.98598985 0.1300098
#6: 3 -0.01676263 0.3053506
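Any better solution would be much appreciated. Thanks!
Answer 0 (score: 0)
Perhaps you can do this with matrix indexing, that is, by subsetting a matrix with an index matrix: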
DT[, names(indexDT), with = F][indexDT[, .(M = list(as.matrix(.SD))), keyby = id],
as.data.table(matrix(as.matrix(.SD)[cbind(c(t(M[[1]])), 1:ncol(M[[1]]))],
ncol = ncol(M[[1]]), byrow = T)), by = .EACHI]
# id V1 V2
#1: 1 -0.08786187 -1.1277373
#2: 1 -0.62336535 0.5501641
#3: 1 1.09400253 -0.8152316
#4: 2 -1.01158421 2.0713417
#5: 2 -0.08669810 -0.3845776
#6: 3 -0.10041684 -0.2430609
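The inner cbind call constructs the index matrix of (row, column) pairs; the rest is just converting the data to the right types. The column in the output labelled V2 holds the values selected from V3.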
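To make the cbind step easier to follow, here is a minimal base R sketch of the same indexing idea on a single id group. The matrices vals and idx and their values are made up for illustration; they stand in for one group of DT and the matching rows of indexDT.

```r
# Toy data for one id group: 5 rows, 2 columns (illustrative values only).
vals <- matrix(c(10, 20, 30, 40, 50,
                  1,  2,  3,  4,  5), ncol = 2,
               dimnames = list(NULL, c("V1", "V3")))

# Row positions to pick per column, laid out like indexDT for id == 1.
idx <- matrix(c(1, 3, 5,
                3, 4, 5), ncol = 2,
              dimnames = list(NULL, c("V1", "V3")))

# Build (row, column) pairs. c(t(idx)) flattens idx row by row, so the
# column numbers 1, 2 recycle in step with it.
pairs <- cbind(c(t(idx)), seq_len(ncol(idx)))

# Subsetting with a two-column matrix extracts one element per pair;
# byrow = TRUE puts the values back in the layout of idx.
picked <- matrix(vals[pairs], ncol = ncol(idx), byrow = TRUE,
                 dimnames = list(NULL, colnames(idx)))
print(picked)
```

Here picked has one row per row of idx: column V1 holds rows 1, 3, 5 of vals[, "V1"] and column V3 holds rows 3, 4, 5 of vals[, "V3"]. The answer above applies this same step once per id via by = .EACHI.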