Indexing a data.table using a data.table of indices

Date: 2015-12-09 14:59:05

Tags: r indexing data.table

Suppose I have two data.tables:

library(data.table)

# no seed is set, so the values in DT differ from run to run
indexDT <- data.table(id = rep(c(1,2,3),c(3,2,1)), V1 = c(1,3,5,2,4,4) , V3= c(3,4,5, 4, 5,5))
DT <- data.table(id = rep(1:3,(rep(5,3))), data.table(sapply(1:3, function(i){rnorm(5*3)})))

setkey(indexDT,"id")
setkey(DT,"id")

#> indexDT
#   id V1 V3
#1:  1  1  3
#2:  1  3  4
#3:  1  5  5
#4:  2  2  4
#5:  2  4  5
#6:  3  4  5

#> DT
#     id          V1         V2         V3
#1:  1  0.30093680  2.0481465  0.7207622
#2:  1 -0.79176664 -1.0024393 -1.5915616
#3:  1  0.57746018 -1.1214380 -0.6158101
#4:  1 -1.61781064  0.3569482 -1.2155334
#5:  1 -0.14585645 -2.0758002 -0.6914313
#6:  2  1.16340667  0.7991301  0.1155552
#7:  2  0.08072223 -1.2330383  1.3123562
#8:  2 -1.07706321  0.1705363 -0.6569734
#9:  2 -0.98598985 -0.5853677 -1.2507563
#10:  2 -0.16048051 -1.9341206  0.1300098
#11:  3 -0.39287015  0.2486458 -0.2215037
#12:  3  0.84511312  0.2084681  1.3388653
#13:  3 -0.09892791 -2.3361669  1.6006061
#14:  3 -0.01676263 -1.7047148 -0.2918755
#15:  3 -0.43500633 -0.8481987  0.3053506

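The values in indexDT act as row indices, within each id, into the DT column of the same name. For example, for id 1, V1 = 1, 3, 5 refers to rows 1, 3 and 5 of the id 1 block of column V1 in DT. I want to do the following: for each column of indexDT (here V1 and V3) and each id (here 1, 2 and 3), select the values of DT in that same column and id at those row positions. One solution is shown below, but it is not very elegant, it is hard to read, and I am hoping for something faster. Both indexDT and DT are very large (nrow = 500k * 26 for DT, and nrow = +/- 10k).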

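# columns of indexDT that hold row indices (here V1 and V3)
Ind_p <- grep("V", names(indexDT), value = TRUE)
# per id k, take the rows of column p listed in indexDT[id == k][[p]]
selectionDT <- DT[, lapply(Ind_p, function(p, k) {.SD[indexDT[id == k, ][[p]], ][[p]]}, id), by = id, .SDcols = Ind_p]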

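This gives the following (the second result column is auto-named V2, but it holds the values selected from V3):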

#> selectionDT 
#   id          V1         V2
#1:  1  0.30093680 -0.6158101
#2:  1  0.57746018 -1.2155334
#3:  1 -0.14585645 -0.6914313
#4:  2  0.08072223 -1.2507563
#5:  2 -0.98598985  0.1300098
#6:  3 -0.01676263  0.3053506
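
To make the intended selection concrete, here is a minimal sketch of the same lookup done by hand for a single id and a single column. It assumes a small deterministic stand-in for DT (made-up values, not the random data above), so the picked values are easy to check; the names rows and picked are only illustrative.

```r
library(data.table)

# Deterministic stand-in for DT (made-up values, not the random data above)
DT <- data.table(id = rep(1:3, each = 5),
                 V1 = (1:15) / 10,
                 V3 = (101:115) / 10)
indexDT <- data.table(id = rep(c(1, 2, 3), c(3, 2, 1)),
                      V1 = c(1, 3, 5, 2, 4, 4),
                      V3 = c(3, 4, 5, 4, 5, 5))

# Row positions within the id == 1 block that indexDT asks for in column V1
rows <- indexDT[id == 1, V1]

# Restrict DT to id == 1 first, then pick those positions from column V1
picked <- DT[id == 1][rows, V1]
picked
```

Here picked holds the 1st, 3rd and 5th V1 values of the id 1 block. The full problem repeats this for every id and every index column.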

Any better solution would be much appreciated. Thanks!

1 Answer:

Answer 0 (score: 0)

Perhaps you can do this by subsetting a matrix with an index matrix:

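# keep id plus the indexed columns, then join on id with one index matrix M per id
DT[, names(indexDT), with = FALSE][indexDT[, .(M = list(as.matrix(.SD))), keyby = id],
     # pair each row index with its column number, then subset the group as a matrix
     as.data.table(matrix(as.matrix(.SD)[cbind(c(t(M[[1]])), seq_len(ncol(M[[1]])))],
                          ncol = ncol(M[[1]]), byrow = TRUE)), by = .EACHI]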
#   id          V1         V2
#1:  1 -0.08786187 -1.1277373
#2:  1 -0.62336535  0.5501641
#3:  1  1.09400253 -0.8152316
#4:  2 -1.01158421  2.0713417
#5:  2 -0.08669810 -0.3845776
#6:  3 -0.10041684 -0.2430609

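The inner cbind call builds the index matrix; the rest just converts the data to the right types. The values above differ from those in the question, presumably because DT is generated with rnorm without a fixed seed.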
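
As a rough illustration of the index-matrix step on its own, the sketch below uses base R only. The matrix m is a made-up stand-in for one id's block of DT, and idx mirrors the id 1 rows of indexDT; the names m, idx, pairs and picked are assumptions for this example and do not appear in the answer.

```r
# Made-up stand-in for one id's block of DT: 5 rows, columns V1 and V3
m <- matrix(c(10, 20, 30, 40, 50,
              11, 21, 31, 41, 51),
            ncol = 2, dimnames = list(NULL, c("V1", "V3")))

# Row indices per column, as indexDT holds them for id == 1
idx <- cbind(V1 = c(1, 3, 5), V3 = c(3, 4, 5))

# Two-column index matrix of (row, column) pairs.
# c(t(idx)) walks idx row by row; the column numbers 1, 2 are recycled to match.
pairs <- cbind(c(t(idx)), seq_len(ncol(idx)))

# Subsetting with a two-column matrix picks one element per (row, column) pair;
# byrow = TRUE puts the values back in the same layout as idx.
picked <- matrix(m[pairs], ncol = ncol(idx), byrow = TRUE,
                 dimnames = list(NULL, colnames(idx)))
picked
```

Each column of picked then contains the elements of the matching column of m at the positions listed in idx, which is what the answer computes per id inside by = .EACHI.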