Question

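I want to filter an R data.table so that, for each row, I pick the element from the column given by a list of column numbers.

For example, suppose I have: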

DT<- data.table(a=c(1,2,3),b=c(4,5,6),c=c(7,8,9))

which gives:

   a b c
1: 1 4 7
2: 2 5 8
3: 3 6 9

Now I have an external vector called select that holds, for each row, the column I want to pick from that row.

select <- c(2,3,1)

I want it to return a new data.table in which each row's value comes from the selected column.

DTnew 
1: 4
2: 8
3: 3

If I try something like DT[,.SD[select]], it returns a new data.table in which whole rows are picked according to the select vector.

> DT[,.SD[select]]
   a b c
1: 2 5 8
2: 3 6 9
3: 1 4 7
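
The output above is consistent with `.SD[select]` treating `select` as a row index: with no `by`, `.SD` is the whole table, so the call reorders rows instead of picking one column per row. A minimal sketch, assuming the three-row `DT` from the question (the names `by_sd` and `by_i` are only for illustration):

```r
library(data.table)

DT <- data.table(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))
select <- c(2, 3, 1)

# Without `by`, .SD is all of DT, so .SD[select] picks rows 2, 3, 1.
by_sd <- DT[, .SD[select]]

# Plain row indexing returns the same rows in the same order.
by_i <- DT[select]

print(by_sd)
print(by_i)
```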

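How can I accomplish this?

Edit: I did not make this clear, but the result needs to keep the original row order of the data.table, because it is a time-series-based object (I left out the ts index to keep the question simple).

Update 2: Some timing results for the posted solutions. (The data.table method appears much faster on system time; I am not sure what to conclude from the user and elapsed times/overhead, but I also wanted to stay consistent with data.table the whole way through. I guess I should have asked whether it is common for DT users to go back and forth to matrix-based computations when speed is the priority.)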

library(data.table)
library(microbenchmark)

set.seed(123)

DT <- data.table(matrix(rnorm(10e3*10e3),nrow=10e3,ncol=10e3))
select<-sample(10e3,replace=FALSE)

op <- microbenchmark(
sol1 <- function(DT,select) DT[, V1 := .SD[[select]], by = select]$V1,

sol2 <- function(DT,select) {
x <- as.matrix(DT)
x[cbind(1:nrow(x), select)]
},

times=1000L)

Warning message:
In microbenchmark(sol1 <- function(DT, select) DT[, `:=`(V1, .SD[[select]]),  :
  Could not measure a positive execution time for 1019 evaluations.


> identical(sol1(DT,select),sol2(DT,select))
[1] TRUE
> op
Unit: nanoseconds
                                                                                    expr min lq   mean median uq   max neval cld
              sol1 <- function(DT, select) DT[, `:=`(V1, .SD[[select]]), by = select]$V1   0  0 25.136      0  1  9837  1000   a
 sol2 <- function(DT, select) {     x <- as.matrix(DT)     x[cbind(1:nrow(x), select)] }   0  0 52.477      0  1 39345  1000   a
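
A note on the numbers above: the expressions passed to microbenchmark are the assignments `sol1 <- function(...)` and `sol2 <- function(...)`, so what gets timed is the creation of two function objects, not calls to them. That would explain both the nanosecond timings and the warning. A sketch of a benchmark that times the calls instead, assuming a much smaller table (`n = 200`, chosen here only to keep the run short) and the same function bodies as above:

```r
library(data.table)
library(microbenchmark)

set.seed(123)
n <- 200L  # assumption: far smaller than the 10e3 used above, to keep this quick
DT <- data.table(matrix(rnorm(n * n), nrow = n, ncol = n))
select <- sample(n, replace = FALSE)

# Define the candidates first. Note that sol1 overwrites column V1 of DT
# by reference, exactly as in the original definition.
sol1 <- function(DT, select) DT[, V1 := .SD[[select]], by = select]$V1

sol2 <- function(DT, select) {
  x <- as.matrix(DT)
  x[cbind(1:nrow(x), select)]
}

# Time the calls themselves, not the function definitions.
op <- microbenchmark(
  sol1 = sol1(DT, select),
  sol2 = sol2(DT, select),
  times = 10L
)
print(op)
```

The `system.time` runs further down already call the functions, so those figures are the ones that actually compare the two approaches.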

Method 2:

> system.time(replicate(10,sol1(DT,select)))
   user  system elapsed 
  64.07    0.25   64.33 
> system.time(replicate(10,sol2(DT,select)))
   user  system elapsed 
   4.97    2.25    7.22

Answer 1

You can do this with a matrix, using matrix indexing:

x <- as.matrix(DT)
x[cbind(1:nrow(x), select)]
## [1] 4 8 3

If you are starting with a data frame, you can index it with a matrix as well:

x <- data.frame(a=c(1,2,3),b=c(4,5,6),c=c(7,8,9)) # or as.data.frame(DT)
x[cbind(1:nrow(x), select)]
## [1] 4 8 3
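
To spell out what the two-column index does: each row of the index matrix is a (row, column) pair, and indexing by that matrix returns one element per pair, in the order given. A minimal sketch that also wraps the result back into a data.table so the original row order is kept; the names `idx`, `res` and the column name `value` are illustrative, not part of the original answer:

```r
library(data.table)

DT <- data.table(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))
select <- c(2, 3, 1)

# One (row, column) pair per row of DT: (1,2), (2,3), (3,1).
idx <- cbind(seq_len(nrow(DT)), select)

# Indexing a matrix by a two-column matrix picks one element per pair.
x <- as.matrix(DT)
res <- x[idx]

# Back to a data.table; rows stay in their original order.
DTnew <- data.table(value = res)
print(DTnew)
```

Note that `as.matrix` copies the whole table and coerces every column to a common type, so this fits best when all columns are numeric, as they are here.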

Answer 2

Here are a few more options:

# extended example
DT <- rbind(DT,DT)
select <- c(select,rev(select))
expected <- c(4,8,3,1,8,6)

# create a new column with by
DT[, V1 := .SD[[select]], by = select]$V1

# or use ave
ave( seq(nrow(DT)), select, FUN = function(ii) DT[[ select[ii][1] ]][ii] )

These all do basically the same thing: for each value v in select, grab the corresponding vector DT[[v]], and subset it to the rows where select equals v.
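
The shared idea (handle one selected column at a time, and fill in only the rows that asked for it) can also be written as an explicit loop. This is a sketch for illustration, not one of the answer's own options; it assumes the six-row extended example, and `res` and `rows` are hypothetical names:

```r
library(data.table)

# The extended example: the original three rows stacked twice.
DT <- data.table(a = c(1, 2, 3, 1, 2, 3),
                 b = c(4, 5, 6, 4, 5, 6),
                 c = c(7, 8, 9, 7, 8, 9))
select <- c(2, 3, 1, 1, 3, 2)
expected <- c(4, 8, 3, 1, 8, 6)

res <- numeric(nrow(DT))
for (v in unique(select)) {
  rows <- which(select == v)   # rows that want column v
  res[rows] <- DT[[v]][rows]   # take column v, keep only those rows
}

print(res)
```

Because each value is written back to its own row position, the result stays in the original row order, which matters for the time-series case described in the question.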
