Question

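In data.table, one way to subset a table by a numeric vector of column numbers is to use with=FALSE.

I am trying to loop over a data.table using a numeric vector of column numbers, looking for the rows that meet some criterion in each of those columns, like this: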

require(data.table)

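## rep_len() makes the recycling to 8 rows explicit; recent data.table
## versions raise an error instead of recycling 5 values against 8
ab=data.table(id=c("geneA", "geneB", "geneC", "geneA", "geneA", "geneB", "", "NA"),
              co1=rep_len(c(1,2,3,0,7), 8), co2=rep_len(c(0,0,4,5,6), 8),
              nontarget=rep_len(c(9,0,7,6,5), 8), co3=rep_len(c(0,1,2,3,4), 8))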
target_col_nums=grep('co', colnames(ab))

##Data.table doesn't treat colnames(ab)[i] as one of the
##  column name variables, and with=F only seems to work for j in dt[i,j,by]
for (i in target_col_nums){
    print(ab[colnames(ab)[i]>3])
}

##This produces the desired output
ab[co1>3]
ab[co2>3]
ab[co3>3]

In my case the real table is very large, so I can't type out the column names by hand.

I hope this is a useful question for the community.

Answer 1

# value = TRUE makes grep() return the matching column names, not their positions
for (col in grep('co', names(ab), value = TRUE))
  print(ab[get(col) > 3])
#      id co1 co2 nontarget co3
#1: geneA   7   6         5   4
#      id co1 co2 nontarget co3
#1: geneC   3   4         7   2
#2: geneA   0   5         6   3
#3: geneA   7   6         5   4
#4:    NA   3   4         7   2
#      id co1 co2 nontarget co3
#1: geneA   7   6         5   4
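
To illustrate why get() is needed here: colnames(ab)[i] is only a character string, so comparing it with 3 never looks at the column. The sketch below uses a small hypothetical table dt (not the table from the question) to show the difference; the names dt, col, name_cmp and hits are made up for the illustration.

```r
library(data.table)

# Hypothetical toy table, used only for this illustration
dt <- data.table(id = c("a", "b", "c"), co1 = c(1, 5, 7))
col <- "co1"

# Comparing the name itself: 3 is coerced to "3" and the two strings are
# compared, which yields a single logical value instead of one per row
name_cmp <- col > 3
length(name_cmp)   # 1

# get() looks the name up among the columns, so the comparison is per row
hits <- dt[get(col) > 3]
hits$id            # "b" "c"
```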

Answer 2

You can build the condition as a string from the column name and evaluate it as an expression with eval:

for (i in target_col_nums){
    expr <- paste0(colnames(ab)[i], ">3")
    print(ab[eval(parse(text = expr)), ])
}

#      id co1 co2 nontarget co3
#1: geneA   7   6         5   4
#      id co1 co2 nontarget co3
#1: geneC   3   4         7   2
#2: geneA   0   5         6   3
#3: geneA   7   6         5   4
#4:    NA   3   4         7   2
#      id co1 co2 nontarget co3
#1: geneA   7   6         5   4

Or you can try any of the suggestions in the question "passing variables as data.table column names".
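
A possible variation on the same idea, sketched here on a small hypothetical table dt, is to build the condition as a call with as.name() and bquote() instead of pasting and parsing text. This is an alternative technique, not the one used in the answer above; the names dt, cond and results are made up for the illustration.

```r
library(data.table)

# Hypothetical toy table, used only for this illustration
dt <- data.table(id = c("a", "b", "c"), co1 = c(1, 5, 7), co2 = c(4, 0, 9))

results <- list()
for (col in c("co1", "co2")) {
  # Build the unevaluated call `co1 > 3` (then `co2 > 3`) from a symbol,
  # so no string has to be parsed
  cond <- bquote(.(as.name(col)) > 3)
  # eval() in i evaluates the call with the columns of dt in scope
  results[[col]] <- dt[eval(cond)]
}

results$co1$id   # "b" "c"
results$co2$id   # "a" "c"
```

Working with symbols rather than text avoids problems with column names that are not syntactically valid R names.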

Answer 3

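Your approach works with a small tweak. You could keep using column numbers (less harmful here, since you obtain them programmatically, but generally bad practice); the version below selects the target columns by name instead: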

target_cols = names(ab)[grepl("co", names(ab))]

sapply(target_cols, function(jj) print(ab[get(jj) > 3]))

Wrap the sapply() call in invisible() if the extra NULL output is distracting or otherwise bothers you.
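
If the subsets are needed later rather than just printed, one option is to collect them in a named list. The sketch below assumes a small hypothetical table dt; passing simplify = FALSE to sapply() keeps one data.table per column name. The names dt and res are made up for the illustration.

```r
library(data.table)

# Hypothetical toy table, used only for this illustration
dt <- data.table(id = c("a", "b", "c"), co1 = c(1, 5, 7),
                 nontarget = c(9, 9, 9), co2 = c(4, 0, 9))

target_cols <- grep("co", names(dt), value = TRUE)

# One filtered data.table per target column, named after the column
res <- sapply(target_cols, function(jj) dt[get(jj) > 3], simplify = FALSE)

names(res)    # "co1" "co2"
res$co2$id    # "a" "c"
```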

Answer 4

We can pass 'i' to .SDcols and apply the condition to .SD to get a logical vector, which can then be used to subset the rows.

for(i in target_col_nums){
 print(ab[ab[, .SD[[1L]] >3, .SDcols = i]])
}
#      id co1 co2 nontarget co3
#1: geneA   7   6         5   4
#      id co1 co2 nontarget co3
#1: geneC   3   4         7   2
#2: geneA   0   5         6   3
#3: geneA   7   6         5   4
#4:    NA   3   4         7   2
#      id co1 co2 nontarget co3
#1: geneA   7   6         5   4
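
The same kind of logical vector can also be obtained with plain list-style extraction by position, dt[[k]]. This is a simpler alternative to the .SD/.SDcols form above rather than the answer's own method, sketched on a small hypothetical table dt; the names dt, k, keep and kept_ids are made up for the illustration.

```r
library(data.table)

# Hypothetical toy table, used only for this illustration
dt <- data.table(id = c("a", "b", "c"), co1 = c(1, 5, 7),
                 nontarget = c(9, 9, 9), co2 = c(4, 0, 9))

target_col_nums <- grep("co", names(dt))   # positions 2 and 4

kept_ids <- list()
for (k in target_col_nums) {
  # dt[[k]] extracts column number k as a vector; the comparison gives
  # one TRUE/FALSE per row
  keep <- dt[[k]] > 3
  print(dt[keep])
  kept_ids[[names(dt)[k]]] <- dt[keep]$id
}
```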

How to subset a data.table based on the values in a column given by its number
