I have a very annoying problem in my code:
library(data.table)
a <- 1:20        # grouping column
b <- rnorm(20)
c <- rnorm(20)
d <- rnorm(20)
final <- data.frame(a, b, c, d)
e <- data.table(final)
# sum columns 2-4 for every group in "by" (my real data frame is much larger)
g <- e[, lapply(.SD, sum), by = "a", .SDcols = 2:4]
h <- g[, 2:4]    # expected columns 2-4 of g
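h should contain columns 2-4 of g, but instead it contains just the values 2:4. My script has further lines that select columns with df[, columns] in the same way. Any ideas on how to solve this would be greatly appreciated.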
Answer 0 (score: 3)
The first question in the data.table FAQ describes this issue (it asks why DT[,5] returns 5):
Because, by default, unlike a data.frame, the 2nd argument is an
expression which is evaluated within the scope of DT. 5 evaluates to 5.
It goes on to give a workaround:
Having said this, there are some circumstances where referring to a column by
number is ok, such as a sequence of columns. In these situations just do:
DT[,5:10,with=FALSE]
or
DT[,c(1,4,10),with=FALSE]
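
Applied to the example from the question, the workaround might look like the sketch below. The names h1, h2, h3 and cols are made up for illustration, and the column names b, c, d are built directly instead of going through data.frame. The .SDcols form and the character-vector form are alternatives not mentioned in the FAQ excerpt above. As far as I know, newer data.table releases (1.9.8 and later) also return columns for a plain g[, 2:4], so the original behaviour may depend on the installed version.

```r
library(data.table)

set.seed(1)  # only to make the sketch reproducible
e <- data.table(a = 1:20, b = rnorm(20), c = rnorm(20), d = rnorm(20))

# Per-group sums of columns 2-4, as in the question
g <- e[, lapply(.SD, sum), by = "a", .SDcols = 2:4]

# FAQ workaround: with = FALSE makes j behave as a column selector
h1 <- g[, 2:4, with = FALSE]

# Alternative: select through .SD and .SDcols
h2 <- g[, .SD, .SDcols = 2:4]

# Alternative: column names held in a character vector
cols <- c("b", "c", "d")
h3 <- g[, cols, with = FALSE]

print(names(h1))
```

All three forms return a data.table with the columns b, c and d, so later lines written in the df[, columns] style can be adapted by adding with = FALSE.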