R - filter rows and sum

Time: 2018-12-11 10:21:39

Tags: r data.table

I have a data.table with multiple columns. I am trying to sum a subset of one particular column.

sum(basetable_orig[get(var) %in% values[s], .(get(target))])

However, this results in an error:

  

Error in FUN(X[[i]], ...) : only defined on a data frame with all numeric variables

So I investigated, and this is what I have found so far:

var <- "colName"
target <- "target"
s <- 1
values <- c("1","2")

The column of interest is of numeric type:

str(basetable_orig[,c("colName")])
#gives following:
Classes ‘data.table’ and 'data.frame':  12345 obs. of  1 variable:
$ colName: num  1 1 1 1 2 1 1 1 1 1 ...

Nevertheless, it looks as though data.table automatically converts the numeric variable to a factor:

tst <- basetable_orig[get(var) %in% values[s], .(get(target))]
str(tst)
#gives following:
Classes ‘data.table’ and 'data.frame':  12345 obs. of  1 variable:
 $ V1: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...

So it obviously cannot be summed. Can anyone explain why this happens and how to fix it?

Edit

Below is a reproducible example.

library(data.table)

var <- "colName"
target <- "colTarget"
s <- 1

example_data <- data.table(colName = c(1,2,1,2,1), colTarget = c("0","0","1","1","1"))
example_data <- example_data[, colTarget:=as.factor(colTarget)]
str(example_data)
#Classes ‘data.table’ and 'data.frame': 5 obs. of  2 variables:
# $ colName  : num  1 2 1 2 1
# $ colTarget: Factor w/ 2 levels "0","1": 1 1 2 2 2

values<-names(table(example_data[,get(var)],exclude = NULL))
print(values)
#[1] "1" "2"

tst <- example_data[get(var) %in% values[s], .(get(target))]
str(tst)
#Classes ‘data.table’ and 'data.frame': 3 obs. of  1 variable:
# $ V1: Factor w/ 2 levels "0","1": 1 2 2

sum(example_data[get(var) %in% values[s], .(get(target))])
#Gives an error:
#Error in FUN(X[[i]], ...) : 
#  only defined on a data frame with all numeric variables

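The expected output is as follows. This is the table I have, and I want to count the number of "1" values in colTarget where colName = 1. So the result should be 2 (the sum of rows 1, 3 and 5 of the colTarget column).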

   colName colTarget
1:       1         0
2:       2         0
3:       1         1
4:       2         1
5:       1         1

1 answer:

Answer 0 (score: 1)

Here you go:

library(data.table)
example_data[colName == 1 & colTarget == 1, sum(colName)]
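Some background on the error may help here. In the question, `.(get(target))` in `j` returns a one-column data.table rather than a plain vector, and `sum()` on a data frame only works when every column is numeric. The target column is a factor to begin with (in the example it is created with `as.factor`), so nothing is being converted during the subset. Note also that the line above returns 2 only because `colName` equals 1 in every matched row; filtering on another value would sum that value instead of counting rows. A minimal sketch of two alternatives, reusing the question's example data and its `var` / `target` variables (the names `n_ones` and `total` are made up for illustration):

```r
library(data.table)

# Same toy data as in the question.
example_data <- data.table(
  colName   = c(1, 2, 1, 2, 1),
  colTarget = factor(c("0", "0", "1", "1", "1"))
)

var    <- "colName"
target <- "colTarget"

# Option 1: count the matching rows directly with .N.
n_ones <- example_data[get(var) == 1 & get(target) == "1", .N]

# Option 2: turn the factor labels into numbers, then sum.
# as.character() comes first because as.numeric() applied directly to a
# factor returns the internal level codes (1, 2), not the labels ("0", "1").
total <- example_data[get(var) == 1,
                      sum(as.numeric(as.character(get(target))))]

print(n_ones)
print(total)
```

Both expressions give 2 for this data. Option 1 is probably the simpler choice when the goal is just a count; option 2 matches the original "sum the column" idea and assumes every factor label can be parsed as a number.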