R: Using a for loop to iterate over variable names in a data.table (and group by the variable)

Date: 2019-08-24 01:17:42

Tags: r loops for-loop data.table

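I can't get the following code to run in R. It throws this error:

Error in `[.data.table`(fileIWantToAnalyze, , .(mean1 = mean(get(attribute)), : The items in the 'by' or 'keyby' list are length (943026,1). Each must be length 943026; the same length as there are rows in x (after subsetting if i is provided).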

"fileIWantToAnalyze" is a data.table:
> colnames(fileIWantToAnalyze) 

 [1] "variable_1a"     "variable_1b"
 [3] "variable_2a"     "variable_2b"
 [5] "variable_3a"     "variable_3b"
 [7] "variable_4a"     "variable_4b"
 [9] "variable_5a"     "variable_5b"
[11] "variable_6a"     "variable_6b"
[13] "variable_7a"     "variable_7b"
[15] "variable_8a"     "variable_8b"
[17] "variable_9a"     "variable_9b"
[19] "GroupingColumn1"

This doesn't work either:

for(attribute in colnames(fileIWantToAnalyze)[c(1,3,5,7,9,11,13,15,17)]){
  fileIWantToAnalyze[,.(mean1 = mean(get(attribute)),count1 = .N),
                      by = .(GroupingColumn1,sub("a", "b", attribute))]
}

The following code gets me the answer I'm looking for, but I want to use a loop to generate the output for many variables:

for (attribute in colnames(fileIWantToAnalyze)[c(1,3,5,7,9,11,13,15,17)]){
  fileIWantToAnalyze[,.(mean1 = mean(get(attribute)),count1 = .N),
    by = .(GroupingColumn1,attribute)]
}

I believe the problem is how I refer to "attribute" in the "by" argument when grouping.

2 answers:

Answer 0 (score: 2)

Your problem comes from how data.table interprets the variables you pass to "by", although this may in fact be an unintended bug.

Consider the following dummy example for illustration:

library(data.table)
dt <- data.table(A = 1:3, b = 3:5, c = 7:5)
# Works:
for(i in names(dt))
  dt[,lapply(.SD, sum), by = i]
# Doesn't work:
for(i in names(dt))
  dt[,lapply(.SD, sum), by = .(i)]
# Works:
for(i in names(dt))
  dt[,lapply(.SD, sum), by = c(i)]

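Basically, data.table does not seem to check whether each element of .(...) is a single character string naming a column of the table. Inside .(...), i is evaluated as an ordinary value (a length-1 string), which then fails the check that every grouping item has as many elements as the table has rows.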
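A minimal, self-contained sketch that makes the difference visible on the same dummy table. The variable col is a hypothetical stand-in for the loop variable, and the .() form is wrapped in tryCatch() so the script keeps running and returns the error message instead of stopping.

```r
library(data.table)

dt <- data.table(A = 1:3, b = 3:5, c = 7:5)
col <- "A"  # hypothetical: a string holding a column name, like the loop variable

# A character vector (bare variable or c()) is interpreted as column names
by_name <- dt[, lapply(.SD, sum), by = col]
by_c    <- dt[, lapply(.SD, sum), by = c(col)]
print(all.equal(by_name, by_c))  # both forms group by column A

# Inside .(), col is evaluated as a plain value: the length-1 string "A",
# which does not match nrow(dt), so data.table stops with an error
by_dot <- tryCatch(dt[, lapply(.SD, sum), by = .(col)],
                   error = function(e) conditionMessage(e))
print(by_dot)
```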

So a simple workaround is to pass a character vector to the by argument. Below is a revised version of your code:

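for(attribute in colnames(fileIWantToAnalyze)[seq(1, 17, by = 2)]){
  # Partner "b" column; "a$" matches only the final "a", not the one in "variable"
  b_col <- sub("a$", "b", attribute)
  # Note that "by" is now a character vector; print() is needed because
  # results are not auto-printed inside a for loop
  print(fileIWantToAnalyze[,.(mean1 = mean(get(attribute)),count1 = .N),
                           by = c("GroupingColumn1", b_col)])
}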
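A bare expression inside a for loop is not auto-printed in R, so a loop like this needs an explicit print() or somewhere to keep its results. Note also that sub("a", "b", "variable_1a") returns "vbriable_1a", because sub() replaces the first match; anchoring the pattern as "a$" avoids that. The sketch below assumes a small hypothetical table standing in for fileIWantToAnalyze and stores each summary in a named list.

```r
library(data.table)

# Hypothetical toy data standing in for fileIWantToAnalyze
dt <- data.table(
  variable_1a = c(1, 2, 3, 4),
  variable_1b = c("x", "x", "y", "y"),
  variable_2a = c(10, 20, 30, 40),
  variable_2b = c("p", "q", "p", "q"),
  GroupingColumn1 = c("g1", "g1", "g2", "g2")
)

# Every column ending in "a"; its partner ends in "b"
a_cols <- grep("a$", names(dt), value = TRUE)

results <- list()
for (attribute in a_cols) {
  b_col <- sub("a$", "b", attribute)
  # A character vector in "by" is interpreted as column names
  results[[attribute]] <- dt[, .(mean1 = mean(get(attribute)), count1 = .N),
                             by = c("GroupingColumn1", b_col)]
}
print(results)
```

The same loop body could also be written as lapply(a_cols, function(attribute) ...), which returns the list directly.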

Answer 1 (score: 1)

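Consider reshaping your wide data to long format, which is generally the preferred layout for most analytical methods (aggregation, plotting, modeling). With this approach you can avoid complicated loops. Also, data.table has reshaping methods, including melt and dcast: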

melt_dt <- melt(fileIWantToAnalyze, 
                id.vars = c("GroupingColumn1"), 
                measure.vars = list(paste0("variable_", 1:9, "a"),
                                    paste0("variable_", 1:9, "b")),
                value.name = c("value_a", "value_b")
               )

# "variable" (added by melt) identifies which of the 9 a/b pairs a row came from
agg_dt <- melt_dt[, .(mean_value = mean(value_a), count = .N), 
                  by = list(variable, GroupingColumn1, value_b)][order(variable, GroupingColumn1, value_b)]
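A minimal, self-contained sketch of the same reshape-then-aggregate idea, assuming a small hypothetical table with two a/b pairs instead of nine. It keeps melt's variable column in the grouping so each pair is summarised separately, as the loop does, and uses keyby, which groups and sorts in one step, in place of by followed by order().

```r
library(data.table)

# Hypothetical toy data: two a/b pairs instead of the question's nine
dt <- data.table(
  GroupingColumn1 = c("g1", "g1", "g2"),
  variable_1a = c(1, 3, 5),
  variable_1b = c("x", "x", "y"),
  variable_2a = c(2, 4, 6),
  variable_2b = c("p", "q", "p")
)

# One row per (original row, pair); "variable" records the pair number
long <- melt(dt,
             id.vars = "GroupingColumn1",
             measure.vars = list(paste0("variable_", 1:2, "a"),
                                 paste0("variable_", 1:2, "b")),
             value.name = c("value_a", "value_b"))

# Mean of the "a" values and row count per pair, group and "b" value
agg <- long[, .(mean_value = mean(value_a), count = .N),
            keyby = .(variable, GroupingColumn1, value_b)]
print(agg)
```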