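I am unable to run the following code in R; it throws the error below. "fileIWantToAnalyze" is a data.table.
Error in `[.data.table`(fileIWantToAnalyze, , .(mean1 = mean(get(attribute)), : The items in the 'by' or 'keyby' list are length (943026,1). Each must be length 943026; the same length as there are rows in x (after subsetting if i is provided).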
> colnames(fileIWantToAnalyze)
[1] "variable_1a" "variable_5b"
[3] "variable_1b" "variable_6a"
[5] "variable_2a" "variable_6b"
[7] "variable_2b" "variable_7a"
[9] "variable_3a" "variable_7b"
[11] "variable_3b" "variable_8a"
[13] "variable_4a" "variable_8b"
[15] "variable_4b" "variable_9a"
[17] "variable_5a" "variable_9b"
[19] "GroupingColumn1"
This does not work either:
for(attribute in colnames(fileIWantToAnalyze)[c(1,3,5,7,9,11,13,15,17)]){
fileIWantToAnalyze[,.(mean1 = mean(get(attribute)),count1 = .N),
by = .(GroupingColumn1,sub("a", "b", attribute))]
}
The following code gets me the answer I am looking for, but I want to use a loop to generate the output for many variables:
for (attribute in colnames(fileIWantToAnalyze)[c(1,3,5,7,9,11,13,15,17)]){
fileIWantToAnalyze[,.(mean1 = mean(get(attribute)),count1 = .N),
by = .(GroupingColumn1,attribute)]
}
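I believe the problem is how I refer to "attribute" in the "by" argument when grouping.
Answer 0: (score: 2)
Your problem comes from how data.table interprets variables passed to by, although this behaviour may actually be an unintended bug.
The following dummy example illustrates it: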
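library(data.table)
dt <- data.table(A = 1:3, b = 3:5, c = 7:5)
# Works: i is not a column name, so its value is used as a column name
for (i in names(dt))
  dt[, lapply(.SD, sum), by = i]
# Does not work: .(i) is a list holding a length-1 string, not a column
for (i in names(dt))
  dt[, lapply(.SD, sum), by = .(i)]
# Works: c(i) is a character vector of column names
for (i in names(dt))
  dt[, lapply(.SD, sum), by = c(i)]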
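Basically, data.table does not seem to check whether each element of .(...) is a length-one character vector naming a column of the table. Such an element is instead treated as a grouping vector, and its length (1) does not match the number of rows.
So a simple workaround is to use only a character vector in the by argument. Below is a revised version of your code.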
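for (attribute in colnames(fileIWantToAnalyze)[seq(1, 17, by = 2)]) {
  # print() is needed because results are not auto-printed inside a for loop.
  print(fileIWantToAnalyze[, .(mean1 = mean(get(attribute)), count1 = .N),
    # Note that "by" is now a character vector.
    # "a$" anchors the match to the trailing suffix, so "variable_1a" becomes
    # "variable_1b" (a plain "a" would hit the first "a" in "variable").
    by = c("GroupingColumn1", sub("a$", "b", attribute))])
}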
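If the per-variable results need to be kept rather than just printed, one possible variation on the loop above is to collect them in a named list. This is only a sketch: the `toy` table, its values, and the names `a_cols`, `b_col`, and `results` are made up for illustration and stand in for fileIWantToAnalyze.

```r
library(data.table)

# Hypothetical stand-in for fileIWantToAnalyze with only two a/b pairs.
toy <- data.table(
  GroupingColumn1 = c("x", "x", "y", "y"),
  variable_1a = c(1, 3, 5, 7),
  variable_1b = c("p", "p", "q", "q"),
  variable_2a = c(2, 4, 6, 8),
  variable_2b = c("r", "s", "r", "s")
)

# Select the "a" columns by their suffix instead of by position.
a_cols <- grep("a$", names(toy), value = TRUE)

# One aggregated table per "a" column, grouped by the matching "b" column.
results <- lapply(a_cols, function(attribute) {
  b_col <- sub("a$", "b", attribute)
  toy[, .(mean1 = mean(get(attribute)), count1 = .N),
      by = c("GroupingColumn1", b_col)]  # character vector, as in the answer
})
names(results) <- a_cols
```

Each element can then be inspected on its own, for example with results$variable_1a.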
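Answer 1: (score: 1)
Consider reshaping the wide data into long format, which is usually the preferred layout for most analytical methods (aggregation, plotting, modeling). With this approach you can avoid complicated loops. In addition, data.table has reshaping methods, including melt and dcast.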
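# Stack the nine "a" columns into value_a and the nine "b" columns into value_b.
melt_dt <- melt(fileIWantToAnalyze,
                id.vars = c("GroupingColumn1"),
                measure.vars = list(paste0("variable_", 1:9, "a"),
                                    paste0("variable_", 1:9, "b")),
                value.name = c("value_a", "value_b")
)
# Aggregate once on the long table instead of looping over columns.
agg_dt <- melt_dt[, .(mean_value = mean(value_a), count = .N),
                  by = list(GroupingColumn1, value_b)][order(GroupingColumn1, value_b)]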
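To see what the reshaped table looks like, here is a minimal sketch on made-up data (the `wide` table and its values are hypothetical). When measure.vars is a list, melt adds a `variable` column that indexes the a/b pair. The sketch assumes you want each pair kept separate, as the original loop did, so it also groups by `variable`; drop it to pool all pairs together.

```r
library(data.table)

# Hypothetical wide table with two a/b pairs.
wide <- data.table(
  GroupingColumn1 = c("x", "x", "y", "y"),
  variable_1a = c(1, 3, 5, 7),
  variable_1b = c("p", "p", "q", "q"),
  variable_2a = c(2, 4, 6, 8),
  variable_2b = c("r", "s", "r", "s")
)

# Long format: one row per (original row, a/b pair).
long <- melt(wide,
             id.vars = "GroupingColumn1",
             measure.vars = list(paste0("variable_", 1:2, "a"),
                                 paste0("variable_", 1:2, "b")),
             value.name = c("value_a", "value_b"))

# Grouping by `variable` as well keeps each a/b pair separate.
agg <- long[, .(mean_value = mean(value_a), count = .N),
            by = .(GroupingColumn1, variable, value_b)]
```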