Aggregate by group and get the count of non-NA values, mean and sd for different data.frame columns

Time: 2018-01-18 14:53:18

Tags: r aggregate missing-data

I am having some trouble counting non-missing values by group with the function below (which also gives the sd and mean):

test <- do.call(data.frame, aggregate(. ~ treatment, have, function(x) c(n = sum(!is.na(x)), mean = mean(x), sd = sd(x))))

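It ends up giving me the number of rows that are non-missing across all columns of the data frame, not the count for each single column.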

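I have been looking for advice and found this, this and this helpful, but I can't figure out why the aggregate function(x) combines several columns for sum(!is.na(x)) but not for the mean or sd.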

Edit: added tables

This is the data I have

This is the data I get from my code

This is the table I want

You will notice in the 'have' data frame that counting the non-NA rows in column var1 by treatment group gives the following:

veh - 9
gr.4 - 8
gr.3 - 10
gr.2 - 5

But when using sum(!is.na(x)) I get the following:

veh - 6
gr.4 - 5
gr.3 - 10
gr.2 - 5

I believe this is because the function uses both var1 and var2 when counting the non-missing values. I don't know how to correct this.
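A small sketch on made-up data (hypothetical values, not the real 'have' data frame) seems to reproduce the effect: with the formula interface, any row that has an NA in either column is dropped before the function runs, so both columns report the same count.

```r
# Toy data (hypothetical values, not the original `have` data frame)
toy <- data.frame(
  treatment = c("a", "a", "a", "b", "b"),
  var1 = c(1, 2, NA, 4, 5),
  var2 = c(NA, 2, 3, 4, NA)
)

# Formula interface: the default na.action = na.omit removes every row that
# has an NA in any column before the summary function is called
n_formula <- aggregate(. ~ treatment, toy, function(x) sum(!is.na(x)))

# Per-column count of non-NA values in var1, for comparison
n_var1 <- tapply(!is.na(toy$var1), toy$treatment, sum)

print(n_formula)  # counts based on complete rows only
print(n_var1)     # counts based on var1 alone
```

Here each group has two non-NA values in var1 but only one complete row, which matches the kind of discrepancy described above.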

Best,

Jack

1 answer:

Answer 0: (score: 1)

Here is a data.table approach:

Data

The data you posted is hard to read into R. Please use something like dput() to make it easier for others:

> dput(dt)
structure(list(someting = c("503", "553", "599", "647", "695", 
"728", "760", "793", "826", "859", "907", "955", "1003", "1036", 
"1084", "1131", "1179", "1226", "1274", "1322", "1355", "1402", 
"1450", "1497", "1545"), treatment = c("gr.2", "gr.2", "gr.2", 
"gr.2", "gr.2", "gr.2", "gr.2", "gr.2", "gr.2", "gr.2", "gr.2", 
"gr.3", "gr.3", "gr.3", "gr.3", "gr.3", "gr.3", "gr.3", "gr.3", 
"gr.3", "gr.3", "gr.3", "gr.3", "gr.4", "gr.4"), var1 = c(8, 
NA, 3, 3, NA, NA, NA, NA, NA, 8, 8, 8, NA, 8, 8, 8, 8, 8, 8, 
NA, 8, 8, 8, 8, NA), var2 = c(8L, 8L, 8L, 8L, NA, NA, NA, NA, 
NA, 8L, 8L, 8L, NA, 8L, 8L, 8L, 8L, 8L, 8L, NA, 8L, 8L, 8L, 8L, 
NA)), .Names = c("someting", "treatment", "var1", "var2"), row.names = c(NA, 
-25L), class = c("data.table", "data.frame"))
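To reuse the data, the structure(...) call above can be assigned to a name such as dt (with the data.table package loaded). As a minimal sketch of the dput() round trip, using a small hypothetical data frame rather than the data above:

```r
# Small hypothetical data frame standing in for the real data
df <- data.frame(treatment = c("gr.2", "gr.3"), var1 = c(8, NA))

# dput() prints R code that recreates the object; that text can be pasted
# into a question or answer
dput(df)

# Round trip: capture the dput() text and evaluate it to rebuild the object
txt <- paste(capture.output(dput(df)), collapse = "\n")
df2 <- eval(parse(text = txt))

# Check that the rebuilt object matches the original
all.equal(df, df2)
```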

Code

library(data.table)

# Count non-NA values and compute the mean per treatment group, column by column
dt[, .(var1.n = sum(!is.na(var1)),
       var2.n = sum(!is.na(var2)),
       var1.mean = mean(var1, na.rm = TRUE),
       var2.mean = mean(var2, na.rm = TRUE)),
   by = .(treatment)]

Output

   treatment var1.n var2.n var1.mean var2.mean
1:      gr.2      5      6         6         8
2:      gr.3     10     10         8         8
3:      gr.4      1      1         8         8

For some reason the "veh" entries were not read in, so the output differs slightly, but the principle should be clear.
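For completeness, a base R sketch along the lines of the code in the question. The formula method of aggregate() defaults to na.action = na.omit, which drops any row with an NA in any column before FUN is applied. Passing na.action = na.pass and handling NAs inside the function should give per-column counts. The data frame and the helper name summarise_col below are made up for illustration and are not the original 'have' data.

```r
# Hypothetical stand-in for the `have` data frame
have <- data.frame(
  treatment = c("veh", "veh", "veh", "gr.2", "gr.2", "gr.2"),
  var1 = c(1, 2, NA, 4, NA, 6),
  var2 = c(NA, 2, 3, 4, 5, 7)
)

# Summary that tolerates missing values (helper name is an assumption)
summarise_col <- function(x) {
  c(n = sum(!is.na(x)), mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
}

# na.action = na.pass keeps rows containing NAs, so each column is
# summarised on its own non-missing values
res <- do.call(
  data.frame,
  aggregate(. ~ treatment, have, summarise_col, na.action = na.pass)
)
print(res)
```

The result has one row per treatment group and columns such as var1.n, var1.mean and var1.sd for each variable.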