I'm having some trouble counting non-missing values by group with the function below (which also gives the sd and mean):
test <- do.call(data.frame, aggregate(. ~ treatment, have, function(x) c(n = sum(!is.na(x)), mean = mean(x), sd = sd(x))))
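It ends up giving me the number of non-missing values across all columns of the data frame, rather than for just one column.
I have been looking for advice and found several related questions helpful, but I can't figure out why, inside aggregate, function(x) combines several columns for sum(!is.na(x)) but not for the mean or sd.
Edit: added tables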
This is the data I get from my code
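You will notice in the 'have' data frame that counting the non-NA rows in column var1 by treatment group gives the following:
veh - 9
gr.4 - 8
gr.3 - 10
gr.2 - 5
But when using sum(!is.na(x)) I get the following:
veh - 6
gr.4 - 5
gr.3 - 10
gr.2 - 5
I believe this is because the function is using both var1 and var2 to count the non-missing values. I don't know how to correct this.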
Best,
Jack
Answer 0 (score: 1)
Here is a data.table approach:
DATA
Your data is hard to read into R; please use dput() or similar to make it easier for others:
> dput(dt)
structure(list(someting = c("503", "553", "599", "647", "695",
"728", "760", "793", "826", "859", "907", "955", "1003", "1036",
"1084", "1131", "1179", "1226", "1274", "1322", "1355", "1402",
"1450", "1497", "1545"), treatment = c("gr.2", "gr.2", "gr.2",
"gr.2", "gr.2", "gr.2", "gr.2", "gr.2", "gr.2", "gr.2", "gr.2",
"gr.3", "gr.3", "gr.3", "gr.3", "gr.3", "gr.3", "gr.3", "gr.3",
"gr.3", "gr.3", "gr.3", "gr.3", "gr.4", "gr.4"), var1 = c(8,
NA, 3, 3, NA, NA, NA, NA, NA, 8, 8, 8, NA, 8, 8, 8, 8, 8, 8,
NA, 8, 8, 8, 8, NA), var2 = c(8L, 8L, 8L, 8L, NA, NA, NA, NA,
NA, 8L, 8L, 8L, NA, 8L, 8L, 8L, 8L, 8L, 8L, NA, 8L, 8L, 8L, 8L,
NA)), .Names = c("someting", "treatment", "var1", "var2"), row.names = c(NA,
-25L), class = c("data.table", "data.frame"))
CODE
library(data.table)
# Count non-missing values and take the mean of each variable per treatment group
dt[, .(var1.n = sum(!is.na(var1)),
       var2.n = sum(!is.na(var2)),
       var1.mean = mean(var1, na.rm = TRUE),
       var2.mean = mean(var2, na.rm = TRUE)),
   by = .(treatment)]
OUTPUT
   treatment var1.n var2.n var1.mean var2.mean
1:      gr.2      5      6         6         8
2:      gr.3     10     10         8         8
3:      gr.4      1      1         8         8
For some reason the "veh" entries were not read in, so the output differs slightly, but the principle should be clear.
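As for why the original aggregate() call miscounts: the formula interface of aggregate() defaults to na.action = na.omit, which drops every row that has an NA in any column before the function is applied. That would explain why the counts look as if var1 and var2 were combined. Below is a minimal base R sketch of the difference. The toy values are made up for illustration; only the names have, treatment, var1 and var2 follow the question, and stats is a hypothetical helper name.

```r
# Toy data (made-up values); column names follow the question
have <- data.frame(
  treatment = c("a", "a", "a", "b", "b"),
  var1 = c(1, NA, 3, 4, 5),
  var2 = c(NA, 2, 3, 4, NA)
)

# Hypothetical helper: non-missing count, mean and sd, ignoring NAs
stats <- function(x) c(n = sum(!is.na(x)),
                       mean = mean(x, na.rm = TRUE),
                       sd = sd(x, na.rm = TRUE))

# Default na.action = na.omit: rows with an NA in ANY column are dropped first,
# so every column is summarised on the complete cases only
res_default <- do.call(data.frame, aggregate(. ~ treatment, have, stats))

# na.action = na.pass keeps all rows, so each column is summarised on its own
res_pass <- do.call(data.frame,
                    aggregate(. ~ treatment, have, stats, na.action = na.pass))

print(res_default)
print(res_pass)
```

With na.pass the NA handling moves into the summary function itself, which is why mean() and sd() need na.rm = TRUE here.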