Question

I want to group on "col1" and "col2" and get the mean of col3.

newData
id    col1   col2    col3   col4   col5
1     200    2000    150     151    NA
2     200    2000    250     160   "tim"
3     201    2000    300     189    NA
4     201    2000    400     182    NA

I would like my output to be:

id    col1   col2    col3   col4    col5
1     200    2000    200     151     NA    
2     201    2000    350     189     NA

aggdata <-aggregate(newData, 
                by=list(newData$col1,newData$col2), 
                FUN=mean, na.rm=TRUE)

This gives me the mean of every variable, which I don't want.

Answer 1

Perhaps you are looking for a merge of two aggregates:

out <- merge(aggregate(col3 ~ col1 + col2, mydf, mean, na.rm = TRUE),
             aggregate(cbind(col4, col5) ~ col1 + col2,
                       mydf, `[`, 1, na.action = na.pass),
             by = c("col1", "col2"))
out <- cbind(id = 1:nrow(out), out)
out
#   id col1 col2 col3 col4 col5
# 1  1  200 2000  200  151 <NA>
# 2  2  201 2000  350  189 <NA>

The first aggregate gets the mean of "col3". The second aggregate extracts the first element of "col4" and "col5" respectively.

I created the "id" column manually, because the "id" column in your example output doesn't seem to follow any discernible pattern.
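For readers who want to run the idea end to end, here is a minimal sketch that rebuilds the question's data and merges two aggregates. It makes a few assumptions that are not in the original answer: the data frame is called newData as in the question, col5 is taken to be a character column, and the names keys, means and firsts are only illustrative. The second aggregate uses the by= form rather than cbind(col4, col5), because cbind() of a numeric and a character vector produces a character matrix, and this sketch prefers to keep col4 numeric.

```r
# Rebuild the question's data; col5 is assumed to be a character column.
newData <- data.frame(
  id   = 1:4,
  col1 = c(200, 200, 201, 201),
  col2 = c(2000, 2000, 2000, 2000),
  col3 = c(150, 250, 300, 400),
  col4 = c(151, 160, 189, 182),
  col5 = c(NA, "tim", NA, NA),
  stringsAsFactors = FALSE
)

keys <- c("col1", "col2")  # grouping columns (illustrative helper name)

# Mean of col3 within each (col1, col2) group.
means <- aggregate(col3 ~ col1 + col2, data = newData, FUN = mean, na.rm = TRUE)

# First element of col4 and col5 within each group; NA values are kept as they are.
firsts <- aggregate(newData[c("col4", "col5")], by = newData[keys],
                    FUN = function(x) x[1])

# Match the two results on the grouping columns, then add a fresh id column.
out <- merge(means, firsts, by = keys)
out <- cbind(id = seq_len(nrow(out)), out)
print(out)
```

Merging on the grouping columns, rather than binding the two results side by side, means the rows are paired by key and not by position.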

Answer 2

This doesn't make much sense, but here is another alternative:

> cbind(aggregate(col3 ~ col1 + col2, data = newData, FUN = mean),
        newData[!duplicated(newData[, c("col1", "col2")]), c("col4", "col5")])
  col1 col2 col3 col4 col5
1  200 2000  200  151 <NA>
3  201 2000  350  189 <NA>
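One caveat with this approach: cbind pairs rows by position, so it only lines up when the first row of each group appears in the same order as the sorted groups that aggregate returns. A hedged variant, sketched below, keeps the grouping columns in the de-duplicated rows and uses merge so the pairing is done by key. The shuffled copy of the data and the names shuffled, keys, means and firsts are assumptions made for illustration only.

```r
# The question's data; col5 is assumed to be a character column.
newData <- data.frame(
  id   = 1:4,
  col1 = c(200, 200, 201, 201),
  col2 = c(2000, 2000, 2000, 2000),
  col3 = c(150, 250, 300, 400),
  col4 = c(151, 160, 189, 182),
  col5 = c(NA, "tim", NA, NA),
  stringsAsFactors = FALSE
)

# Deliberately reorder the rows so the 201 group comes first.
shuffled <- newData[c(3, 1, 4, 2), ]

keys <- c("col1", "col2")  # grouping columns (illustrative helper name)

# Group means of col3; aggregate returns the groups in sorted order.
means <- aggregate(col3 ~ col1 + col2, data = shuffled, FUN = mean)

# First row of each group, in the order the groups appear in the data.
firsts <- shuffled[!duplicated(shuffled[keys]), c(keys, "col4", "col5")]

# merge pairs rows by key, so the differing row orders do not matter.
out <- merge(means, firsts, by = keys)
print(out)
```

With the shuffled input, a plain cbind of means and firsts would attach the col4 value of the 201 group to the 200 group, which is the mismatch that merge avoids.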

Group by two attributes and compute the mean of another attribute in R
