Question

I have a data frame like this:

df <- data.frame(category = c('A','B','C','A','C','B'),
                 value    = c(5.4, 5, 3.4,7.5,6.7,3.5),
                 status   = c('HC','D','D','HC','HC','D'))

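I want to calculate the mean of the values for every combination of category and status, for example the mean for ('A','HC') and for ('B','HC'). If a combination has only one value, the output should simply be that single value.

How can I do this?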

Answer 1

You can use dplyr or data.table to summarise:

library(dplyr)
df %>% group_by(category,status) %>%
       summarize(mean_value=mean(value))

  category status mean_value
    <fctr> <fctr>      <dbl>
1        A     HC       6.45
2        B      D       4.25
3        C      D       3.40
4        C     HC       6.70

See also @PoGibas's comment regarding a data.table answer.
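
For completeness, here is a minimal sketch of what a data.table version might look like. This is not taken from @PoGibas's comment; the names `dt`, `res` and `mean_value` are my own choices, and it assumes the data.table package is installed.

```r
library(data.table)

# Same data as in the question
df <- data.frame(category = c('A','B','C','A','C','B'),
                 value    = c(5.4, 5, 3.4, 7.5, 6.7, 3.5),
                 status   = c('HC','D','D','HC','HC','D'))

# Convert to a data.table (hypothetical name: dt)
dt <- as.data.table(df)

# One row per (category, status) pair that occurs in the data,
# with the mean of value inside each group
res <- dt[, .(mean_value = mean(value)), by = .(category, status)]
print(res)
```

Groups with a single row simply return that value, since the mean of one number is the number itself. Combinations that never occur in the data, such as ('B','HC'), do not appear in the result.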

Answer 2

Here is another solution:

# defining the set of all combinations of category/status in df
all.combinations <- unique(paste(df$category, df$status, sep = ";"))

# creating a function that will return the mean of one given combination
fun1 <- function(x){
  indices <- which(paste(df$category, df$status, sep = ";") == all.combinations[x])
  sigma <- mean(df$value[indices])
  return(sigma)
}

# finally applying our function to all possible combinations
sapply(seq_along(all.combinations), fun1)
[1] 6.45 4.25 3.40 6.70
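
If you would rather keep the group labels next to the means without pasting keys together, base R's `aggregate` is one possible alternative. This is only a sketch of a different technique, not the approach above; `res_base` is a name I made up for illustration.

```r
# Same data as in the question
df <- data.frame(category = c('A','B','C','A','C','B'),
                 value    = c(5.4, 5, 3.4, 7.5, 6.7, 3.5),
                 status   = c('HC','D','D','HC','HC','D'))

# Formula interface: mean of value for each observed
# category/status combination
res_base <- aggregate(value ~ category + status, data = df, FUN = mean)
print(res_base)
```

The result is a data frame with one row per combination present in the data, so each mean stays attached to its category and status.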
