How do I calculate the mean of a grouped data frame?

Time: 2016-07-08 23:27:48

Tags: r

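I am trying to split a data frame by participant_number and then compute the grand mean of the specific columns Happiness and Joy (excluding the column Lolz). Why does taking the mean of the column means give: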

Warning messages:
1: In mean.default(function (x, na.rm = FALSE, dims = 1L)  :
  argument is not numeric or logical: returning NA
2: In mean.default(function (x, na.rm = FALSE, dims = 1L)  :
  argument is not numeric or logical: returning NA

My code:

library(dplyr)
df<-data.frame(participant_number=c(1,1,1,2,2),Happiness=c(3,4,2,1,3),Joy=c(1,2,3,5,4),Lolz=c(3,3,3,3,3))

df%>%group_by(participant_number)%>%
select(Happiness,Joy)%>%
mutate(emoMean=mean(colMeans))

> df
  participant_number Happiness Joy Lolz
1                  1         3   1    3
2                  1         4   2    3
3                  1         2   3    3
4                  2         1   5    3
5                  2         3   4    3

Goal

emoMean
participant_number ... emoMean
1                      2.5 (3+1+4+2+2+3)/6 #Note that this value does not include participant_number
1                      2.5
1                      2.5
2                      6.5
2                      6.5

Note:

I tried using this as a potential solution, but got completely lost.

3 answers:

Answer 0: (score: 2)

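For your specific case, you can add the two columns, take the mean, and then divide by 2, because within each group the two columns always have the same number of elements: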

df %>% group_by(participant_number) %>% mutate(emoMean = mean(Happiness + Joy)/2)

Source: local data frame [5 x 5]
Groups: participant_number [2]

  participant_number Happiness   Joy  Lolz emoMean
               <dbl>     <dbl> <dbl> <dbl>   <dbl>
1                  1         3     1     3    2.50
2                  1         4     2     3    2.50
3                  1         2     3     3    2.50
4                  2         1     5     3    3.25
5                  2         3     4     3    3.25

Note: going by your definition of the mean for the first group, I think the value for the second group should be 3.25, not 6.5.
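As for the warning in the question: `colMeans` inside `mutate()` is not a column of `df`, so it resolves to the base function `colMeans` itself, and `mean()` is handed a function rather than numbers (the `function (x, na.rm = FALSE, dims = 1L)` in the message is that function's signature). An equivalent way to write the computation above, as a minimal sketch, is to pool the values of both columns within each group and take one mean. The name `res` is only for illustration.

```r
library(dplyr)

df <- data.frame(participant_number = c(1, 1, 1, 2, 2),
                 Happiness = c(3, 4, 2, 1, 3),
                 Joy = c(1, 2, 3, 5, 4),
                 Lolz = c(3, 3, 3, 3, 3))

# Within each group, concatenate the Happiness and Joy values
# and take a single mean over all of them.
res <- df %>%
  group_by(participant_number) %>%
  mutate(emoMean = mean(c(Happiness, Joy))) %>%
  ungroup()

res
```

For this data the result should match the table above (2.5 for participant 1, 3.25 for participant 2).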

Answer 1: (score: 1)

An alternative to plyr, using base R:

df<-data.frame(participant_number=c(1,1,1,2,2),Happiness=c(3,4,2,1,3),Joy=c(1,2,3,5,4),Lolz=c(3,3,3,3,3))

df$mean <- ave(apply(df[,2:3],1,mean, na.rm=TRUE), df$participant_number )
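The line above first takes the mean of Happiness and Joy in each row, then `ave()` (whose default `FUN` is `mean`) averages those row means within each participant. A sketch of the same idea that selects the columns by name instead of by position, and uses `rowMeans()` in place of `apply(..., 1, mean)`, could look like this. The column name `emoMean` is borrowed from the question. Note that with missing values and `na.rm = TRUE`, a mean of row means is not necessarily the same as the pooled mean.

```r
df <- data.frame(participant_number = c(1, 1, 1, 2, 2),
                 Happiness = c(3, 4, 2, 1, 3),
                 Joy = c(1, 2, 3, 5, 4),
                 Lolz = c(3, 3, 3, 3, 3))

# Per-row mean of the two emotion columns, selected by name.
row_means <- rowMeans(df[, c("Happiness", "Joy")], na.rm = TRUE)

# ave() applies mean() within each participant and returns
# one value per original row.
df$emoMean <- ave(row_means, df$participant_number)

df
```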

Answer 2: (score: 1)

We can use data.table:

 library(data.table)
 setDT(df)[, emoMean := mean(Happiness + Joy)/2 , by = participant_number]

If there are many columns to sum, one option is Reduce:

 nm1 <- names(df)[2:3]
 # Reduce() adds the columns row by row; mean() then averages within the group
 setDT(df)[, emoMean := mean(Reduce(`+`, .SD))/length(nm1), 
                    by = participant_number, .SDcols = nm1]
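Another way to handle several columns, as a sketch rather than the method of this answer, is to flatten the selected columns of each group with `unlist(.SD)` and take a single mean. The names `dt` and `cols` are only for illustration.

```r
library(data.table)

dt <- data.table(participant_number = c(1, 1, 1, 2, 2),
                 Happiness = c(3, 4, 2, 1, 3),
                 Joy = c(1, 2, 3, 5, 4),
                 Lolz = c(3, 3, 3, 3, 3))

cols <- c("Happiness", "Joy")

# For each participant, .SD holds only the columns named in .SDcols;
# unlist() pools their values into one vector before averaging.
dt[, emoMean := mean(unlist(.SD)), by = participant_number, .SDcols = cols]

dt
```

Because the values are pooled before averaging, this does not depend on dividing by the number of columns.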