Question

Suppose I have this data.frame in R:

ages <- data.frame(Indiv = numeric(),
    Age = numeric(),
    W = numeric())
ages[1,] <- c(1,10,2)
ages[2,] <- c(1,15,5)
ages[3,] <- c(2,5,1)
ages[4,] <- c(2,100,2)

ages

  Indiv Age W
1     1  10 2
2     1  15 5
3     2   5 1
4     2 100 2

If I do this:

meanAge <- aggregate(ages$Age,list(ages$Indiv),mean)

I get the mean age (x) for each Indiv (Group.1):

  Group.1    x
1       1 12.5
2       2 52.5

But I want to compute the weighted arithmetic mean of Age, using W as the weights. If I do this:

WmeanAge <- aggregate(ages$Age,list(ages$Indiv),weighted.mean,ages$W)

I get:

Error in weighted.mean.default(X[[1L]], ...) : 
  'x' and 'w' must have the same length

I think I should get:

  Group.1           x
1       1 13.57142857
2       2 68.33333333

What am I doing wrong? Thanks in advance!

Answer 1

Yes, you beat me to it. But anyway, here is my answer using plyr and dplyr:

ages = data.frame(Indiv = c(1,1,2,2),
              Age = c(10,15,5,100),
              W = c(2,5,1,2))

library(plyr)
ddply(ages, .(Indiv), summarize, 
      mean = mean(Age),
      wmean = weighted.mean(Age, w=W))


library(dplyr)
ages %>% 
  group_by(Indiv) %>% 
  summarise(mean = mean(Age), wmean = weighted.mean(Age, W))

Answer 2

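The problem is that aggregate does not split up the w argument, so weighted.mean receives subsets of ages$Age but not the matching subsets of ages$W.

Try the plyr package! It's great. I use it in 95% of my scripts.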
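To make the mismatch concrete, here is a minimal base-R sketch (my own illustration, not part of the original answer). It assumes the data frame from the question and uses split() so that Age and W are cut into the same pieces before weighted.mean is applied to each one:

```r
# Data from the question
ages <- data.frame(Indiv = c(1, 1, 2, 2),
                   Age   = c(10, 15, 5, 100),
                   W     = c(2, 5, 1, 2))

# split() cuts whole rows per Indiv, so Age and W stay aligned
groups <- split(ages, ages$Indiv)

# Apply weighted.mean to each piece, using that piece's own weights
wmean <- sapply(groups, function(d) weighted.mean(d$Age, d$W))
wmean
```

The result is a numeric vector named by Indiv, which is the same per-group pairing of values and weights that the aggregate call was missing.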

Answer 3

If you want to use base functions, here is one possibility:

as.vector(by(ages[c("Age","W")],
    list(ages$Indiv),
     function(x) {
         do.call(weighted.mean, unname(x))
     }
))

Since aggregate does not subset multiple columns together, I use the more general by and simplify the result to a vector.
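The do.call(weighted.mean, unname(x)) step may look cryptic, so here is a small sketch of what it does for a single group (my own illustration, assuming the data from the question). Dropping the column names makes do.call pass the two columns positionally, so the first becomes x and the second becomes w:

```r
# Data from the question
ages <- data.frame(Indiv = c(1, 1, 2, 2),
                   Age   = c(10, 15, 5, 100),
                   W     = c(2, 5, 1, 2))

# The piece that by() would hand to the function for Indiv == 1
x <- ages[ages$Indiv == 1, c("Age", "W")]

# Columns are passed positionally: Age as x, W as w
via_do_call <- do.call(weighted.mean, unname(x))

# The explicit equivalent
explicit <- weighted.mean(x$Age, x$W)

all.equal(via_do_call, explicit)
```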

Answer 4

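The full weight vector is passed to every group, so its length does not match the number of Age values in each group and aggregate cannot collapse the groups properly. Here is a very inelegant solution using a for loop.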

ages = data.frame(Indiv=c(1,1,2,2),Age=c(10,15,5,100),W=c(2,5,1,2))

age.Indiv <- vector()
for (i in unique(ages$Indiv)) {
  age.Indiv <- append(age.Indiv,
                      weighted.mean(ages[ages$Indiv == i, ]$Age,
                                    ages[ages$Indiv == i, ]$W))
}
names(age.Indiv) <- unique(ages$Indiv)
age.Indiv
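If you would rather stay with aggregate and avoid the loop, one possible sketch (my own addition, not from the original answers) uses the definition of the weighted mean, sum(Age * W) / sum(W), so that only plain per-group sums are needed. The AgeW column and the sums object are helper names I made up for illustration:

```r
# Data from the question
ages <- data.frame(Indiv = c(1, 1, 2, 2),
                   Age   = c(10, 15, 5, 100),
                   W     = c(2, 5, 1, 2))

# Helper column: each age multiplied by its weight
ages$AgeW <- ages$Age * ages$W

# Sum the weighted ages and the weights within each Indiv
sums <- aggregate(cbind(AgeW, W) ~ Indiv, data = ages, FUN = sum)

# Weighted mean = sum(Age * W) / sum(W)
sums$wmean <- sums$AgeW / sums$W
sums[c("Indiv", "wmean")]
```

This works because sum only needs one column at a time, which is exactly what aggregate can provide.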

Weighted arithmetic mean in R
