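Aggregating an entire data frame with a weighted mean

Time: 2014-06-18 17:23:25

Tags: r aggregate weighted-average

I am trying to aggregate a data frame with the function weighted.mean and keep getting an error. My data looks like this: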

dat <- data.frame(date, nWords, v1, v2, v3, v4 ...)

I tried something like this:

aggregate(dat, by = list(dat$date), weighted.mean, w = dat$nWords)

but got

 Error in weighted.mean.default(X[[1L]], ...) : 
  'x' and 'w' must have the same length

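There is another thread that answers this question with plyr, but only for a single variable. I want to aggregate all variables this way.

2 answers:

Answer 0: (score: 1)

You can do this with data.table: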
 library(data.table)

 # set up example data
 dat <- data.frame(date = c("2012-01-01","2012-01-01","2012-01-01","2013-01-01",
 "2013-01-01","2013-01-01","2014-01-01","2014-01-01","2014-01-01"), 
 nwords = 1:9, v1 = rnorm(9), v2 = rnorm(9), v3 = rnorm(9))

 # convert it to a data.table keyed by date
 dat <- data.table(dat, key = "date")

 # grab the value columns (everything except date and nwords)
 # named cols rather than c, so the base function c() is not shadowed
 cols <- setdiff(colnames(dat), c("date", "nwords"))

 # overwrite each value column with its weighted mean by date
 # (n) := assigns to the column named by n; the older with = FALSE form is deprecated
 for (n in cols) {
   dat[, (n) := weighted.mean(get(n), nwords), by = date]
 }

 # drop the weights (this modifies dat by reference) and keep one row per date
 wms <- unique(dat[, nwords := NULL])
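
The same result can probably be reached without a loop and without overwriting the original columns, by applying weighted.mean over .SD. This is only a sketch: the example data and the names cols and wms are illustrative, and it assumes the weight column is called nwords as in the answer above.

```r
library(data.table)

# Hypothetical example data mirroring the layout used in the answer
set.seed(1)
dat <- data.table(
  date   = rep(c("2012-01-01", "2013-01-01", "2014-01-01"), each = 3),
  nwords = 1:9,
  v1 = rnorm(9), v2 = rnorm(9), v3 = rnorm(9)
)

# Value columns: everything except the grouping and weight columns
cols <- setdiff(names(dat), c("date", "nwords"))

# One weighted mean per value column and date; dat itself is left unchanged.
# Within each group, nwords is that group's slice of the weights,
# so x and w always have the same length.
wms <- dat[, lapply(.SD, weighted.mean, w = nwords), by = date, .SDcols = cols]
```

Because the weights are looked up inside each group, this avoids the length mismatch reported in the question.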

Answer 1: (score: 0)

Try using by

# your numeric data
x <- 111:120

# the weights
ww <- 10:1 

mat <- cbind(x, ww)

# the group variable (in your case is 'date')
y <- c(rep("A", 7), rep("B", 3))

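# pass the weights explicitly; a bare weighted.mean would ignore the ww column as weights
by(data.frame(mat), y, function(d) weighted.mean(d$x, d$ww))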

If you want the result as a data frame, I suggest using the plyr package:

plyr::ddply(data.frame(mat, y), "y", plyr::summarise, wmean = weighted.mean(x, ww))
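
For the original question (all value columns at once), a base R sketch along the following lines may also work. The error in the question arises because aggregate hands weighted.mean one group's slice of a column as x while w is still the full-length dat$nWords. Splitting the data frame first keeps the weights aligned with the rows. The data and the names value_cols, pieces, wm, and result are made up for illustration.

```r
# Hypothetical example data: a grouping column, a weight column, value columns
dat <- data.frame(
  date   = rep(c("2012-01-01", "2013-01-01"), times = c(3, 2)),
  nWords = c(1, 2, 3, 4, 5),
  v1     = c(10, 20, 30, 40, 50),
  v2     = c(1, 1, 4, 2, 2)
)

# Value columns: everything except the grouping and weight columns
value_cols <- setdiff(names(dat), c("date", "nWords"))

# Split into one data frame per date so each piece carries its own weights
pieces <- split(dat, dat$date)

# Weighted mean of every value column within each piece, stacked row-wise
wm <- do.call(rbind, lapply(pieces, function(d) {
  sapply(d[value_cols], weighted.mean, w = d$nWords)
}))

# Turn the matrix (row names = dates) into a regular data frame
result <- data.frame(date = rownames(wm), wm, row.names = NULL)
```

Here result has one row per date and one column per value variable, which is the shape the aggregate call in the question was aiming for.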