Rowwise weighted.mean in dplyr using weights from another data.frame

Asked: 2016-03-24 15:55:50

Tags: r dplyr

I have a data.frame whose columns belong to different groups (here a and b), and another data.frame that holds the weights to use for a weighted mean:

test = data.frame(a.1=rep(2,5), b.1=rep(3,5), a.2=6:10, b.2=11:15)
tweights = data.frame(name=c('a.1', 'b.1', 'a.2', 'b.2'), 
                     w=c(0.2, 0.33, 0.8, 0.67))

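For each row of test, I want to compute a weighted mean of the columns whose names contain a, with the weights given by the corresponding values in tweights, and do the same for the b columns.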

What I tried:

library(dplyr)

test %>% rowwise() %>% 
  mutate(awmean = weighted.mean(c(a.1, a.2), 
                                tweights$w[grepl('a', tweights$name)]),
         bwmean = weighted.mean(c(b.1, b.2), 
                                tweights$w[grepl('b', tweights$name)]))

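This works, but it is neither efficient nor elegant. I would like to avoid naming the columns explicitly (a.1, a.2, etc.), and the second part with the call to grepl does not look clean to me either...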

I tried something like this, but it fails:

test %>% rowwise() %>%
  mutate(awmean = weighted.mean(contains('a'),
                                tweights$w[grepl('a', tweights$name)]))

Error: error in evaluating the argument 'x' in selecting a method 
for function 'weighted.mean': Error: could not find function "contains"

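Note that I am assuming here that the columns a.1 : a.n appear in the same order as the corresponding rows in tweights, which may well be the case. A solution that actually handles the matching between values and weights in weighted.mean would be even better...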

1 Answer:

Answer 0: (score: 1)

Maybe a custom function?

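# Get row-wise weighted means for columns whose names begin with a given letter.
# Relies on the global objects `test` and `tweights` defined in the question.
getWM <- function(letter1) {
  rgx <- paste0('^', letter1)
  # drop = FALSE keeps a data.frame even if only one column matches
  apply(test[, grep(rgx, names(test)), drop = FALSE], 1, weighted.mean,
        w = subset(tweights, grepl(rgx, name))$w)
}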

Now you can call:

getWM('a')
[1] 5.2 6.0 6.8 7.6 8.4
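
As a quick sanity check, the first value can be reproduced by hand. This is only a sketch with the numbers copied from the question's data (first row of test, weights for a.1 and a.2); the variable names are illustrative:

```r
# First row of test: a.1 = 2, a.2 = 6; weights 0.2 and 0.8 from tweights
x <- c(2, 6)
w <- c(0.2, 0.8)

# A weighted mean is sum(x * w) / sum(w)
manual  <- sum(x * w) / sum(w)
builtin <- weighted.mean(x, w)
```

Both manual and builtin should agree with the first element returned by getWM('a').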

Or, for all letters:

first_letters <- unique(gsub('[^a-z]', '', names(test)))
sapply(first_letters, getWM)

       a     b
[1,] 5.2  8.36
[2,] 6.0  9.03
[3,] 6.8  9.70
[4,] 7.6 10.37
[5,] 8.4 11.04
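
The question also asks for a solution that matches values and weights explicitly instead of relying on row order. One possible sketch, assuming every selected column has exactly one row in tweights, looks the weights up by column name with match(). The helper name wm_by_name and its arguments are hypothetical, and it replaces the per-row apply() with a matrix product, so unlike weighted.mean it has no na.rm handling:

```r
# Data from the question
test <- data.frame(a.1 = rep(2, 5), b.1 = rep(3, 5), a.2 = 6:10, b.2 = 11:15)
tweights <- data.frame(name = c('a.1', 'b.1', 'a.2', 'b.2'),
                       w = c(0.2, 0.33, 0.8, 0.67))

# Hypothetical helper: weighted row means for the columns starting with
# `prefix`, with each column's weight looked up by name, not by position
wm_by_name <- function(df, weights, prefix) {
  cols <- grep(paste0('^', prefix), names(df), value = TRUE)
  w <- weights$w[match(cols, weights$name)]
  stopifnot(!anyNA(w))  # every selected column needs a weight
  # sum(x * w) / sum(w) for all rows at once
  drop(as.matrix(df[, cols, drop = FALSE]) %*% w) / sum(w)
}

# Group prefixes: everything before the first dot ("a", "b")
groups <- unique(sub('\\..*$', '', names(test)))
result <- sapply(groups, function(g) wm_by_name(test, tweights, g))

# Same call with the rows of tweights in a different order
shuffled <- sapply(groups, function(g) wm_by_name(test, tweights[4:1, ], g))
```

Because the weights are aligned by name, result and shuffled should be identical, and both should reproduce the matrix shown above.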