Weighted average from ddply is wrong (R, ddply)

Date: 2016-09-14 02:44:28

Tags: r

I need to compute weighted averages while collapsing rows in R.

Data

name = c("car1", "car2", "car2", "car2", "car3", "car1") 
brand = c("b1", "b2", "b2", "b2", "b3", "b1")
production = c(10, 10, 30, 40, 10, 5) 
fuelEconomy= c(1, 2, 3, 5, 2, 4)
size = c(10, 50, 30,40,20, 7) 
adf = data.frame(brand, name, production, fuelEconomy, size)

Collapse by brand and name

library(plyr)  # provides ddply() and summarise()

adfSum <- ddply(adf, .(brand, name),
                summarise,
                fuelEconomySum = sum(fuelEconomy * production) / sum(production),
                productionSum = sum(production),
                sizeSum = sum(size * production) / sum(production))

Result: the first weighted average (fuelEconomySum) is correct, but the last one, sizeSum, is not. The correct values are shown in parentheses.

brand  name  fuelEconomySum  production  sizeSum
b1     car1  2.000           15          17 (9)
b2     car2  3.875           80          120 (37.5)
b3     car3  2.000           10          20 (20)
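
As a sanity check on the values in parentheses, here is a minimal base R sketch that recomputes the production-weighted mean of size for each group with `weighted.mean()`. The names `groups` and `expectedSize` are illustrative and do not come from the original post. Splitting by name alone is enough here only because each name maps to a single brand in this data.

```r
# Rebuild the example data from the question.
adf <- data.frame(
  brand       = c("b1", "b2", "b2", "b2", "b3", "b1"),
  name        = c("car1", "car2", "car2", "car2", "car3", "car1"),
  production  = c(10, 10, 30, 40, 10, 5),
  fuelEconomy = c(1, 2, 3, 5, 2, 4),
  size        = c(10, 50, 30, 40, 20, 7)
)

# Split the rows by name and compute, per group,
# sum(size * production) / sum(production) via weighted.mean().
groups <- split(adf, adf$name)
expectedSize <- sapply(groups, function(g) weighted.mean(g$size, g$production))

print(expectedSize)
```

For car1 this is (10 * 10 + 7 * 5) / 15 = 9, which agrees with the value the question marks as correct.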

I am looking for a solution that creates several weighted averages at once.

Thanks

1 answer:

Answer 0 (score: 0)

This works (using dplyr and magrittr):

name = c("car1", "car2", "car2", "car2", "car3", "car1") 
brand = c("b1", "b2", "b2", "b2", "b3", "b1")
production = c(10, 10, 30, 40, 10, 5) 
fuelEconomy= c(1, 2, 3, 5, 2, 4)
size = c(10, 50, 30,40,20, 7) 
adf = data.frame(brand, name, production, fuelEconomy, size)

library(magrittr)
library(dplyr)

adfSum <- adf %>% 
  group_by(brand, name) %>% 
  summarise(fuelEconomySum = sum(fuelEconomy * production) / sum(production),
            productionSum = sum(production),
            sizeSum = sum(size * production) / sum(production)) %>% 
  as.data.frame()


> adfSum
    brand name fuelEconomySum productionSum sizeSum
  1    b1 car1          2.000            15     9.0
  2    b2 car2          3.875            80    37.5
  3    b3 car3          2.000            10    20.0
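
Since the question asks for several weighted averages at once, the same idea can be written more compactly. The sketch below is an assumption-laden variant and not part of the original answer: it relies on `across()` and the `.groups` argument, which require dplyr 1.0.0 or later, and it keeps the original column names (`fuelEconomy`, `size`) for the weighted means instead of adding a `Sum` suffix. The result name `weightedSum` is illustrative.

```r
library(dplyr)

# Rebuild the example data from the question.
adf <- data.frame(
  brand       = c("b1", "b2", "b2", "b2", "b3", "b1"),
  name        = c("car1", "car2", "car2", "car2", "car3", "car1"),
  production  = c(10, 10, 30, 40, 10, 5),
  fuelEconomy = c(1, 2, 3, 5, 2, 4),
  size        = c(10, 50, 30, 40, 20, 7)
)

weightedSum <- adf %>%
  group_by(brand, name) %>%
  summarise(
    # Apply the same production-weighted mean to every listed column.
    across(c(fuelEconomy, size), ~ weighted.mean(.x, production)),
    # Use a new name so the original production column is not overwritten.
    productionSum = sum(production),
    .groups = "drop"
  ) %>%
  as.data.frame()

print(weightedSum)
```

Adding another weighted column then only means extending the column list inside `across()`.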

Edit: incidentally, your solution works for me.

> devtools::session_info("plyr")
Session info      ---------------------------------------------------------------------------
setting  value                       
version  R version 3.3.1 (2016-06-21)
system   x86_64, linux-gnu           
ui       RStudio (0.99.491)          
language en_US                       
collate  en_US.UTF-8                 
tz       <NA>                        
date     2016-09-14                  

Packages     -------------------------------------------------------------------------------
package * version date       source        
plyr    * 1.8.3   2015-06-12 CRAN (R 3.3.0)
Rcpp      0.12.5  2016-05-14 CRAN (R 3.3.0)
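
One possible explanation for the discrepancy, which the thread does not confirm: the result header printed in the question shows a column named `production` rather than `productionSum`. plyr's `summarise()` evaluates its arguments in order and lets later expressions see columns created earlier, so if the summed column was actually named `production`, the `sizeSum` expression would use the group total instead of the per-row weights. The sketch below tests that hypothesis; the call is a hypothetical variant of the question's code, and `shadowed` is an illustrative name.

```r
library(plyr)

# Rebuild the example data from the question.
adf <- data.frame(
  brand       = c("b1", "b2", "b2", "b2", "b3", "b1"),
  name        = c("car1", "car2", "car2", "car2", "car3", "car1"),
  production  = c(10, 10, 30, 40, 10, 5),
  fuelEconomy = c(1, 2, 3, 5, 2, 4),
  size        = c(10, 50, 30, 40, 20, 7)
)

# Hypothetical variant: the summed column reuses the name `production`,
# so every expression after it sees the group total, not the row values.
shadowed <- ddply(adf, .(brand, name), plyr::summarise,
                  fuelEconomySum = sum(fuelEconomy * production) / sum(production),
                  production     = sum(production),
                  sizeSum        = sum(size * production) / sum(production))

print(shadowed)
```

Under this assumption, car1 gives sum(c(10, 7) * 15) / 15 = 17 and car2 gives 120, matching the incorrect values reported in the question. Giving the total a distinct name such as `productionSum`, as in the code posted above, avoids the overwrite.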