I need to compute weighted averages when collapsing rows in R.
Data:
name = c("car1", "car2", "car2", "car2", "car3", "car1")
brand = c("b1", "b2", "b2", "b2", "b3", "b1")
production = c(10, 10, 30, 40, 10, 5)
fuelEconomy = c(1, 2, 3, 5, 2, 4)
size = c(10, 50, 30, 40, 20, 7)
adf = data.frame(brand, name, production, fuelEconomy, size)
Collapsing by brand and name:
library(plyr)
adfSum <- ddply(adf, .(brand, name),
                summarise,
                fuelEconomySum = sum(fuelEconomy * production) / sum(production),
                productionSum = sum(production),
                sizeSum = sum(size * production) / sum(production))
Result: the first weighted average (fuelEconomySum) is correct, but the last one, sizeSum, is wrong. The correct values are shown in parentheses.
brand name fuelEconomySum production sizeSum
b1    car1          2.000         15      17 (9)
b2    car2          3.875         80     120 (37.5)
b3    car3          2.000         10      20 (20)
I am looking for a solution that creates several weighted averages at once.
Thanks
Answer 0 (score: 0)
This works (using dplyr and magrittr):
name = c("car1", "car2", "car2", "car2", "car3", "car1")
brand = c("b1", "b2", "b2", "b2", "b3", "b1")
production = c(10, 10, 30, 40, 10, 5)
fuelEconomy = c(1, 2, 3, 5, 2, 4)
size = c(10, 50, 30, 40, 20, 7)
adf = data.frame(brand, name, production, fuelEconomy, size)
library(magrittr)
library(dplyr)
afdSum <- adf %>%
  group_by(brand, name) %>%
  summarise(fuelEconomySum = sum(fuelEconomy * production) / sum(production),
            productionSum = sum(production),
            sizeSum = sum(size * production) / sum(production)) %>%
  as.data.frame()
> afdSum
  brand name fuelEconomySum productionSum sizeSum
1    b1 car1          2.000            15     9.0
2    b2 car2          3.875            80    37.5
3    b3 car3          2.000            10    20.0
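If more columns need the same treatment, writing one sum(x * production) / sum(production) per column gets repetitive. As a rough sketch, and not part of the dplyr code above, here is a base R variant that loops over a vector of column names with weighted.mean(). The names value_cols, by_group and multiSum are placeholders picked for this illustration.

```r
adf <- data.frame(
  brand = c("b1", "b2", "b2", "b2", "b3", "b1"),
  name = c("car1", "car2", "car2", "car2", "car3", "car1"),
  production = c(10, 10, 30, 40, 10, 5),
  fuelEconomy = c(1, 2, 3, 5, 2, 4),
  size = c(10, 50, 30, 40, 20, 7)
)

# Columns to average (placeholder name); production supplies the weights
value_cols <- c("fuelEconomy", "size")

# One data frame per brand/name combination that actually occurs
by_group <- split(adf, list(adf$brand, adf$name), drop = TRUE)

multiSum <- do.call(rbind, lapply(by_group, function(g) {
  # weighted.mean() is applied to every column listed in value_cols
  wm <- sapply(g[value_cols], weighted.mean, w = g$production)
  data.frame(brand = g$brand[1], name = g$name[1],
             productionSum = sum(g$production), as.list(wm))
}))
rownames(multiSum) <- NULL
print(multiSum)
```

Adding another column to value_cols is then enough to get one more weighted average, as long as production is the weight for all of them.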
Edit: for what it's worth, your original code works for me.
> devtools::session_info("plyr")
Session info ---------------------------------------------------------------------------
 setting  value
 version  R version 3.3.1 (2016-06-21)
 system   x86_64, linux-gnu
 ui       RStudio (0.99.491)
 language en_US
 collate  en_US.UTF-8
 tz       <NA>
 date     2016-09-14

Packages -------------------------------------------------------------------------------
 package * version date       source
 plyr    * 1.8.3   2015-06-12 CRAN (R 3.3.0)
 Rcpp      0.12.5  2016-05-14 CRAN (R 3.3.0)
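A possible explanation for the wrong numbers (this is a guess, nothing in the question confirms it): the result header in the question shows a column called production rather than productionSum, so the code that actually ran may have named the total production. plyr's summarise evaluates its expressions in order and lets later ones see columns created earlier. After production = sum(production), the name production would refer to the group total, and sum(size * production) / sum(production) would reduce to sum(size), which gives exactly 17, 120 and 20. The base R sketch below reproduces that arithmetic for car1 without any packages; production_total is a name made up for the illustration.

```r
# Rows of the question's data that belong to brand b1 / car1
production <- c(10, 5)
size <- c(10, 7)

# Intended weighted mean: the per-row production values are the weights
correct <- sum(size * production) / sum(production)

# Hypothetical clash: the weights have already been replaced by the group total
production_total <- sum(production)   # 15, a single number
clashed <- sum(size * production_total) / sum(production_total)

print(correct)   # 9
print(clashed)   # 17, the same as sum(size)
```

If that is what happened, giving the total a name that does not shadow an input column (such as productionSum), or computing it last, avoids the problem.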