I have the following data frame:
Year Category TotalSales AverageCount
1 2013 Beverages 102074.29 22190.06
2 2013 Condiments 55277.56 14173.73
3 2013 Confections 36415.75 12138.58
4 2013 Dairy Products 30337.39 24400.00
5 2013 Seafood 53019.98 27905.25
6 2014 Beverages 81338.06 35400.00
7 2014 Condiments 55948.82 19981.72
8 2014 Confections 44478.36 24710.00
9 2014 Dairy Products 84412.36 32466.00
10 2014 Seafood 65544.19 14565.37
I computed the running total of TotalSales, grouped by Year, as follows:
dat <-within(dat, {
RunningTotal <- ave(dat$TotalSales, dat$Year, FUN = cumsum)
})
which gives this output:
Year Category TotalSales AverageCount RunningTotal
1 2013 Beverages 102074.29 22190.06 102074.29
2 2013 Condiments 55277.56 14173.73 157351.85
3 2013 Confections 36415.75 12138.58 193767.60
4 2013 Dairy Products 30337.39 24400.00 224104.99
5 2013 Seafood 53019.98 27905.25 277124.97
6 2014 Beverages 81338.06 35400.00 81338.06
7 2014 Condiments 55948.82 19981.72 137286.88
8 2014 Confections 44478.36 24710.00 181765.24
9 2014 Dairy Products 84412.36 32466.00 266177.60
10 2014 Seafood 65544.19 14565.37 331721.79
How can I compute, within each group, the ratio between consecutive elements of RunningTotal (that is, RunningTotal[i+1] / RunningTotal[i])?
I tried using mutate from dplyr:
library(dplyr)
dat<-mutate(dat, Ratio = lag(RunningTotal)/RunningTotal)
but I get the wrong output (note the NAs):
Year Category TotalSales AverageCount RunningTotal Ratio
1 2013 Beverages 102074.29 22190.06 102074.29 NA
2 2013 Condiments 55277.56 14173.73 157351.85 0.6487009
3 2013 Confections 36415.75 12138.58 193767.60 0.8120648
4 2013 Dairy Products 30337.39 24400.00 224104.99 0.8646287
5 2013 Seafood 53019.98 27905.25 277124.97 0.8086784
6 2014 Beverages 81338.06 35400.00 81338.06 NA
7 2014 Condiments 55948.82 19981.72 137286.88 0.5924678
8 2014 Confections 44478.36 24710.00 181765.24 0.7552978
9 2014 Dairy Products 84412.36 32466.00 266177.60 0.6828720
10 2014 Seafood 65544.19 14565.37 331721.79 0.8024122
How can I get the desired output, shown below?
Year Category TotalSales AverageCount RunningTotal Ratio
2013 Beverages 102074.29 22190.06 102074.29 1.5415424393
2013 Condiments 55277.56 14173.73 157351.85 1.2314288011
2013 Confections 36415.75 12138.58 193767.6 1.1565658552
2013 Dairy Products 30337.39 24400 224104.99 1.2365854504
2013 Seafood 53019.98 27905.25 277124.97 0.2935067887
2014 Beverages 81338.06 35400 81338.06 1.6878553533
2014 Condiments 55948.82 19981.72 137286.88 1.3239811408
2014 Confections 44478.36 24710 181765.24 1.4644032049
2014 Dairy Products 84412.36 32466 266177.6 1.2462423209
2014 Seafood 65544.19 14565.37 331721.79 0
Sample data:
dat <- structure(list(Year = c(2013L, 2013L, 2013L, 2013L, 2013L, 2014L,
2014L, 2014L, 2014L, 2014L), Category = structure(c(1L, 2L, 3L,
4L, 5L, 1L, 2L, 3L, 4L, 5L), .Label = c("Beverages", "Condiments",
"Confections", "Dairy Products", "Seafood"), class = "factor"),
TotalSales = c(102074.29, 55277.56, 36415.75, 30337.39, 53019.98,
81338.06, 55948.82, 44478.36, 84412.36, 65544.19), AverageCount = c(22190.06,
14173.73, 12138.58, 24400, 27905.25, 35400, 19981.72, 24710,
32466, 14565.37)), .Names = c("Year", "Category", "TotalSales",
"AverageCount"), class = "data.frame", row.names = c(NA, -10L
))
Answer 0 (score: 1)
The dplyr way to perform the first operation is:
dat <- dat %>%
group_by(Year) %>%
mutate(RunningTotal = cumsum(TotalSales)) %>%
ungroup
Then add the ratio using:
dat %>%
mutate(Ratio = c(RunningTotal[-1] / RunningTotal[-n()], 0))
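I would be tempted to make the last value NA rather than 0, though. The ratio for 2013 Seafood (0.2935067887) does not make sense either: it divides the first 2014 running total by the last 2013 one, so it crosses the year boundary. To avoid that, do not ungroup before computing the ratio. So something like this: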
dat %>%
group_by(Year) %>%
mutate(
RunningTotal = cumsum(TotalSales),
Ratio = c(RunningTotal[-1] / RunningTotal[-n()], NA)
)
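The same grouped ratio can probably be written a little more directly with dplyr's lead(), which shifts a vector up by one position and pads the end with NA. The following is only a sketch: the data frame is rebuilt with data.frame() for brevity (AverageCount is left out because it is not needed here), and the name res is just a placeholder.

```r
library(dplyr)

# Sample data from the question, rebuilt by hand; AverageCount omitted
dat <- data.frame(
  Year = rep(c(2013L, 2014L), each = 5),
  Category = rep(c("Beverages", "Condiments", "Confections",
                   "Dairy Products", "Seafood"), times = 2),
  TotalSales = c(102074.29, 55277.56, 36415.75, 30337.39, 53019.98,
                 81338.06, 55948.82, 44478.36, 84412.36, 65544.19)
)

res <- dat %>%
  group_by(Year) %>%
  mutate(
    RunningTotal = cumsum(TotalSales),
    # lead() is applied per group, so the last row of each year gets NA
    # instead of a ratio that reaches into the next year
    Ratio = lead(RunningTotal) / RunningTotal
  ) %>%
  ungroup()

print(res)
```

Note that the attempt in the question used lag(RunningTotal) / RunningTotal, which is the inverse ratio (previous over current); lead() gives next over current, which is what the desired output shows.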