我有以下数据框:
> LikelyRenew_ReasonB %>%
+ mutate(cum_Sum = ave(freq,Name,FUN = cumsum))
Name x freq cum_Sum
1 costC 1 10 10
2 costC 2 11 21
3 costC 3 17 38
4 costC 4 149 187
5 productsC 1 31 31
6 productsC 2 40 71
7 productsC 3 30 101
8 productsC 4 86 187
9 communC 1 51 51
10 communC 2 50 101
11 communC 3 34 135
12 communC 4 52 187
13 reimburseC 1 42 42
14 reimburseC 2 26 68
15 reimburseC 3 25 93
16 reimburseC 4 94 187
17 policyC 1 31 31
18 policyC 2 25 56
19 policyC 3 28 84
20 policyC 4 103 187
21 discountC 1 2 2
22 discountC 2 2 4
23 discountC 3 3 7
24 discountC 4 180 187
这是变量的样子:
> dput(head(LikelyRenew_ReasonB))
structure(list(Name = c("costC", "costC", "costC", "costC", "productsC",
"productsC"), x = c(1, 2, 3, 4, 1, 2), freq = c(10L, 11L, 17L,
149L, 31L, 40L)), .Names = c("Name", "x", "freq"), row.names = c(NA,
6L), class = "data.frame")
我试图获得每个组,每个频率分数的相对频率,然后是该组的相对频率之和。我在下面放了一个我正在寻找的样本 - 前三行是他们的freq / cum_Sum [x == 4]。最后一行应该是这3行的总和。
这可能吗?我完全难过了。
Name x freq cum_Sum IdealOutput *how i calculated IdealOutput
1 costC 1 10 10 5.35 (10/187)
2 costC 2 11 21 5.88 (11/187)
3 costC 3 17 38 9.09 (17/187)
4 costC 4 149 187 20.32 (sum of above 3 values)
答案 0 :(得分:2)
您可以尝试使用dplyr::lag
上的cum_Sum
来计算群组最后一行的IdealOutput
。
可以使用条件row_number() == n()
library(dplyr)
LikelyRenew_ReasonB %>% group_by(Name) %>%
arrange(Name, x) %>%
mutate(cum_Sum = cumsum(freq)) %>%
mutate(IdealOutput = ifelse(row_number() == n(),
lag(cum_Sum)/sum(freq), freq/sum(freq)))
# # A tibble: 6 x 5
# # Groups: Name [2]
# Name x freq cum_Sum IdealOutput
# <chr> <dbl> <int> <int> <dbl>
# 1 costC 1.00 10 10 0.0535
# 2 costC 2.00 11 21 0.0588
# 3 costC 3.00 17 38 0.0909
# 4 costC 4.00 149 187 0.203
# 5 productsC 1.00 31 31 0.437
# 6 productsC 2.00 40 71 0.437
数据:强>
LikelyRenew_ReasonB <- structure(list(Name = c("costC", "costC", "costC", "costC", "productsC",
"productsC"), x = c(1, 2, 3, 4, 1, 2), freq = c(10L, 11L, 17L,
149L, 31L, 40L)), .Names = c("Name", "x", "freq"), row.names = c(NA,
6L), class = "data.frame")