Here is an example dataset:
example = data.frame(
  bucket  = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  bucket2 = c(0, 1, 2, 3, 4, 0, 1, 2, 3, 4),
  rate    = c(0.95, 0.02, 0.01, 0.005, 0, 0.9, 0.05, 0.02, 0.01, 0))
I need the rates within each bucket to sum to 1. Currently they do not:
example %>% group_by(bucket) %>% summarise(sum(rate))
So I need a way to insert new rows whose rate makes the per-bucket sum of rate always equal 1. For this example, I need to insert 2 new rows, as follows:
new_rows = data.frame(
  bucket  = c(0, 1),
  bucket2 = c('To make 0', 'To make 0'),
  rate    = c(0.015, 0.02))
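The actual dataset is much larger and has many more groups, but the question stays the same: how can I create new rows based on my condition using dplyr or another package? Any help is greatly appreciated.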
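Answer 0 (score: 2)
You are almost there. Summarise 1 minus the per-bucket sum to get the missing rate, then bind those rows to the original data. Note that bucket2 is NA in the new rows, because the summary does not carry that column.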
new_rows <- example %>%
  group_by(bucket) %>%
  summarise(rate = 1 - sum(rate))
new_rows
#   bucket  rate
#    <dbl> <dbl>
# 1      0 0.015
# 2      1 0.02
bind_rows(example, new_rows)
#    bucket bucket2  rate
# 1       0       0 0.950
# 2       0       1 0.020
# 3       0       2 0.010
# 4       0       3 0.005
# 5       0       4 0.000
# 6       1       0 0.900
# 7       1       1 0.050
# 8       1       2 0.020
# 9       1       3 0.010
# 10      1       4 0.000
# 11      0      NA 0.015
# 12      1      NA 0.020
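If the new rows should carry a text label in bucket2, as in the question, instead of NA, one possible extension is sketched below. It assumes that bucket2 can be converted to character in the original data, because recent dplyr versions refuse to bind a numeric column to a character one. The name `labelled` and the final `arrange()` call are illustrative additions, not part of the answer above.

```r
library(dplyr)

# Example data from the question
example <- data.frame(
  bucket  = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  bucket2 = c(0, 1, 2, 3, 4, 0, 1, 2, 3, 4),
  rate    = c(0.95, 0.02, 0.01, 0.005, 0, 0.9, 0.05, 0.02, 0.01, 0)
)

# One filler row per bucket, labelled as in the question
new_rows <- example %>%
  group_by(bucket) %>%
  summarise(rate = 1 - sum(rate)) %>%
  mutate(bucket2 = "To make 0")

# Convert bucket2 to character so the column types match, then bind
labelled <- example %>%
  mutate(bucket2 = as.character(bucket2)) %>%
  bind_rows(new_rows) %>%
  arrange(bucket)  # keep each bucket's rows together
```

The trade-off is that bucket2 is no longer numeric afterwards. If it has to stay numeric, keeping NA or using a sentinel value may be the simpler choice.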
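Answer 1 (score: 1)
Typically, adding rows requires bind_rows. Doing it per group (outside of the normal dplyr verbs, though not outside of bind_rows) requires a do block. I am inferring which columns you need, but you can modify the premise to be anything else.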
example2 <- example %>%
  group_by(bucket) %>%
  # append one filler row per group; tibble() replaces the deprecated data_frame()
  do(bind_rows(., tibble(bucket = .$bucket[1], bucket2 = max(.$bucket2) + 1, rate = 1 - sum(.$rate))))
example2
# # A tibble: 12 x 3
# # Groups:   bucket [2]
#    bucket bucket2  rate
#     <dbl>   <dbl> <dbl>
#  1      0       0 0.95
#  2      0       1 0.02
#  3      0       2 0.01
#  4      0       3 0.005
#  5      0       4 0
#  6      0       5 0.015
#  7      1       0 0.9
#  8      1       1 0.05
#  9      1       2 0.02
# 10      1       3 0.01
# 11      1       4 0
# 12      1       5 0.02
example2 %>% group_by(bucket) %>% summarise(sum(rate))
# # A tibble: 2 x 2
#   bucket `sum(rate)`
#    <dbl>       <dbl>
# 1      0           1
# 2      1           1
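Since do() is superseded in current dplyr releases, the same per-group append could also be written with group_modify(). This is only a sketch and assumes a dplyr version that provides group_modify(). The names `example3` and `sums` are made up for illustration. Inside group_modify() the grouping column is not part of `.x`, so the new row only supplies bucket2 and rate.

```r
library(dplyr)

# Example data from the question
example <- data.frame(
  bucket  = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  bucket2 = c(0, 1, 2, 3, 4, 0, 1, 2, 3, 4),
  rate    = c(0.95, 0.02, 0.01, 0.005, 0, 0.9, 0.05, 0.02, 0.01, 0)
)

example3 <- example %>%
  group_by(bucket) %>%
  # .x holds the current group's rows without the grouping column
  group_modify(~ bind_rows(
    .x,
    tibble(bucket2 = max(.x$bucket2) + 1, rate = 1 - sum(.x$rate))
  )) %>%
  ungroup()

# Check the result with a tolerance rather than exact equality,
# since the rates are floating-point numbers
sums <- example3 %>%
  group_by(bucket) %>%
  summarise(total = sum(rate))
stopifnot(isTRUE(all.equal(sums$total, c(1, 1))))
```

Comparing with all.equal() instead of `==` is a deliberate choice here: the printed sums show as 1, but with other rate values they might differ from 1 by a tiny rounding error.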
If the calculation for each group is more complex, note that a more verbose, multi-line version of it looks something like this:
... %>%
  do({
    x <- .
    # more calcs feasible here, it's just an R block
    tibble(  # tibble() replaces the deprecated data_frame()
      bucket  = x$bucket[1],
      bucket2 = max(x$bucket2) + 1,
      rate    = 1 - sum(x$rate)
    )
  })
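Note that this verbose block returns only the new row for each group, so the result still has to be combined with the original data. A minimal sketch of the full round trip, using the question's example data in place of the elided pipeline, might look like the following. The names `filler` and `completed` are hypothetical.

```r
library(dplyr)

# Example data from the question
example <- data.frame(
  bucket  = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  bucket2 = c(0, 1, 2, 3, 4, 0, 1, 2, 3, 4),
  rate    = c(0.95, 0.02, 0.01, 0.005, 0, 0.9, 0.05, 0.02, 0.01, 0)
)

# One filler row per bucket, computed in a multi-line do() block
filler <- example %>%
  group_by(bucket) %>%
  do({
    x <- .
    tibble(
      bucket  = x$bucket[1],
      bucket2 = max(x$bucket2) + 1,
      rate    = 1 - sum(x$rate)
    )
  }) %>%
  ungroup()

# Bind the filler rows back and sort so each one follows its bucket
completed <- bind_rows(example, filler) %>%
  arrange(bucket, bucket2)
```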