My df looks like this:
data <- data.frame(
"id" = c(2, 4, 5),
"paid" = c(80, 293.64, 157),
"basic_fee" = c(500, 140.59, 21.49),
"marketing_fee" = c(151.51, 10.12, 562.50),
"utility_fee" = c(65, 99.29, 102.35),
stringsAsFactors = F)
What I want to end up with is:
final <- data.frame(
"id" = c(2, 4, 5),
"paid" = c(80, 293.64, 157),
"basic_fee" = c(500, 140.59, 21.49),
"marketing_fee" = c(151.51, 10.12, 562.50),
"utility_fee" = c(65, 99.29, 102.35),
"paid_basic" = c(80, 140.59, 21.49),
"paid_marketing" = c(0, 10.12, 135.51),
"paid_utility" = c(0, 99.29, 0),
stringsAsFactors = F)
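The logic connecting the two is actually simple. For each id, take the amount in paid and pay off the fees "as much as possible" in priority order: basic first, then marketing, then utility. Note that the amount paid toward any fee can never exceed the fee itself.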
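To make the rule concrete, here is a minimal base-R sketch for a single id. It assumes the fees are passed as a numeric vector already sorted by priority (basic, marketing, utility); the function name allocate_payment is hypothetical and not part of the question.

```r
# Hypothetical helper: split one payment across fees in priority order.
# `fees` is assumed to be ordered by priority already (basic, marketing, utility).
allocate_payment <- function(paid, fees) {
  remaining <- paid
  out <- numeric(length(fees))
  for (i in seq_along(fees)) {
    # Pay as much of fee i as the remaining money allows, never more than the fee
    out[i] <- min(fees[i], remaining)
    remaining <- remaining - out[i]
  }
  out
}

# id 5 from the example: 157 paid against basic, marketing and utility fees
allocate_payment(157, c(21.49, 562.50, 102.35))
```

This handles one id at a time; the rest of the thread is about doing the same thing column-wise for the whole data frame.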
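The code below works, but the repeated sections are ugly. My real data frame has 100+ columns and is more complex still, with thousands of rows, so I don't want to keep writing ever more convoluted code like this.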
library(dplyr)

final <-
  data %>%
  mutate(
    # basic_fee - (basic_fee - paid) is just paid, so this is min(paid, basic_fee)
    paid_basic = if_else(basic_fee - paid > 0, basic_fee - (basic_fee - paid), basic_fee),
    # flag: is there money left after the basic fee?
    overpayment_basic = if_else(paid-paid_basic > 0, 1, 0),
    # put the leftover toward marketing, capped at marketing_fee
    paid_marketing = if_else(overpayment_basic == 1, (paid-paid_basic), 0),
    paid_marketing = if_else(paid_marketing > marketing_fee, marketing_fee, paid_marketing),
    # same again for utility
    overpayment_marketing = if_else(paid-paid_basic-paid_marketing > 0, 1, 0),
    paid_utility = if_else(overpayment_marketing == 1, (paid-paid_basic-paid_marketing), 0),
    paid_utility = if_else(paid_utility > utility_fee, utility_fee, paid_utility)
  )
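Each if_else pair in the code above is a capped subtraction, so the overpayment_* flags are not needed: every fee gets pmin(money still left, that fee). A base-R sketch of the same three columns, assuming the data frame from the question (within() is used only so later lines can refer to columns created earlier in the same call):

```r
data <- data.frame(
  id = c(2, 4, 5),
  paid = c(80, 293.64, 157),
  basic_fee = c(500, 140.59, 21.49),
  marketing_fee = c(151.51, 10.12, 562.50),
  utility_fee = c(65, 99.29, 102.35)
)

# within() evaluates the assignments in order, so each line can use the previous ones
final <- within(data, {
  paid_basic     <- pmin(paid, basic_fee)
  paid_marketing <- pmin(paid - paid_basic, marketing_fee)
  paid_utility   <- pmin(paid - paid_basic - paid_marketing, utility_fee)
})

# within() does not keep creation order for new columns, so put them back in order
final <- final[c(names(data), "paid_basic", "paid_marketing", "paid_utility")]
```

This still hard-codes one line per fee, so it only removes the clutter; it does not solve the 100+ column problem on its own.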
Answer 0 (score: 1)
I'm not sure this is any simpler than your existing solution, but here is one approach that scales to more fee columns:
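library(tidyverse)

# the fee columns, assumed to be in priority order
fee_data <- select_at(data, vars(contains('fee')))

fee_data %>%
  accumulate(`+`) %>%                                      # running totals of the fees
  map2_df(data$paid + fee_data, ~ .y - .x) %>%             # money left before each fee (may be negative)
  map2_df(fee_data, ~ pmax(0, pmin(.x, .y))) %>%           # clip to the range [0, fee]
  rename_all(~ paste0('paid_', sub('_fee', '', .x))) %>%   # basic_fee -> paid_basic, etc.
  bind_cols(data, .)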
# id paid basic_fee marketing_fee utility_fee paid_basic paid_marketing paid_utility
# 1 2 80.00 500.00 151.51 65.00 80.00 0.00 0.00
# 2 4 293.64 140.59 10.12 99.29 140.59 10.12 99.29
# 3 5 157.00 21.49 562.50 102.35 21.49 135.51 0.00
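The trick in this answer is that the money still available before fee i is paid minus the sum of all earlier fees, which equals paid + fee_i - (running total up to fee i); clipping that to [0, fee_i] gives the amount paid. Below is a sketch of the same idea in base R, with Reduce(accumulate = TRUE) standing in for purrr::accumulate. It assumes the fee columns end in _fee and already appear in priority order.

```r
data <- data.frame(
  id = c(2, 4, 5),
  paid = c(80, 293.64, 157),
  basic_fee = c(500, 140.59, 21.49),
  marketing_fee = c(151.51, 10.12, 562.50),
  utility_fee = c(65, 99.29, 102.35)
)

fee_cols <- grep("_fee$", names(data), value = TRUE)
fees <- data[fee_cols]

# Running totals: basic, basic + marketing, basic + marketing + utility.
# simplify = FALSE keeps a list even when the data frame has a single row.
cum_fees <- Reduce(`+`, fees, accumulate = TRUE, simplify = FALSE)

# Money left before each fee = paid - (running total - this fee), clipped to [0, fee]
paid_cols <- Map(function(fee, cum) pmax(0, pmin(fee, data$paid - (cum - fee))),
                 fees, cum_fees)
names(paid_cols) <- paste0("paid_", sub("_fee$", "", fee_cols))

final <- cbind(data, as.data.frame(paid_cols))
```

Because each column is computed independently from the running totals, no state has to be carried from one column to the next.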
Answer 1 (score: 1)
My original answer did not generalize to an arbitrary number of rows, so here is another attempt:
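library(tidyverse)

r <- data$paid # keep track of remaining money
select(data, ends_with("_fee")) %>%
  set_names(sub("(.*)_.*", "paid_\\1", names(.))) %>%   # basic_fee -> paid_basic, etc.
  # pay as much of each fee as r allows, then subtract that amount from r
  mutate_all(~ {x <- map2_dbl(., r, ~ pmin(.x, .y)); r <<- r - x; x}) %>%
  bind_cols(data, .)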
Which returns:
id paid basic_fee marketing_fee utility_fee paid_basic paid_marketing paid_utility
1 2 80.00 500.00 151.51 65.00 80.00 0.00 0.00
2 4 293.64 140.59 10.12 99.29 140.59 10.12 99.29
3 5 157.00 21.49 562.50 102.35 21.49 135.51 0.00
I use mutate_all rather than mutate to apply map2_dbl and pmin to every column in the subset, reducing the remaining amount r after each column.
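One caveat: the <<- in this answer modifies r in the environment where it was defined (the global environment when run at top level), so running the pipeline a second time without resetting r gives wrong results. A sketch of the same running-remainder idea with the state kept local to a function; pay_in_order is a hypothetical name, and it assumes the fee columns end in _fee and are already in priority order.

```r
data <- data.frame(
  id = c(2, 4, 5),
  paid = c(80, 293.64, 157),
  basic_fee = c(500, 140.59, 21.49),
  marketing_fee = c(151.51, 10.12, 562.50),
  utility_fee = c(65, 99.29, 102.35)
)

# Hypothetical helper: allocate df[[paid_col]] across the fee columns in order
pay_in_order <- function(df, paid_col = "paid", fee_pattern = "_fee$") {
  fee_cols <- grep(fee_pattern, names(df), value = TRUE)
  remaining <- df[[paid_col]]  # local copy, so no <<- is needed
  for (col in fee_cols) {
    amount <- pmin(df[[col]], remaining)  # pay as much of this fee as possible
    remaining <- remaining - amount
    df[[paste0("paid_", sub(fee_pattern, "", col))]] <- amount
  }
  df
}

final <- pay_in_order(data)
```

The loop runs once per fee column, not once per row, so it stays vectorized over thousands of rows, and calling it twice on the same input gives the same result.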