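I'm new to R and dplyr, and I've been stuck on this problem:
I have two tables:
(1) Car repairs
(2) The amount owed on each car over time
What I'd like to do is add three extra columns to the repair table that give me: (1) the amount owed on the car when the repair was done, (2) the amount owed 3 months later, and (3) the amount in the most recent payment record on file.
If the repair date doesn't match any payment record, I need to use the closest amount owed on record.
For example (the expected output was shown as an image in the original post):
Any ideas on how to do this?
Here are the data frames:
Car repairs: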
df_repair <- data.frame(
  unique_id   = c("A1","A2","A3","A4","A5","A6","A7","A8"),
  car_number  = c(1,1,1,2,2,2,3,3),
  repair_done = c("Front Fender","Front Lights","Rear Lights","Front Fender",
                  "Rear Fender","Rear Lights","Front Lights","Front Fender"),
  YearMonth   = c("2014-03","2016-03","2016-07","2015-05","2015-08","2016-01","2018-01","2018-05"))
df_owed <- data.frame(
  car_number  = c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,3),
  YearMonth   = c("2014-02","2014-05","2014-06","2014-08","2015-06","2015-12","2016-03","2016-04","2016-05","2016-06","2016-07","2016-08","2015-05","2015-08","2015-12","2016-03","2018-01","2018-02","2018-03","2018-04","2018-05","2018-09"),
  amount_owed = c(20000,18000,17500,16000,10000,7000,6000,5500,5000,4500,4000,3000,10000,8000,6000,0,50000,40000,35000,30000,25000,15000))
Answer 0 (score: 1)
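Using zoo for the year-month representation and tidyverse, you could try the following. Use left_join to add all of the df_owed data to your df_repair data by car_number. You can convert the year-month columns to yearmon objects with zoo. Then sort the rows by the year-month column that came from df_owed.
For each unique_id (using group_by), you can create the three columns you're interested in. The first uses the latest amount_owed whose owed date is on or before the service date. The second (3 months) uses the first amount_owed value whose owed date is at least 3 months (3/12 of a year) after the service date. Finally, the most recent one simply takes the last value of amount_owed.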
With the example data the results differ slightly, possibly because the data frames don't match the image in the post.
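A quick aside on why 3/12 stands for three months: zoo stores a yearmon value as a number of the form year + (month - 1)/12, so adding 3/12 moves it forward by three months. The snippet below is only a minimal sketch of that arithmetic; the variable names ym and ym_later are made up for illustration.

```r
library(zoo)

# "2014-03" is one of the YearMonth values from the question.
ym <- as.yearmon("2014-03")

# Internally this is 2014 + 2/12, i.e. year + (month - 1) / 12.
as.numeric(ym)

# Adding 3/12 of a year shifts the value forward by three months.
ym_later <- ym + 3/12

# The shifted value compares equal to June 2014.
ym_later == as.yearmon("2014-06")
```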
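library(tidyverse)
library(zoo)

df_repair %>%
  # Pair every repair with every owed record for the same car;
  # the two YearMonth columns become YearMonth.x (repair) and YearMonth.y (owed).
  left_join(df_owed, by = "car_number") %>%
  # Convert both year-month columns to zoo yearmon objects.
  mutate(across(c(YearMonth.x, YearMonth.y), as.yearmon)) %>%
  # Sort by the owed date so that first() and last() follow time order.
  arrange(YearMonth.y) %>%
  group_by(unique_id, car_number) %>%
  summarise(
    # Latest record on or before the repair month (NA if there is none).
    owed_repair_done = last(amount_owed[YearMonth.y <= YearMonth.x]),
    # First record at least 3 months after the repair month (NA if there is none).
    owed_3_months = first(amount_owed[YearMonth.y >= YearMonth.x + 3/12]),
    # Last record on file for the car.
    owed_most_recent = last(amount_owed),
    .groups = "drop"
  )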
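Because summarise keeps only the grouping columns and the new ones, repair_done and the repair month are dropped from the result. If the three values are wanted as extra columns on the repair table itself, as the question asks, one option is to join the summary back onto df_repair. The sketch below assumes a trimmed-down, illustrative version of the tables (car 2 only, not the full data from the post); owed_summary and df_repair_owed are hypothetical names.

```r
library(dplyr)
library(zoo)

# Illustrative data only: a cut-down version of the question's tables.
df_repair <- data.frame(
  unique_id   = c("A4", "A5", "A6"),
  car_number  = c(2, 2, 2),
  repair_done = c("Front Fender", "Rear Fender", "Rear Lights"),
  YearMonth   = c("2015-05", "2015-08", "2016-01")
)
df_owed <- data.frame(
  car_number  = c(2, 2, 2, 2),
  YearMonth   = c("2015-05", "2015-08", "2015-12", "2016-03"),
  amount_owed = c(10000, 8000, 6000, 0)
)

# Same steps as the answer's pipeline, stored as a summary table.
# Note: dplyr >= 1.1.0 warns about a many-to-many join here; that is
# expected, since each repair is paired with every owed record of its car.
owed_summary <- df_repair %>%
  left_join(df_owed, by = "car_number") %>%
  mutate(across(c(YearMonth.x, YearMonth.y), as.yearmon)) %>%
  arrange(YearMonth.y) %>%
  group_by(unique_id) %>%
  summarise(
    owed_repair_done = last(amount_owed[YearMonth.y <= YearMonth.x]),
    owed_3_months    = first(amount_owed[YearMonth.y >= YearMonth.x + 3/12]),
    owed_most_recent = last(amount_owed)
  )

# Attach the three new columns to the original repair table.
df_repair_owed <- left_join(df_repair, owed_summary, by = "unique_id")
df_repair_owed
```

For repair A6 there is no owed record three or more months after 2016-01 in this small example, so owed_3_months is NA for that row.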