Create a column in one data frame based on another column in another data frame in R

Time: 2021-02-24 17:22:19

Tags: r dplyr tidyverse

I'm fairly new to R and dplyr, and I've been stuck on this problem:

I have two tables:

(1) Car repairs


(2) Amount owed on each car over time


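What I want to do is create three additional columns on the repair table, giving me: (1) the amount owed on the car when the repair was done, (2) the amount owed 3 months later, and (3) the amount in the most recent payment record on file.

If the repair date doesn't match any payment record, I need to use the closest amount owed on record.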

An example of the desired output was attached as an image in the original post.

Any ideas on how to do this?

Here are the data frames:

Car repairs:

df_repair <- data.frame(
  unique_id = c("A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"),
  car_number = c(1, 1, 1, 2, 2, 2, 3, 3),
  repair_done = c("Front Fender", "Front Lights", "Rear Lights", "Front Fender",
                  "Rear Fender", "Rear Lights", "Front Lights", "Front Fender"),
  YearMonth = c("2014-03", "2016-03", "2016-07", "2015-05",
                "2015-08", "2016-01", "2018-01", "2018-05")
)


df_owed <- data.frame(
  car_number = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3),
  YearMonth = c("2014-02", "2014-05", "2014-06", "2014-08", "2015-06", "2015-12",
                "2016-03", "2016-04", "2016-05", "2016-06", "2016-07", "2016-08",
                "2015-05", "2015-08", "2015-12", "2016-03", "2018-01", "2018-02",
                "2018-03", "2018-04", "2018-05", "2018-09"),
  amount_owed = c(20000, 18000, 17500, 16000, 10000, 7000, 6000, 5500, 5000, 4500,
                  4000, 3000, 10000, 8000, 6000, 0, 50000, 40000, 35000, 30000,
                  25000, 15000)
)

1 Answer:

Answer 0: (score: 1)

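Using zoo to represent year-months, together with the tidyverse, you can try the following. Use left_join to add all of the df_owed data to your df_repair data by car_number. Because both data frames have a YearMonth column, the join names them YearMonth.x (the repair date) and YearMonth.y (the owed date). You can convert these year-month columns to zoo yearmon objects with as.yearmon. Then sort the rows by the year-month column from df_owed (YearMonth.y).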
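As a small illustration of the year-month handling described above, here is a minimal sketch of how as.yearmon values compare and how adding 3/12 shifts a value by three months. The variable names are only for illustration and do not appear in the original answer.

```r
library(zoo)

# Convert "YYYY-MM" strings to yearmon objects
repair_ym <- as.yearmon("2014-03")
owed_ym <- as.yearmon(c("2014-02", "2014-05", "2014-06"))

# yearmon values are stored as year + fraction of a year,
# so adding 3/12 moves the value forward by three months
three_months_later <- repair_ym + 3/12
format(three_months_later, "%Y-%m")

# Comparisons work element-wise, like ordinary numbers
owed_ym <= repair_ym
owed_ym >= three_months_later
```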

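For each unique_id (using group_by), you can create the three columns you're interested in. The first uses the latest amount_owed where the owed date is on or before the repair date. The second (3 months) uses the first amount_owed value where the owed date is at least 3 months (3/12 of a year) after the repair date. Finally, the most recent one just takes the last amount_owed value. If no record satisfies a condition, first and last return NA.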
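To see how the three expressions behave in isolation, here is a minimal sketch for a single repair. It uses the first four car 1 records from df_owed and the A1 repair date; the variable names are made up for this illustration.

```r
library(dplyr)
library(zoo)

# First four payment records for car 1, already sorted by date
owed_ym <- as.yearmon(c("2014-02", "2014-05", "2014-06", "2014-08"))
amount_owed <- c(20000, 18000, 17500, 16000)

# Repair A1 was done in 2014-03
repair_ym <- as.yearmon("2014-03")

# Latest record on or before the repair date
owed_at_repair <- last(amount_owed[owed_ym <= repair_ym])

# First record at least three months after the repair date
owed_3_months <- first(amount_owed[owed_ym >= repair_ym + 3/12])

# Last record overall
owed_most_recent <- last(amount_owed)

# When no record satisfies the condition, the subset is empty
# and first() returns NA
no_match <- first(amount_owed[owed_ym >= as.yearmon("2015-01")])
```

Here owed_at_repair is 20000 (the 2014-02 record), owed_3_months is 17500 (the 2014-06 record), and owed_most_recent is 16000.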

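With the sample data, the result of the pipeline below differs slightly from the images in the post, possibly because the data frames don't match them.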

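library(tidyverse)
library(zoo)

df_repair %>%
  # Attach every owed record for the same car to each repair
  left_join(df_owed, by = "car_number") %>%
  # YearMonth.x is the repair date, YearMonth.y is the owed date
  mutate(across(c(YearMonth.x, YearMonth.y), as.yearmon)) %>%
  # Sort by owed date so that first() and last() are chronological
  arrange(YearMonth.y) %>%
  group_by(unique_id, car_number) %>%
  summarise(
    # Latest record on or before the repair date
    owed_repair_done = last(amount_owed[YearMonth.y <= YearMonth.x]),
    # First record at least 3 months after the repair date
    owed_3_months = first(amount_owed[YearMonth.y >= YearMonth.x + 3/12]),
    # Last record on file for the car
    owed_most_recent = last(amount_owed),
    .groups = "drop"
  )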
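Note that summarise keeps only the grouping columns and the three new columns, so repair_done and the repair YearMonth are dropped. Since the question asks for the columns to be added to the repair table, one option is to join the summary back onto df_repair. The following is a sketch under assumptions: it uses a small subset of the car 1 records from the post, and the names owed_summary and result are hypothetical. Recent dplyr versions may warn about a many-to-many relationship in the first join, which is expected here.

```r
library(dplyr)
library(zoo)

# Small subset of the data from the post (car 1 only)
df_repair <- data.frame(
  unique_id = c("A1", "A2"),
  car_number = c(1, 1),
  repair_done = c("Front Fender", "Front Lights"),
  YearMonth = c("2014-03", "2016-03")
)

df_owed <- data.frame(
  car_number = c(1, 1, 1, 1, 1, 1),
  YearMonth = c("2014-02", "2014-05", "2014-06", "2016-03", "2016-06", "2016-08"),
  amount_owed = c(20000, 18000, 17500, 6000, 4500, 3000)
)

# Same summary logic as in the answer, grouped per repair
owed_summary <- df_repair %>%
  left_join(df_owed, by = "car_number") %>%
  mutate(across(c(YearMonth.x, YearMonth.y), as.yearmon)) %>%
  arrange(YearMonth.y) %>%
  group_by(unique_id) %>%
  summarise(
    owed_repair_done = last(amount_owed[YearMonth.y <= YearMonth.x]),
    owed_3_months = first(amount_owed[YearMonth.y >= YearMonth.x + 3/12]),
    owed_most_recent = last(amount_owed),
    .groups = "drop"
  )

# Join the three new columns back onto the original repair table,
# keeping repair_done and YearMonth
result <- df_repair %>%
  left_join(owed_summary, by = "unique_id")

result
```

With this subset, A1 gets 20000, 17500 and 3000, and A2 gets 6000, 4500 and 3000 for the three new columns.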