Merging two data frames using dplyr::left_join and multiple conditions

Asked: 2019-05-24 19:03:55

Tags: r

I need to match each case in df1 to a shift in df2, based on several conditions, to create df3.

library(lubridate)

df1 <- data.frame("Name" = c("Adams", "Adams", "Adams", "Adams", "Ball", "Ball", "Cash", "Cash", "David", "David"),
                  "Date.of.Service" = ymd(c("2005-10-01", "2005-10-01", "2005-10-01", "2005-10-02", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-02", "2005-10-01", "2005-10-02")),
                  "StartTime" = c(845, 955, 2333, 0300, 1045, 1322, 1145, 344, 858, 123),
                  "Code" = c("101", "500", "103", "104", "501", "103", "102", "106", "102", "109"))
df2 <- data.frame("Name" = c("Adams", "Adams", "Ball", "Cash", "Cash", "David", "David"),
                  "Date.of.Shift" = ymd(c("2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01")),
                  "Shift" = c("CVCALL", "ORD", "OB", "ORD2", "OB", "SUP", "OB"),
                  "Day.Night.Shift" = c("Full24", "Full24", "Day", "Day", "Night", "Day", "Full24"))

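Conditions:

  1. If a person has 1 shift on a day, cases whose date matches the shift date should go to that shift.

  2. If df1$Code is a "heart code" and the person has a "CVCALL" shift, the case should go to that shift.

  3. If a person has 2 shifts on a day, that day's cases should be assigned to a shift based on StartTime (day shifts fall between 629 and 1629, night shifts between 2059 and 2359).

  4. If the StartTime on the following day is between 000 and 700 and the person had a "Night" or "Full24" shift the day before, the case should go to that shift (if they had both a Night AND a Full24 shift, enter NA).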
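The time windows in condition 3 can be sketched as a small helper. This is only an illustration: `shift_window` is a hypothetical name that does not appear in the question, and the strict `>` / `<` bounds are an assumption taken from the comparisons used in the code further down.

```r
library(dplyr)

# Hypothetical helper (not part of the original code): map a numeric HHMM
# start time to the shift window described in condition 3.
# Times outside both windows return NA.
shift_window <- function(start_time) {
  case_when(
    start_time > 629  & start_time < 1629 ~ "Day",
    start_time > 2059 & start_time < 2359 ~ "Night",
    TRUE ~ NA_character_
  )
}

shift_window(c(845, 2333, 300))
# → "Day" "Night" NA
```

A start time of 300 falls in neither window, which is why condition 4 needs its own lookup against the previous day's shifts.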

I tried the code below. The first left_join and mutate work, but the second left_join and mutate fail with this error: Error in mutate_impl(.data, dots) : Evaluation error: object 'Day.Night.Shift' not found.

library(dplyr)

Heart.Codes <- c("500", "501")

df3 = df1 %>%
  # Bring in matching records in availability points.  Filter df2 to records that are either
  # (1) the only record for that person, or (2) CV shifts.
  left_join(df2 %>%
              group_by(Name, Date.of.Shift) %>%
              mutate(num.shifts = n()) %>%
              filter(num.shifts == 1 | Shift %in% c("CVCALL")),
            by = c("Name", "Date.of.Service" = "Date.of.Shift")) %>%
  # We want to keep Shift and ShiftDate for records from availability that are either
  # (1) the only record for that person, or (2) CV shifts that join to a
  # "heart" type in df1.
  mutate(Shift = case_when(num.shifts == 1 ~ Shift,
                           Code %in% Heart.Codes & Shift == "CVCALL" ~ Shift,
                           T ~ NA_integer_),
         Date.of.Shift = case_when(num.shifts == 1 ~ Date.of.Service, 
                                   Code %in% Heart.Codes & Shift == "CVCALL" ~ Date.of.Service),
         Day.Night.Shift = case_when(num.shifts == 1 ~ Day.Night.Shift, 
                                     Code %in% Heart.Codes & Shift == "CVCALL" ~ Day.Night.Shift)) %>%
  select(Name, Date.of.Service, StartTime, Code, Date.of.Shift, Shift, Day.Night.Shift) %>% 
  # assign correct shift when there are two shifts. Filter df2 to records that have two shifts in a day.
  left_join(df2 %>%
              group_by(Name, Date.of.Shift) %>%
              mutate(num.shifts = n()) %>% 
              filter(num.shifts == 2),
            by = c("Name", "Date.of.Service" = "Date.of.Shift")) %>%
  mutate(Shift = case_when(num.shifts == 2 & StartTime > 629 & StartTime < 1629 & Day.Night.Shift == "Day" ~ Shift,
                           num.shifts == 2 & StartTime > 2059 & StartTime < 2359 & Day.Night.Shift == "Night" ~ Shift,
                           T ~ NA_integer_),
         Date.of.Shift = case_when(num.shifts == 2 & StartTime > 629 & StartTime < 1629 & Day.Night.Shift == "Day" ~ Date.of.Shift,
                                   num.shifts == 2 & StartTime > 2059 & StartTime < 2359 & Day.Night.Shift == "Night" ~ Date.of.Shift)) %>%
  select(Name, Date.of.Service, StartTime, Code, Date.of.Shift, Shift, Day.Night.Shift) %>% 
  # Bring in records whose shift date is the day before the case date.
  left_join(df2 %>%
            group_by(Name, Date.of.Shift) %>%
            mutate(ShiftDateOneDayLater = Date.of.Shift + 1),
          by = c("Name", "Date.of.Service" = "ShiftDateOneDayLater")) %>%
  # Keep Shift and Date of Shift only if StartTime is between 0000 and 0659.
  mutate(Shift = case_when(!is.na(Shift.x) ~ Shift.x,
                         Start.Time > 0 & Start.Time < 659 ~ Shift.y),
       Date.of.Shift = case_when(!is.na(Date.of.Shift.x) ~ Date.of.Shift.x,
                                 Start.Time > 0 & Start.Time < 659 ~ Date.of.Shift.y)) %>%
  select(Name, Date.of.Service, StartTime, Code, Date.of.Shift, Shift, Day.Night.Shift)

Based on these conditions, the code should produce this new df3 data frame.

df3 <- data.frame("Name" = c("Adams", "Adams", "Adams", "Adams", "Ball", "Ball", "Cash", "Cash", "David", "David"),
                  "Date.of.Service" = ymd(c("2005-10-01", "2005-10-01", "2005-10-01", "2005-10-02", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-02", "2005-10-01", "2005-10-02")),
                  "StartTime" = c(845, 955, 2333, 0300, 1045, 1322, 1145, 344, 858, 123),
                  "Code" = c("101", "500", "103", "104", "501", "103", "102", "106", "102", "109"),
                  "Date.of.Shift" = ymd(c("2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", NA, "2005-10-01")),
                  "Shift" = c("ORD", "CVCALL", "ORD", "ORD", "OB", "OB", "ORD2", "OB", NA, "OB"),
                  "Day.Night.Shift" = c("Full24", "Full24", "Full24", "Full24", "Day", "Day", "Day", "Night", NA, "Full24"))

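1 Answer:

Answer 0 (score: 0)

You get this error because, in the second join, both the left and the right table have a column named Day.Night.Shift. When both tables have a column with the same name (and that column is not one of the join keys), dplyr automatically renames them to Day.Night.Shift.x and Day.Night.Shift.y. I find it helpful to run everything up to the join to see what is happening: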

df3 = df1 %>%
  # Bring in matching records in availability points.  Filter df2 to records that are either
  # (1) the only record for that person, or (2) CV shifts.
  left_join(df2 %>%
              group_by(Name, Date.of.Shift) %>%
              mutate(num.shifts = n()) %>%
              filter(num.shifts == 1 | Shift %in% c("CVCALL")),
            by = c("Name", "Date.of.Service" = "Date.of.Shift")) %>%
  # We want to keep Shift and ShiftDate for records from availability that are either
  # (1) the only record for that person, or (2) CV shifts that join to a
  # "heart" type in df1.
  mutate(Shift = case_when(num.shifts == 1 ~ Shift,
                           Code %in% Heart.Codes & Shift == "CVCALL" ~ Shift,
                           T ~ NA_integer_),
         Date.of.Shift = case_when(num.shifts == 1 ~ Date.of.Service, 
                                   Code %in% Heart.Codes & Shift == "CVCALL" ~ Date.of.Service),
         Day.Night.Shift = case_when(num.shifts == 1 ~ Day.Night.Shift, 
                                     Code %in% Heart.Codes & Shift == "CVCALL" ~ Day.Night.Shift)) %>%
  select(Name, Date.of.Service, StartTime, Code, Date.of.Shift, Shift, Day.Night.Shift) %>% 
  # assign correct shift when there are two shifts. Filter df2 to records that have two shifts in a day.
  left_join(df2 %>%
              group_by(Name, Date.of.Shift) %>%
              mutate(num.shifts = n()) %>% 
              filter(num.shifts == 2),
            by = c("Name", "Date.of.Service" = "Date.of.Shift"))
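
The renaming is easier to see on a toy pair of data frames. The sketch below uses made-up tables (`left` and `right` are illustrative names, not objects from the question) that share the join key Name and a second, non-key column:

```r
library(dplyr)

# Toy tables: both have a non-key column called Day.Night.Shift.
left  <- data.frame(Name = c("Adams", "Ball"),
                    Day.Night.Shift = c("Full24", NA),
                    stringsAsFactors = FALSE)
right <- data.frame(Name = c("Adams", "Ball"),
                    Day.Night.Shift = c("Day", "Night"),
                    stringsAsFactors = FALSE)

joined <- left_join(left, right, by = "Name")

# The clashing column gets the default suffixes; the bare name is gone.
names(joined)
# → "Name" "Day.Night.Shift.x" "Day.Night.Shift.y"
```

Calling names() on the result of the pipeline above should show the same .x / .y pattern for Shift and Day.Night.Shift.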

You can get rid of the error by referencing Day.Night.Shift.x (or Day.Night.Shift.y) in the mutate and select that follow.
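
As a rough sketch of what that could look like, the example below reduces the second join to a single case for Cash, using the values from df1 and df2. The tables `cases` and `shifts` and the column `matches_window` are illustrative names, the date key is left out for brevity, and `stringsAsFactors = FALSE` is assumed so that the shift columns are character vectors.

```r
library(dplyr)

# One case for Cash that has not been assigned a shift yet.
cases <- data.frame(Name = "Cash",
                    StartTime = 1145,
                    Shift = NA_character_,
                    Day.Night.Shift = NA_character_,
                    stringsAsFactors = FALSE)

# Cash's two shifts on that day.
shifts <- data.frame(Name = c("Cash", "Cash"),
                     Shift = c("ORD2", "OB"),
                     Day.Night.Shift = c("Day", "Night"),
                     stringsAsFactors = FALSE)

joined <- cases %>%
  left_join(shifts, by = "Name") %>%
  # After the join, the clashing columns exist only with .x / .y suffixes,
  # so every reference has to name one of them explicitly.
  mutate(
    matches_window =
      (StartTime > 629  & StartTime < 1629 & Day.Night.Shift.y == "Day") |
      (StartTime > 2059 & StartTime < 2359 & Day.Night.Shift.y == "Night"),
    Shift           = if_else(matches_window, Shift.y, Shift.x),
    Day.Night.Shift = if_else(matches_window, Day.Night.Shift.y, Day.Night.Shift.x)
  ) %>%
  select(Name, StartTime, Shift, Day.Night.Shift)

joined
```

The join returns one row per candidate shift, so the non-matching shift still appears as a row with NA in Shift; whether to drop such rows is a separate decision that this sketch leaves open.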