I need to match each case in df1 to a shift in df2, based on several conditions, in order to create df3.
library(lubridate)
df1 <- data.frame("Name" = c("Adams", "Adams", "Adams", "Adams", "Ball", "Ball", "Cash", "Cash", "David", "David"),
"Date.of.Service" = ymd(c("2005-10-01", "2005-10-01", "2005-10-01", "2005-10-02", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-02", "2005-10-01", "2005-10-02")),
"StartTime" = c(845, 955, 2333, 0300, 1045, 1322, 1145, 344, 858, 123),
"Code" = c("101", "500", "103", "104", "501", "103", "102", "106", "102", "109"))
df2 <- data.frame("Name" = c("Adams", "Adams", "Ball", "Cash", "Cash", "David", "David"),
"Date.of.Shift" = ymd(c("2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01")),
"Shift" = c("CVCALL", "ORD", "OB", "ORD2", "OB", "SUP", "OB"),
"Day.Night.Shift" = c("Full24", "Full24", "Day", "Day", "Night", "Day", "Full24"))
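Conditions:
If a person has 1 shift on a day, the cases matching that shift date should go to that shift.
If df1$Code is a "heart code" and the person has a "CVCALL" shift, the case should go to that shift.
If a person has 2 shifts on a day, the cases on that day should be assigned to a shift based on StartTime (day shifts fall between 629 and 1629, night shifts between 2059 and 2359).
If the StartTime is between 000 and 700 on the following day and the person had a "Night" or "Full24" shift the day before, the case should go to that shift (if they had both a Night AND a Full24 shift, enter NA).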
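To make the time windows in the conditions concrete, here is a minimal sketch in base R. The helper name `classify_window` is hypothetical (it does not appear in the original code), and it assumes StartTime is stored as an HHMM-style number and that the window bounds are exclusive, as in the comparisons used in the code below.

```r
# Hypothetical helper: map an HHMM-style numeric StartTime to a shift window.
# Bounds are exclusive, mirroring the `>` and `<` comparisons in the question.
classify_window <- function(start_time) {
  ifelse(start_time > 629 & start_time < 1629, "Day",
         ifelse(start_time > 2059 & start_time < 2359, "Night", NA_character_))
}

# Times outside both windows (for example 300) come back as NA.
windows <- classify_window(c(845, 2333, 300))
print(windows)
```

A column built this way could then be compared against Day.Night.Shift after a join, instead of repeating the numeric comparisons in every case_when branch.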
I tried the following code. The first left_join and mutate work, but when I reach the second left_join and mutate I get this error:
Error in mutate_impl(.data, dots) :
Evaluation error: object 'Day.Night.Shift' not found.
library(dplyr)
Heart.Codes <- c("500", "501")
df3 = df1 %>%
# Bring in matching records in availability points. Filter df2 to records that are either
# (1) the only record for that person, or (2) CV shifts.
left_join(df2 %>%
group_by(Name, Date.of.Shift) %>%
mutate(num.shifts = n()) %>%
filter(num.shifts == 1 | Shift %in% c("CVCALL")),
by = c("Name", "Date.of.Service" = "Date.of.Shift")) %>%
# We want to keep Shift and ShiftDate for records from availability that are either
# (1) the only record for that person, or (2) CV shifts that join to a
# "heart" type in df1.
mutate(Shift = case_when(num.shifts == 1 ~ Shift,
Code %in% Heart.Codes & Shift == "CVCALL" ~ Shift,
T ~ NA_integer_),
Date.of.Shift = case_when(num.shifts == 1 ~ Date.of.Service,
Code %in% Heart.Codes & Shift == "CVCALL" ~ Date.of.Service),
Day.Night.Shift = case_when(num.shifts == 1 ~ Day.Night.Shift,
Code %in% Heart.Codes & Shift == "CVCALL" ~ Day.Night.Shift)) %>%
select(Name, Date.of.Service, StartTime, Code, Date.of.Shift, Shift, Day.Night.Shift) %>%
# assign correct shift when there are two shifts. Filter df2 to records that have two shifts in a day.
left_join(df2 %>%
group_by(Name, Date.of.Shift) %>%
mutate(num.shifts = n()) %>%
filter(num.shifts == 2),
by = c("Name", "Date.of.Service" = "Date.of.Shift")) %>%
mutate(Shift = case_when(num.shifts == 2 & StartTime > 629 & StartTime < 1629 & Day.Night.Shift == "Day" ~ Shift,
num.shifts == 2 & StartTime > 2059 & StartTime < 2359 & Day.Night.Shift == "Night" ~ Shift,
T ~ NA_integer_),
Date.of.Shift = case_when(num.shifts == 2 & StartTime > 629 & StartTime < 1629 & Day.Night.Shift == "Day" ~ Date.of.Shift,
num.shifts == 2 & StartTime > 2059 & StartTime < 2359 & Day.Night.Shift == "Night" ~ Date.of.Shift)) %>%
select(Name, Date.of.Service, StartTime, Code, Date.of.Shift, Shift, Day.Night.Shift) %>%
# Bring in records whose shift date is the day before the case date.
left_join(df2 %>%
group_by(Name, Date.of.Shift) %>%
mutate(ShiftDateOneDayLater = Date.of.Shift + 1),
by = c("Name", "Date.of.Service" = "ShiftDateOneDayLater")) %>%
# Keep Shift and Date of Shift only if StartTime is between 0000 and 0659.
mutate(Shift = case_when(!is.na(Shift.x) ~ Shift.x,
Start.Time > 0 & Start.Time < 659 ~ Shift.y),
Date.of.Shift = case_when(!is.na(Date.of.Shift.x) ~ Date.of.Shift.x,
Start.Time > 0 & Start.Time < 659 ~ Date.of.Shift.y)) %>%
select(Name, Date.of.Service, StartTime, Code, Date.of.Shift, Shift, Day.Night.Shift)
Based on these conditions, the code should produce this new df3 data frame.
df3 <- data.frame("Name" = c("Adams", "Adams", "Adams", "Adams", "Ball", "Ball", "Cash", "Cash", "David", "David"),
"Date.of.Service" = ymd(c("2005-10-01", "2005-10-01", "2005-10-01", "2005-10-02", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-02", "2005-10-01", "2005-10-02")),
"StartTime" = c(845, 955, 2333, 0300, 1045, 1322, 1145, 344, 858, 123),
"Code" = c("101", "500", "103", "104", "501", "103", "102", "106", "102", "109"),
"Date.of.Shift" = ymd(c("2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", NA, "2005-10-01")),
"Shift" = c("ORD", "CVCALL", "ORD", "ORD", "OB", "OB", "ORD2", "OB", NA, "OB"),
"Day.Night.Shift" = c("Full24", "Full24", "Full24", "Full24", "Day", "Day", "Day", "Night", NA, "Full24"))
Answer 0 (score: 0)
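You get this error message because, in the second join, both the left and the right table have a column named Day.Night.Shift. When the two tables share a column name (and that column is not one of the join keys), dplyr automatically renames the columns to Day.Night.Shift.x and Day.Night.Shift.y. I find it helpful to run everything up to the join to see what is happening: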
df3 = df1 %>%
# Bring in matching records in availability points. Filter df2 to records that are either
# (1) the only record for that person, or (2) CV shifts.
left_join(df2 %>%
group_by(Name, Date.of.Shift) %>%
mutate(num.shifts = n()) %>%
filter(num.shifts == 1 | Shift %in% c("CVCALL")),
by = c("Name", "Date.of.Service" = "Date.of.Shift")) %>%
# We want to keep Shift and ShiftDate for records from availability that are either
# (1) the only record for that person, or (2) CV shifts that join to a
# "heart" type in df1.
mutate(Shift = case_when(num.shifts == 1 ~ Shift,
Code %in% Heart.Codes & Shift == "CVCALL" ~ Shift,
TRUE ~ NA_character_),  # NA type must match the character Shift column
Date.of.Shift = case_when(num.shifts == 1 ~ Date.of.Service,
Code %in% Heart.Codes & Shift == "CVCALL" ~ Date.of.Service),
Day.Night.Shift = case_when(num.shifts == 1 ~ Day.Night.Shift,
Code %in% Heart.Codes & Shift == "CVCALL" ~ Day.Night.Shift)) %>%
select(Name, Date.of.Service, StartTime, Code, Date.of.Shift, Shift, Day.Night.Shift) %>%
# assign correct shift when there are two shifts. Filter df2 to records that have two shifts in a day.
left_join(df2 %>%
group_by(Name, Date.of.Shift) %>%
mutate(num.shifts = n()) %>%
filter(num.shifts == 2),
by = c("Name", "Date.of.Service" = "Date.of.Shift"))
You can eliminate the error by referring to Day.Night.Shift.x (and, further down, Day.Night.Shift.y) in the mutate or select.
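The renaming is easiest to see on a small example. The sketch below uses two made-up tables (`left` and `right` are hypothetical, not the asker's data) that share a non-key column named Shift. In the join above, Shift is also present on both sides, so it would likely need the same treatment as Day.Night.Shift.

```r
library(dplyr)

# Toy tables (hypothetical data): both have a non-key column named Shift.
left  <- data.frame(Name = c("Adams", "Ball"), Shift = c(NA, "OB"))
right <- data.frame(Name = c("Adams", "Ball"), Shift = c("ORD", "SUP"))

# Shift is not a join key, so dplyr keeps both copies with .x/.y suffixes.
joined <- left_join(left, right, by = "Name")
print(names(joined))

# Refer to the suffixed names explicitly: keep the left value when present,
# otherwise fall back to the right value.
resolved <- joined %>%
  mutate(Shift = coalesce(Shift.x, Shift.y)) %>%
  select(Name, Shift)
print(resolved)
```

Whether coalesce is the right rule depends on the matching conditions. In the original pipeline the case_when branches would decide which of the .x and .y values to keep.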