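How do I create a new column based on a variable?

Time: 2017-03-26 09:54:20

Tags: r

My data contains records for different people (ID) on each Day of the week, along with the time they spent in different areas of a hospital (Ward). The time is recorded as minutes:seconds (Duration). An example of my data is: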

ShiftData <- data.frame(
  ID = c("Nelson", "Nelson", "Nelson", "Nelson", "Nelson",
         "Justin", "Justin", "Justin", "Justin", "Justin",
         "Nelson", "Nelson", "Nelson", "Nelson", "Nelson",
         "Justin", "Justin", "Justin", "Justin", "Justin"),
  Day = c("Monday", "Monday", "Monday", "Monday", "Monday",
          "Monday", "Monday", "Monday", "Monday", "Monday",
          "Tuesday", "Tuesday", "Tuesday", "Tuesday", "Tuesday",
          "Tuesday", "Tuesday", "Tuesday", "Tuesday", "Tuesday"),
  Ward = c("Gen", "Anaesth", "Front Desk", "PreOp", "Front Desk",
           "PreOp", "Front Desk", "Anaesth", "Front Desk", "Gen",
           "Gen", "Anaesth", "PreOp", "Front Desk", "Gen",
           "Front Desk", "PreOp", "PostOp", "Front Desk", "Anaesth"),
  Duration = c("5:35", "4:08", "4:30", "6:33", "4:17",
               "15:35", "4:28", "9:37", "18:33", "4:20",
               "9:45", "8:28", "6:37", "2:34", "4:27",
               "19:35", "4:20", "9:47", "11:33", "4:26"))

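I first want to add a column that indicates which rotation or shift each ID is in. A "Front Desk" entry in the Ward column marks the point where a person changes shift. A person may start at "Front Desk"; this is dictated by the number of hours they worked the previous day (that calculation is not needed for the current analysis). My expected output would be: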

ShiftData$Shift <- c(1,1,0,2,0,
                     1,0,2,0,3,
                     1,1,1,0,2,
                     0,1,1,0,2)

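My question is similar to this question, except that I want a 0 wherever the Ward is "Front Desk", and any activity after it should be counted sequentially.

How can I create this?

I know I can add a 0 for "Front Desk" with the following: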

ShiftData$Shift <- ifelse(ShiftData$Ward=='Front Desk', 0, NA)

But I am not sure how to add the sequential count for the rest of the column.

2 Answers:

Answer 0: (score: 2)

This can be solved with dplyr:

library(dplyr)

ShiftData$Shift <- (ShiftData %>%
                    group_by(ID, Day) %>%
                    mutate(tmp = ifelse(Ward == "Front Desk", 1, 0), # flag Front Desk rows
                           tmp2 = cumsum(tmp),                       # running count of Front Desk rows within the day
                           Ward1 = Ward[1],                          # first ward of the day
                           # add 1 when the day did not start at the Front Desk,
                           # so that the first shift is counted as 1
                           shift = ifelse(Ward1 == "Front Desk", tmp2, tmp2 + 1))
                    )$shift
# Front Desk rows themselves get 0
ShiftData$Shift[ShiftData$Ward == "Front Desk"] <- 0
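
For readers without dplyr, the same counting idea can be sketched in base R with `ave()`. This is an illustrative alternative rather than part of the answer above; the names `is_desk` and `shift` are made up for the example, and the data frame is rebuilt with only the three columns the calculation needs.

```r
# Reduced copy of the question's data (Duration is not needed here)
ShiftData <- data.frame(
  ID   = rep(c("Nelson", "Justin", "Nelson", "Justin"), each = 5),
  Day  = rep(c("Monday", "Tuesday"), each = 10),
  Ward = c("Gen", "Anaesth", "Front Desk", "PreOp", "Front Desk",
           "PreOp", "Front Desk", "Anaesth", "Front Desk", "Gen",
           "Gen", "Anaesth", "PreOp", "Front Desk", "Gen",
           "Front Desk", "PreOp", "PostOp", "Front Desk", "Anaesth"),
  stringsAsFactors = FALSE
)

is_desk <- ShiftData$Ward == "Front Desk"

# Within each ID/Day group: count the Front Desk rows seen so far, and add 1
# when the group did not start at the Front Desk
shift <- ave(as.integer(is_desk), ShiftData$ID, ShiftData$Day,
             FUN = function(x) cumsum(x) + (x[1] == 0))

# Front Desk rows themselves are reported as 0
shift[is_desk] <- 0

ShiftData$Shift <- shift
print(ShiftData)
```

Like the dplyr version, this treats every Front Desk row as a shift boundary, so two consecutive Front Desk rows (which do not occur in the sample data) would skip a shift number.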

Answer 1: (score: 2)

Note that your question is very similar to this one.

So here is one way to solve it:

library(dplyr)

ShiftData %>%
  group_by(ID, Day) %>% 
  mutate(Shift = cumsum(Ward != "Front Desk" & lag(Ward) %in% c("Front Desk", NA))) %>% 
  mutate(Shift = ifelse(Ward == "Front Desk", 0, Shift))

# Source: local data frame [20 x 5]
# Groups: ID, Day [4]
# 
#        ID     Day       Ward Duration Shift
#    <fctr>  <fctr>     <fctr>   <fctr> <dbl>
# 1  Nelson  Monday        Gen     5:35     1
# 2  Nelson  Monday    Anaesth     4:08     1
# 3  Nelson  Monday Front Desk     4:30     0
# 4  Nelson  Monday      PreOp     6:33     2
# 5  Nelson  Monday Front Desk     4:17     0
# 6  Justin  Monday      PreOp    15:35     1
# 7  Justin  Monday Front Desk     4:28     0
# 8  Justin  Monday    Anaesth     9:37     2
# 9  Justin  Monday Front Desk    18:33     0
# 10 Justin  Monday        Gen     4:20     3
# 11 Nelson Tuesday        Gen     9:45     1
# 12 Nelson Tuesday    Anaesth     8:28     1
# 13 Nelson Tuesday      PreOp     6:37     1
# 14 Nelson Tuesday Front Desk     2:34     0
# 15 Nelson Tuesday        Gen     4:27     2
# 16 Justin Tuesday Front Desk    19:35     0
# 17 Justin Tuesday      PreOp     4:20     1
# 18 Justin Tuesday     PostOp     9:47     1
# 19 Justin Tuesday Front Desk    11:33     0
# 20 Justin Tuesday    Anaesth     4:26     2

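How it works: after grouping, we create the Shift column by adding 1 each time a non-Front Desk row is either preceded by a Front Desk row or is the first row of its group (where lag(Ward) is NA). Then we replace Shift with 0 on all Front Desk rows.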
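
To make the counting step concrete, here is a minimal base R sketch of the same logic on a single group (Nelson on Monday, taken from the question's data). The vector `prev` is a hand-rolled stand-in for `dplyr::lag()` so that the snippet runs without any packages; the variable names are illustrative only.

```r
# Wards for one ID/Day group (Nelson, Monday), copied from the question's data
ward <- c("Gen", "Anaesth", "Front Desk", "PreOp", "Front Desk")

# Previous ward within the group; NA for the first row (mimics dplyr::lag)
prev <- c(NA, head(ward, -1))

# TRUE where a new shift starts: a non-Front Desk row that is either the
# first row of the group (prev is NA) or directly follows a Front Desk row
starts_shift <- ward != "Front Desk" & prev %in% c("Front Desk", NA)

# Running count of shift starts, then zero out the Front Desk rows
shift <- cumsum(starts_shift)
shift[ward == "Front Desk"] <- 0

print(data.frame(ward, prev, starts_shift, shift))
```

The `%in% c("Front Desk", NA)` test is what lets the first row of a group count as the start of shift 1: a plain `prev == "Front Desk"` comparison would return NA there instead of TRUE.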