Find the date gap between two columns by unique ID and add up absence days

Asked: 2018-08-02 06:52:08

Tags: r excel datetime excel-formula

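I have a problem: I want to find the number of days an employee was absent. If the employee stays away for 3 consecutive days, a new column should show 3 (the run may span several rows). The complication is that every row has both a start date and an end date, so for the same employee I want to match rows: when the next absence starts on the day right after the previous one ends, the days should be added together. I have attached a screenshot and the table below. Any help in Excel or R would be appreciated. I have already tried Max if and Sumif. The only issue is that I want to add the days only when he/she is absent on consecutive days.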

ID  START_DATE  END_DATE    ABSENCE_DAYS
3   14-06-18    14-06-18    1
3   17-06-18    17-06-18    1
3   18-06-18    18-06-18    1
4   01-06-18    01-06-18    1
4   04-06-18    04-06-18    1
4   21-06-18    22-06-18    2
4   27-06-18    27-06-18    1
4   28-06-18    28-06-18    1
4   04-07-18    04-07-18    1
4   05-07-18    05-07-18    1
4   09-07-18    09-07-18    1
4   11-07-18    11-07-18    1
4   23-07-18    23-07-18    1
4   24-07-18    24-07-18    1
4   25-07-18    25-07-18    1
5   07-06-18    08-06-18    2
5   28-06-18    28-06-18    1
5   27-07-18    27-07-18    0.5
6   10-06-18    11-06-18    2
6   17-06-18    21-06-18    5
6   24-06-18    25-06-18    2
6   26-06-18    03-07-18    6
6   15-07-18    15-07-18    1
6   22-07-18    22-07-18    1

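For example, employee 4 was absent on three consecutive days, the 23rd, 24th and 25th of July, so the new column should say he was absent for 3 consecutive days.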

Update

The desired output looks like this; this is only an example.

2 Answers:

Answer 0 (score: 1)

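Using dplyr::lag and dplyr::lead we can compare each row's END_DATE and START_DATE with its neighbours to see whether the absences are consecutive.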

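library(dplyr)
library(lubridate)
data %>% group_by(ID) %>% 
  mutate(Forward = dmy(START_DATE) - lag(dmy(END_DATE)),    # gap to the previous row's END_DATE
         Backward = dmy(END_DATE) - lead(dmy(START_DATE)),  # gap to the next row's START_DATE
         # flag rows that directly follow or precede another absence
         Flag = ifelse(Forward == 1 | Backward == -1, TRUE, FALSE), 
         # sum ABSENCE_DAYS over all flagged rows of this ID
         Total = sum(ABSENCE_DAYS[Flag], na.rm = TRUE)) 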

Data

data <- read.table(text="
              ID  START_DATE  END_DATE    ABSENCE_DAYS
               3   14-06-18    14-06-18    1
               3   17-06-18    17-06-18    1
               3   18-06-18    18-06-18    1
               4   01-06-18    01-06-18    1
               4   04-06-18    04-06-18    1
               4   21-06-18    22-06-18    2
               4   27-06-18    27-06-18    1
               4   28-06-18    28-06-18    1
               4   04-07-18    04-07-18    1
               4   05-07-18    05-07-18    1
               4   09-07-18    09-07-18    1
               4   11-07-18    11-07-18    1
               4   23-07-18    23-07-18    1
               4   24-07-18    24-07-18    1
               4   25-07-18    25-07-18    1
               5   07-06-18    08-06-18    2
               5   28-06-18    28-06-18    1
               5   27-07-18    27-07-18    0.5
               6   10-06-18    11-06-18    2
               6   17-06-18    21-06-18    5
               6   24-06-18    25-06-18    2
               6   26-06-18    03-07-18    6
               6   15-07-18    15-07-18    1
               6   22-07-18    22-07-18    1
               ",header=TRUE, stringsAsFactors = FALSE)
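
Note that Total in the code above adds up every flagged row within an ID, so several separate runs of one employee end up in a single figure. If a per-run total is wanted instead (3 for employee 4's 23rd to 25th of July), one possible variation is sketched below. It is only a sketch under assumptions: the column names Gap, NewRun and Run and the small absences sample are made up for illustration, and base as.Date is used in place of lubridate::dmy.

```r
library(dplyr)

# Hypothetical sample: five of employee 4's rows from the question
absences <- data.frame(
  ID           = c(4, 4, 4, 4, 4),
  START_DATE   = c("27-06-18", "28-06-18", "23-07-18", "24-07-18", "25-07-18"),
  END_DATE     = c("27-06-18", "28-06-18", "23-07-18", "24-07-18", "25-07-18"),
  ABSENCE_DAYS = c(1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

result <- absences %>%
  mutate(START = as.Date(START_DATE, format = "%d-%m-%y"),
         END   = as.Date(END_DATE,   format = "%d-%m-%y")) %>%
  group_by(ID) %>%
  mutate(Gap    = as.numeric(START - lag(END)),  # days since the previous absence ended
         NewRun = is.na(Gap) | Gap != 1,         # first row, or not directly adjacent
         Run    = cumsum(NewRun)) %>%            # running id of each consecutive run
  group_by(ID, Run) %>%
  mutate(Total = sum(ABSENCE_DAYS)) %>%          # total days per run
  ungroup()

print(result$Total)
```

For this sample the first two rows form one run and the last three form another, so Total is 2 for the June rows and 3 for the July rows.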

Answer 1 (score: 1)

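Here is a solution using data.table. You can check whether the current row's START_DATE is one day after the previous row's END_DATE, then use cumsum to group such rows together. Once the rows are grouped correctly, the rest is just a simple sum of ABSENCE_DAYS.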

# a new group starts whenever START_DATE is not the day after the previous END_DATE
welfare[, TOTAL := sum(ABSENCE_DAYS), 
    by=.(ID, cumsum(START_DATE != shift(END_DATE, fill=1L) + 1L))]

Output:

    ID START_DATE   END_DATE ABSENCE_DAYS TOTAL
 1:  3 2018-06-14 2018-06-14          1.0   1.0
 2:  3 2018-06-17 2018-06-17          1.0   2.0
 3:  3 2018-06-18 2018-06-18          1.0   2.0
 4:  4 2018-06-01 2018-06-01          1.0   1.0
 5:  4 2018-06-04 2018-06-04          1.0   1.0
 6:  4 2018-06-21 2018-06-22          2.0   2.0
 7:  4 2018-06-27 2018-06-27          1.0   2.0
 8:  4 2018-06-28 2018-06-28          1.0   2.0
 9:  4 2018-07-04 2018-07-04          1.0   2.0
10:  4 2018-07-05 2018-07-05          1.0   2.0
11:  4 2018-07-09 2018-07-09          1.0   1.0
12:  4 2018-07-11 2018-07-11          1.0   1.0
13:  4 2018-07-23 2018-07-23          1.0   3.0
14:  4 2018-07-24 2018-07-24          1.0   3.0
15:  4 2018-07-25 2018-07-25          1.0   3.0
16:  5 2018-06-07 2018-06-08          2.0   2.0
17:  5 2018-06-28 2018-06-28          1.0   1.0
18:  5 2018-07-27 2018-07-27          0.5   0.5
19:  6 2018-06-10 2018-06-11          2.0   2.0
20:  6 2018-06-17 2018-06-21          5.0   5.0
21:  6 2018-06-24 2018-06-25          2.0   8.0
22:  6 2018-06-26 2018-07-03          6.0   8.0
23:  6 2018-07-15 2018-07-15          1.0   1.0
24:  6 2018-07-22 2018-07-22          1.0   1.0
    ID START_DATE   END_DATE ABSENCE_DAYS TOTAL

Data:

library(data.table)
welfare <- fread(
"ID  START_DATE  END_DATE    ABSENCE_DAYS
3   14-06-18    14-06-18    1
3   17-06-18    17-06-18    1
3   18-06-18    18-06-18    1
4   01-06-18    01-06-18    1
4   04-06-18    04-06-18    1
4   21-06-18    22-06-18    2
4   27-06-18    27-06-18    1
4   28-06-18    28-06-18    1
4   04-07-18    04-07-18    1
4   05-07-18    05-07-18    1
4   09-07-18    09-07-18    1
4   11-07-18    11-07-18    1
4   23-07-18    23-07-18    1
4   24-07-18    24-07-18    1
4   25-07-18    25-07-18    1
5   07-06-18    08-06-18    2
5   28-06-18    28-06-18    1
5   27-07-18    27-07-18    0.5
6   10-06-18    11-06-18    2
6   17-06-18    21-06-18    5
6   24-06-18    25-06-18    2
6   26-06-18    03-07-18    6
6   15-07-18    15-07-18    1
6   22-07-18    22-07-18    1")    
cols <- c("START_DATE", "END_DATE")
welfare[, (cols) := lapply(.SD, as.Date, format="%d-%m-%y"), .SDcols=cols]
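
For readers without data.table, the same cumsum grouping idea can be sketched in base R. This is an illustrative sketch rather than the answer's own code: the absences data frame is a hypothetical subset of the question's rows, and the helper names prev_end, prev_id, new_run and RUN are assumptions introduced here.

```r
# Hypothetical subset of the question's data (employees 3 and 4)
absences <- data.frame(
  ID           = c(3, 3, 3, 4, 4, 4, 4, 4),
  START_DATE   = c("14-06-18", "17-06-18", "18-06-18", "09-07-18",
                   "11-07-18", "23-07-18", "24-07-18", "25-07-18"),
  END_DATE     = c("14-06-18", "17-06-18", "18-06-18", "09-07-18",
                   "11-07-18", "23-07-18", "24-07-18", "25-07-18"),
  ABSENCE_DAYS = rep(1, 8),
  stringsAsFactors = FALSE
)
absences$START_DATE <- as.Date(absences$START_DATE, format = "%d-%m-%y")
absences$END_DATE   <- as.Date(absences$END_DATE,   format = "%d-%m-%y")

# Previous row's END_DATE and ID (NA for the very first row);
# assumes rows are already sorted by ID and START_DATE
n        <- nrow(absences)
prev_end <- c(as.Date(NA), absences$END_DATE[-n])
prev_id  <- c(NA, absences$ID[-n])

# A new run starts on the first row, when the ID changes,
# or when START_DATE is not the day after the previous END_DATE
new_run <- is.na(prev_id) |
  absences$ID != prev_id |
  absences$START_DATE != prev_end + 1

absences$RUN   <- cumsum(new_run)                                     # run id
absences$TOTAL <- ave(absences$ABSENCE_DAYS, absences$RUN, FUN = sum) # days per run
print(absences)
```

On these rows TOTAL comes out as 1, 2, 2 for employee 3 and 1, 1, 3, 3, 3 for employee 4, which agrees with the corresponding rows of the output shown above.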