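I am stuck on a problem: I want to find the number of days an employee was absent in a row. If an employee is absent for 3 consecutive days, a new column should show 3 (a run can be any number of days). The difficulty is that each row has a start date and an end date, so I need to check whether the employee is the same and whether the next absence starts on the day right after the previous one ends; only then should the days be added together. The table is below. Any help in Excel or R would be appreciated. I have tried MAXIFS and SUMIF. The only problem is that I want to add the days only when the absences fall on consecutive days.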
ID START_DATE END_DATE ABSENCE_DAYS
3 14-06-18 14-06-18 1
3 17-06-18 17-06-18 1
3 18-06-18 18-06-18 1
4 01-06-18 01-06-18 1
4 04-06-18 04-06-18 1
4 21-06-18 22-06-18 2
4 27-06-18 27-06-18 1
4 28-06-18 28-06-18 1
4 04-07-18 04-07-18 1
4 05-07-18 05-07-18 1
4 09-07-18 09-07-18 1
4 11-07-18 11-07-18 1
4 23-07-18 23-07-18 1
4 24-07-18 24-07-18 1
4 25-07-18 25-07-18 1
5 07-06-18 08-06-18 2
5 28-06-18 28-06-18 1
5 27-07-18 27-07-18 0.5
6 10-06-18 11-06-18 2
6 17-06-18 21-06-18 5
6 24-06-18 25-06-18 2
6 26-06-18 03-07-18 6
6 15-07-18 15-07-18 1
6 22-07-18 22-07-18 1
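For example, employee 4 was absent on the 23rd, 24th and 25th of July, three days in a row, so the new column should say 3 consecutive days of absence for those rows.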
Answer 0 (score: 1)
Using dplyr::lag and dplyr::lead, we can compare END_DATE and START_DATE of neighbouring rows to see whether the absences are consecutive.
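library(dplyr)
library(lubridate)
# `data` is the data frame defined below
data %>% group_by(ID) %>%
  mutate(
    # gap in days to the previous END_DATE and to the next START_DATE
    Forward  = dmy(START_DATE) - lag(dmy(END_DATE)),
    Backward = dmy(END_DATE) - lead(dmy(START_DATE)),
    # TRUE when the row is adjacent to another absence of the same ID
    Flag     = Forward == 1 | Backward == -1,
    # sum of all flagged ABSENCE_DAYS for the ID (not per run)
    Total    = sum(ABSENCE_DAYS[Flag], na.rm = TRUE))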
data <- read.table(text="
ID START_DATE END_DATE ABSENCE_DAYS
3 14-06-18 14-06-18 1
3 17-06-18 17-06-18 1
3 18-06-18 18-06-18 1
4 01-06-18 01-06-18 1
4 04-06-18 04-06-18 1
4 21-06-18 22-06-18 2
4 27-06-18 27-06-18 1
4 28-06-18 28-06-18 1
4 04-07-18 04-07-18 1
4 05-07-18 05-07-18 1
4 09-07-18 09-07-18 1
4 11-07-18 11-07-18 1
4 23-07-18 23-07-18 1
4 24-07-18 24-07-18 1
4 25-07-18 25-07-18 1
5 07-06-18 08-06-18 2
5 28-06-18 28-06-18 1
5 27-07-18 27-07-18 0.5
6 10-06-18 11-06-18 2
6 17-06-18 21-06-18 5
6 24-06-18 25-06-18 2
6 26-06-18 03-07-18 6
6 15-07-18 15-07-18 1
6 22-07-18 22-07-18 1
",header=T, stringsAsFactors = F)
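A possible variation on the approach above, in case a total per run of consecutive days is wanted rather than one total per ID. This is only a sketch: the names `absences`, `gap`, `new_run`, `RUN` and `TOTAL` are made up for illustration, and the five rows are employee 4's July absences from the question, already converted to Date.

```r
library(dplyr)

# Five of employee 4's rows from the question, with dates already parsed
absences <- data.frame(
  ID = c(4, 4, 4, 4, 4),
  START_DATE = as.Date(c("2018-07-09", "2018-07-11", "2018-07-23",
                         "2018-07-24", "2018-07-25")),
  END_DATE = as.Date(c("2018-07-09", "2018-07-11", "2018-07-23",
                       "2018-07-24", "2018-07-25")),
  ABSENCE_DAYS = c(1, 1, 1, 1, 1)
)

result <- absences %>%
  group_by(ID) %>%
  arrange(START_DATE, .by_group = TRUE) %>%
  mutate(
    # days between this absence and the end of the previous one
    gap = as.numeric(START_DATE - lag(END_DATE)),
    # a run starts on the first row of an ID or after a gap other than 1 day
    new_run = is.na(gap) | gap != 1,
    # running count of run starts gives each run its own id within the ID
    RUN = cumsum(new_run)
  ) %>%
  group_by(ID, RUN) %>%
  mutate(TOTAL = sum(ABSENCE_DAYS)) %>%
  ungroup()

print(result$TOTAL)
```

With these rows, the 9th and the 11th each stay on their own, while the 23rd to the 25th share one run id and get the same TOTAL.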
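Answer 1 (score: 1)
Here is a solution using data.table. You can check whether the START_DATE of the current row is one day after the END_DATE of the previous row, and then use cumsum to group such rows together. Once the rows are grouped correctly, the rest is a simple sum of ABSENCE_DAYS.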
# each TRUE in the comparison starts a new run, so cumsum() yields a run id
welfare[, TOTAL := sum(ABSENCE_DAYS),
    by=.(ID, cumsum(START_DATE != shift(END_DATE, fill=1L) + 1L))]
Output:
ID START_DATE END_DATE ABSENCE_DAYS TOTAL
1: 3 2018-06-14 2018-06-14 1.0 1.0
2: 3 2018-06-17 2018-06-17 1.0 2.0
3: 3 2018-06-18 2018-06-18 1.0 2.0
4: 4 2018-06-01 2018-06-01 1.0 1.0
5: 4 2018-06-04 2018-06-04 1.0 1.0
6: 4 2018-06-21 2018-06-22 2.0 2.0
7: 4 2018-06-27 2018-06-27 1.0 2.0
8: 4 2018-06-28 2018-06-28 1.0 2.0
9: 4 2018-07-04 2018-07-04 1.0 2.0
10: 4 2018-07-05 2018-07-05 1.0 2.0
11: 4 2018-07-09 2018-07-09 1.0 1.0
12: 4 2018-07-11 2018-07-11 1.0 1.0
13: 4 2018-07-23 2018-07-23 1.0 3.0
14: 4 2018-07-24 2018-07-24 1.0 3.0
15: 4 2018-07-25 2018-07-25 1.0 3.0
16: 5 2018-06-07 2018-06-08 2.0 2.0
17: 5 2018-06-28 2018-06-28 1.0 1.0
18: 5 2018-07-27 2018-07-27 0.5 0.5
19: 6 2018-06-10 2018-06-11 2.0 2.0
20: 6 2018-06-17 2018-06-21 5.0 5.0
21: 6 2018-06-24 2018-06-25 2.0 8.0
22: 6 2018-06-26 2018-07-03 6.0 8.0
23: 6 2018-07-15 2018-07-15 1.0 1.0
24: 6 2018-07-22 2018-07-22 1.0 1.0
ID START_DATE END_DATE ABSENCE_DAYS TOTAL
Data:
library(data.table)
welfare <- fread(
"ID START_DATE END_DATE ABSENCE_DAYS
3 14-06-18 14-06-18 1
3 17-06-18 17-06-18 1
3 18-06-18 18-06-18 1
4 01-06-18 01-06-18 1
4 04-06-18 04-06-18 1
4 21-06-18 22-06-18 2
4 27-06-18 27-06-18 1
4 28-06-18 28-06-18 1
4 04-07-18 04-07-18 1
4 05-07-18 05-07-18 1
4 09-07-18 09-07-18 1
4 11-07-18 11-07-18 1
4 23-07-18 23-07-18 1
4 24-07-18 24-07-18 1
4 25-07-18 25-07-18 1
5 07-06-18 08-06-18 2
5 28-06-18 28-06-18 1
5 27-07-18 27-07-18 0.5
6 10-06-18 11-06-18 2
6 17-06-18 21-06-18 5
6 24-06-18 25-06-18 2
6 26-06-18 03-07-18 6
6 15-07-18 15-07-18 1
6 22-07-18 22-07-18 1")
cols <- c("START_DATE", "END_DATE")
welfare[, (cols) := lapply(.SD, as.Date, format="%d-%m-%y"), .SDcols=cols]
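To make the cumsum grouping step easier to follow, here is a small base R sketch of the same idea on four of employee 6's rows from the question. It is not the answer's code; the vector names `start`, `end`, `days`, `prev_end`, `new_run`, `run_id` and `total` are chosen here for illustration.

```r
# Four of employee 6's absences from the question, dates already parsed
start <- as.Date(c("2018-06-17", "2018-06-24", "2018-06-26", "2018-07-15"))
end   <- as.Date(c("2018-06-21", "2018-06-25", "2018-07-03", "2018-07-15"))
days  <- c(5, 2, 6, 1)

# END_DATE of the previous row (NA for the first row)
prev_end <- c(as.Date(NA), head(end, -1))

# TRUE where a row does not start the day after the previous row ended
new_run <- is.na(prev_end) | start != prev_end + 1

# counting the TRUE values so far gives every run its own id
run_id <- cumsum(new_run)

# sum the absence days inside each run and repeat the sum on every row
total <- ave(days, run_id, FUN = sum)

print(data.frame(start, end, days, run_id, total))
```

Only the third row starts the day after the previous one ends, so rows two and three share a run id and their days are summed, which matches rows 20 to 23 of the output shown above.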