I have a data.table dt:
library(data.table)
names <- c("john","mary","mary","mary","mary","mary","mary","tom","tom","tom","mary","john","john","john","tom","tom")
dates <- as.Date(c("2010-06-01","2010-06-01","2010-06-05","2010-06-09","2010-06-13","2010-06-17","2010-06-21","2010-07-09","2010-07-13","2010-07-17","2010-06-01","2010-08-01","2010-08-05","2010-08-09","2010-09-03","2010-09-04"))
shifts_missed <- c(2,11,11,11,11,11,11,6,6,6,1,5,5,5,0,2)
shift <- c("Day","Night","Night","Night","Night","Night","Night","Day","Day","Day","Day","Night","Night","Night","Night","Day")
df <- data.frame(names=names, dates=dates, shifts_missed=shifts_missed, shift=shift)
dt <- as.data.table(df)
names dates shifts_missed shift
john 2010-06-01 2 Day
mary 2010-06-01 11 Night
mary 2010-06-05 11 Night
mary 2010-06-09 11 Night
mary 2010-06-13 11 Night
mary 2010-06-17 11 Night
mary 2010-06-21 11 Night
tom 2010-07-09 6 Day
tom 2010-07-13 6 Day
tom 2010-07-17 6 Day
mary 2010-06-01 1 Day
john 2010-08-01 5 Night
john 2010-08-05 5 Night
john 2010-08-09 5 Night
tom 2010-09-03 0 Night
tom 2010-09-04 2 Day
In the end, what I want to get is the following:
names dates shifts_missed shift count
john 2010-06-01 2 Day 1
mary 2010-06-01 11 Night 1
mary 2010-06-05 11 Night 1
mary 2010-06-09 11 Night 1
mary 2010-06-13 11 Night 1
mary 2010-06-17 11 Night 1
mary 2010-06-21 11 Night 1
tom 2010-07-09 6 Day 1
tom 2010-07-13 6 Day 1
tom 2010-07-17 6 Day 1
mary 2010-06-01 1 Day 1
john 2010-08-01 5 Night 1
john 2010-08-05 5 Night 1
john 2010-08-09 5 Night 1
tom 2010-09-03 0 Night 0
tom 2010-09-04 2 Day 1
john 2010-06-01 2 Night 1
mary 2010-06-05 11 Day 1
mary 2010-06-09 11 Day 1
mary 2010-06-13 11 Day 1
mary 2010-06-17 11 Day 1
mary 2010-06-21 11 Day 1
tom 2010-07-09 6 Night 1
tom 2010-07-13 6 Night 1
tom 2010-07-17 6 Night 1
john 2010-08-05 5 Day 1
john 2010-08-09 5 Day 1
tom 2010-09-04 2 Night 1
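As you can see, the second half of the data is almost a duplicate of the first half. However, if shifts_missed = 0, the row should not be duplicated, and if shifts_missed is odd, the first row should not be duplicated but the remaining rows should be. Then 1 should be added in the count column for every row, except where shifts_missed = 0.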
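I have seen some answers that explain !duplicated or unique, but the values in shifts_missed are not unique. I'm sure this isn't overly complicated and is probably a multi-step process, but I can't figure out how to isolate the first row of the groups where shifts_missed is odd.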
Answer 0 (score: 1)
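# Flag rows to duplicate: every row if shifts_missed is even, all but the first if it is odd
dt[, is.in := if (shifts_missed[1] %% 2 == 0) TRUE else c(FALSE, rep(TRUE, .N - 1))
   , by = .(names, shift)]
# Append the flagged rows, skipping those with shifts_missed == 0
rbind(dt, dt[is.in & shifts_missed != 0])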
Adding the extra column should be straightforward from there.
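For completeness, here is a minimal sketch of one way the whole thing could be put together, including the count column. It rests on a few assumptions that are not in the answer above: the grouping also includes shifts_missed (so that a group such as tom/Day does not mix the rows with 6 and with 2), the duplicated rows get their shift swapped between Day and Night (as the desired output in the question appears to show), and the helper names dup and result are made up for illustration.

```r
library(data.table)

# Same data as in the question, built directly as a data.table
dt <- data.table(
  names = c("john", rep("mary", 6), rep("tom", 3), "mary", rep("john", 3), "tom", "tom"),
  dates = as.Date(c("2010-06-01", "2010-06-01", "2010-06-05", "2010-06-09",
                    "2010-06-13", "2010-06-17", "2010-06-21", "2010-07-09",
                    "2010-07-13", "2010-07-17", "2010-06-01", "2010-08-01",
                    "2010-08-05", "2010-08-09", "2010-09-03", "2010-09-04")),
  shifts_missed = c(2, rep(11, 6), rep(6, 3), 1, rep(5, 3), 0, 2),
  shift = c("Day", rep("Night", 6), rep("Day", 3), "Day", rep("Night", 3), "Night", "Day")
)

# Flag rows to duplicate: all rows for even shifts_missed,
# all but the first row of the group for odd shifts_missed.
# Assumption: shifts_missed is part of the grouping key.
dt[, is.in := if (shifts_missed[1] %% 2 == 0) TRUE else c(FALSE, rep(TRUE, .N - 1)),
   by = .(names, shifts_missed, shift)]

# Rows to append; rows with shifts_missed == 0 are never duplicated
dup <- dt[is.in & shifts_missed != 0]

# Assumption: the duplicated rows carry the opposite shift
dup[, shift := ifelse(shift == "Day", "Night", "Day")]

# Stack the original and the duplicates, then add count:
# 1 everywhere except where shifts_missed == 0
result <- rbind(dt, dup)
result[, count := as.integer(shifts_missed != 0)]
result[, is.in := NULL]  # drop the helper column

print(result)
```

On the sample data this grouping flags the same rows as grouping by names and shift alone; the extra key only matters if one person has several different shifts_missed values within the same shift.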