My data frame df contains the following columns:
ID <- c(1,1,1,2,2,3,3)
value <- c("A","B","C","A","C","B", "C")
start_time <- c("2017-12-21 14:04:15", "2017-12-21 14:13:04", "2017-12-21 14:04:40", "2017-12-08 13:18:28", "2017-12-08 13:19:03", "2017-12-06 11:33:31", "2017-12-06 11:32:37")
end_time <- c("2017-12-21 14:06:37","2017-12-21 14:54:0","2017-12-21 14:20:38","2017-12-08 13:18:35","2017-12-08 13:23:42","2017-12-06 11:38:27","2017-12-06 11:38:27")
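Expected result:
I want to add a column with the value Y or N that indicates whether, among rows with the same ID, the start and end times of value (A or B) overlap with those of value C. Note that some IDs may not have all 3 values.
So the final data frame should look like this: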
ID Value start_time end_time Overlap
1 A 2017-12-21 14:04:15 2017-12-21 14:06:37 Y
1 B 2017-12-21 14:13:04 2017-12-21 14:54:0 Y
1 C 2017-12-21 14:04:40 2017-12-21 14:20:38 Y
2 A 2017-12-08 13:18:28 2017-12-08 13:18:35 N
2 C 2017-12-08 13:19:03 2017-12-08 13:23:42 N
3 B 2017-12-06 11:33:31 2017-12-06 11:38:27 Y
3 C 2017-12-06 11:32:37 2017-12-06 11:38:27 Y
4 A 2017-11-01 08:00:00 2017-11-01 08:00:05 N
4 B 2017-11-01 08:00:04 2017-11-01 08:00:10 N
4 C 2017-11-01 08:00:11 2017-11-01 08:00:15 N
5 A 2017-11-01 08:01:25 2017-11-01 08:01:40 N
5 A 2017-11-01 08:01:42 2017-11-01 08:02:05 N
5 C 2017-11-01 08:02:06 2017-11-01 08:02:15 N
Any idea how to do this?
Answer 0 (score: 0)
Hope this helps!
library(dplyr)
library(tidyr)  # fill() comes from tidyr, not dplyr

# Parse the timestamps so they can be compared
df$start_time <- as.POSIXct(df$start_time, format="%Y-%m-%d %H:%M:%S")
df$end_time <- as.POSIXct(df$end_time, format="%Y-%m-%d %H:%M:%S")

df %>%
  group_by(ID) %>%
  arrange(ID, value) %>%  # C sorts last within each ID
  # "Y" if this row's start or end falls inside the interval one or two rows earlier in the same ID
  mutate(overlap = ifelse((start_time >= lag(start_time) & start_time <= lag(end_time)) |
                          (end_time >= lag(start_time) & end_time <= lag(end_time)) |
                          (start_time >= lag(start_time, 2) & start_time <= lag(end_time, 2) & !is.na(lag(start_time, 2))) |
                          (end_time >= lag(start_time, 2) & end_time <= lag(end_time, 2) & !is.na(lag(start_time, 2))), "Y", "N")) %>%
  # Keep only the result computed on the C row, then copy it up to the A/B rows
  mutate(overlap = ifelse(value=='C', overlap, NA)) %>%
  fill(overlap, .direction = "up") %>%
  data.frame()
The output is:
ID value start_time end_time overlap
1 1 A 2017-12-21 14:04:15 2017-12-21 14:06:37 Y
2 1 B 2017-12-21 14:00:04 2017-12-21 14:00:00 Y
3 1 C 2017-12-21 14:04:40 2017-12-21 14:20:38 Y
4 2 A 2017-12-08 13:18:28 2017-12-08 13:18:35 N
5 2 C 2017-12-08 13:19:03 2017-12-08 13:23:42 N
6 3 B 2017-12-06 11:33:31 2017-12-06 11:38:27 Y
7 3 C 2017-12-06 11:32:37 2017-12-06 11:38:27 Y
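The pipeline above checks whether the start or end of the C row falls inside an earlier interval, so it could miss a C interval that completely contains an A or B interval. As a rough alternative sketch in base R (not part of the original answer), the symmetric test "start1 <= end2 and end1 >= start2" can be applied to every (A or B, C) pair within an ID. The function name flag_overlap and the choice to count touching endpoints as an overlap are assumptions made for illustration; the data is rebuilt from the vectors in the question.

```r
# Hypothetical helper (an illustration, not the answerer's code): marks every
# row of an ID with "Y" when any A or B interval overlaps any C interval.
flag_overlap <- function(df) {
  fmt <- "%Y-%m-%d %H:%M:%S"
  # Convert to seconds since the epoch so plain numeric comparison can be used
  start <- as.numeric(as.POSIXct(as.character(df$start_time), format = fmt, tz = "UTC"))
  end   <- as.numeric(as.POSIXct(as.character(df$end_time),   format = fmt, tz = "UTC"))
  value <- as.character(df$value)

  overlap <- rep("N", nrow(df))
  for (id in unique(df$ID)) {
    rows <- which(df$ID == id)
    ab <- rows[value[rows] %in% c("A", "B")]
    cc <- rows[value[rows] == "C"]
    if (length(ab) == 0 || length(cc) == 0) next  # nothing to compare

    # Two intervals overlap when each one starts no later than the other ends.
    # outer() evaluates this for every (A or B, C) pair of the current ID.
    hit <- outer(start[ab], end[cc], "<=") & outer(end[ab], start[cc], ">=")
    if (any(hit)) overlap[rows] <- "Y"
  }
  df$Overlap <- overlap
  df
}

# Usage with the vectors from the question
df <- data.frame(
  ID = c(1, 1, 1, 2, 2, 3, 3),
  value = c("A", "B", "C", "A", "C", "B", "C"),
  start_time = c("2017-12-21 14:04:15", "2017-12-21 14:13:04", "2017-12-21 14:04:40",
                 "2017-12-08 13:18:28", "2017-12-08 13:19:03",
                 "2017-12-06 11:33:31", "2017-12-06 11:32:37"),
  end_time = c("2017-12-21 14:06:37", "2017-12-21 14:54:0", "2017-12-21 14:20:38",
               "2017-12-08 13:18:35", "2017-12-08 13:23:42",
               "2017-12-06 11:38:27", "2017-12-06 11:38:27"),
  stringsAsFactors = FALSE
)
result <- flag_overlap(df)
print(result)
```

Because every pair is compared, this sketch does not depend on the row order or on C being the last value within an ID, and an overlap between A and B alone does not produce a Y.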
Sample data (used for the dplyr output shown in this answer):
df <- structure(list(ID = c(1, 1, 1, 2, 2, 3, 3), value = structure(c(1L,
2L, 3L, 1L, 3L, 2L, 3L), .Label = c("A", "B", "C"), class = "factor"),
start_time = structure(c(6L, 5L, 7L, 3L, 4L, 2L, 1L), .Label = c("2017-12-06 11:32:37",
"2017-12-06 11:33:31", "2017-12-08 13:18:28", "2017-12-08 13:19:03",
"2017-12-21 14:00:04", "2017-12-21 14:04:15", "2017-12-21 14:04:40"
), class = "factor"), end_time = structure(c(5L, 4L, 6L,
2L, 3L, 1L, 1L), .Label = c("2017-12-06 11:38:27", "2017-12-08 13:18:35",
"2017-12-08 13:23:42", "2017-12-21 14:00:0", "2017-12-21 14:06:37",
"2017-12-21 14:20:38"), class = "factor")), .Names = c("ID",
"value", "start_time", "end_time"), row.names = c(NA, -7L), class = "data.frame")