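Question: Add values from one data frame to another based on the time window in which each row occurs.
I have a data frame containing single events, like this: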
Ind Date Time Event
1 FAU 15/11/2016 06:40:43 A
2 POR 15/11/2016 12:26:51 V
3 POR 15/11/2016 14:52:53 B
4 MAM 20/11/2016 08:12:19 G
5 SUR 03/12/2016 13:51:18 A
6 SUR 14/12/2016 07:47:06 V
The second data frame contains ongoing, consecutive events, like this:
Date Time Event
1 15/11/2016 06:56:48 1
2 15/11/2016 06:59:40 2
3 15/11/2016 07:27:36 3
4 15/11/2016 07:29:10 4
5 15/11/2016 07:34:51 5
6 15/11/2016 07:35:10 6
7 15/11/2016 07:37:19 7
8 15/11/2016 07:39:55 8
9 15/11/2016 07:51:59 9
10 15/11/2016 08:00:13 10
11 15/11/2016 08:08:01 11
12 15/11/2016 08:13:21 12
13 15/11/2016 08:16:21 13
14 15/11/2016 12:14:48 14
15 15/11/2016 12:16:58 15
16 15/11/2016 12:51:22 16
17 15/11/2016 12:52:09 17
18 15/11/2016 13:26:29 18
19 15/11/2016 13:26:55 19
20 15/11/2016 13:34:14 20
21 15/11/2016 13:50:41 21
22 15/11/2016 13:53:25 22
23 15/11/2016 14:15:17 23
24 15/11/2016 14:54:49 24
Question: How can I combine them so that, for each single event, we can see which consecutive event it occurred during? For example:
Ind Date Time Eventx Eventy
1 FAU 15/11/2016 06:40:43 A 1
2 POR 15/11/2016 12:26:51 V 15
3 POR 15/11/2016 14:52:53 B 23
Many thanks.
Answer 0 (score: 2)
This should work (at least on your example):
df1 <- structure(list(Ind = c("FAU", "POR", "POR", "MAM", "SUR", "SUR"
), Date = c("15/11/2016", "15/11/2016", "15/11/2016", "20/11/2016",
"03/12/2016", "14/12/2016"), Time = c("06:40:43", "12:26:51",
"14:52:53", "08:12:19", "13:51:18", "07:47:06"), Event = c("A",
"V", "B", "G", "A", "V")), .Names = c("Ind", "Date", "Time",
"Event"), class = "data.frame", row.names = c("1", "2", "3",
"4", "5", "6"))
df2 <- structure(list(Date = c("15/11/2016", "15/11/2016", "15/11/2016",
"15/11/2016", "15/11/2016", "15/11/2016", "15/11/2016", "15/11/2016",
"15/11/2016", "15/11/2016", "15/11/2016", "15/11/2016", "15/11/2016",
"15/11/2016", "15/11/2016", "15/11/2016", "15/11/2016", "15/11/2016",
"15/11/2016", "15/11/2016", "15/11/2016", "15/11/2016", "15/11/2016",
"15/11/2016"), Time = c("06:56:48", "06:59:40", "07:27:36", "07:29:10",
"07:34:51", "07:35:10", "07:37:19", "07:39:55", "07:51:59", "08:00:13",
"08:08:01", "08:13:21", "08:16:21", "12:14:48", "12:16:58", "12:51:22",
"12:52:09", "13:26:29", "13:26:55", "13:34:14", "13:50:41", "13:53:25",
"14:15:17", "14:54:49"), Event = 1:24), .Names = c("Date", "Time",
"Event"), class = "data.frame", row.names = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16", "17", "18", "19", "20", "21", "22", "23", "24"))
Create the as.POSIXct variables:
df1$datetime <- as.POSIXct(strptime(paste(df1$Date, df1$Time, sep = " "), "%d/%m/%Y %H:%M:%S"))
df2$datetime <- as.POSIXct(strptime(paste(df2$Date, df2$Time, sep = " "), "%d/%m/%Y %H:%M:%S"))
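As a small aside on the conversion step above: the same parsing can be written as a single as.POSIXct call with a format argument. This is only a sketch. The explicit tz = "UTC" is my assumption (the answer relies on the session time zone); I chose it so the result does not depend on local daylight-saving rules. The names stamp and bad are made up for illustration.

```r
# Parse a day/month/year timestamp in one call.
# tz = "UTC" is an assumption, not part of the original answer.
stamp <- as.POSIXct(paste("15/11/2016", "06:40:43"),
                    format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
format(stamp, "%Y-%m-%d %H:%M:%S")

# A string that does not match the format yields NA rather than an error,
# so it is worth checking for NA after converting real data.
bad <- as.POSIXct("2016-11-15 06:40:43",
                  format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
is.na(bad)
```

If any value comes back as NA, the format string most likely does not match the way the dates are written in that column.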
Initialise a new count variable in df1:
df1$count <- NA
Now we loop over the rows of df1 and, for each one, count the rows of df2 that have the same Date and an earlier Time:
for (i in seq_len(nrow(df1))) {
  # Count the df2 rows on the same Date whose datetime precedes row i of df1
  df1$count[i] <- sum(df2$datetime[df2$Date == df1$Date[i]] < df1$datetime[i])
}
Result:
> df1
Ind Date Time Event datetime count
1 FAU 15/11/2016 06:40:43 A 2016-11-15 06:40:43 0
2 POR 15/11/2016 12:26:51 V 2016-11-15 12:26:51 15
3 POR 15/11/2016 14:52:53 B 2016-11-15 14:52:53 23
4 MAM 20/11/2016 08:12:19 G 2016-11-20 08:12:19 0
5 SUR 03/12/2016 13:51:18 A 2016-12-03 13:51:18 0
6 SUR 14/12/2016 07:47:06 V 2016-12-14 07:47:06 0
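For what it's worth, if the consecutive events are sorted by time, the loop can probably be replaced by a vectorised lookup with base R's findInterval(). A minimal sketch under some assumptions: the vectors starts and singles are names I made up for illustration, tz = "UTC" is my choice, and unlike the loop above this does not restrict the count to the same Date, so a single event after the last start is reported as 24 rather than 0.

```r
# Start times of the consecutive events (from the question's second data frame)
starts <- as.POSIXct(paste("15/11/2016", c(
  "06:56:48", "06:59:40", "07:27:36", "07:29:10", "07:34:51", "07:35:10",
  "07:37:19", "07:39:55", "07:51:59", "08:00:13", "08:08:01", "08:13:21",
  "08:16:21", "12:14:48", "12:16:58", "12:51:22", "12:52:09", "13:26:29",
  "13:26:55", "13:34:14", "13:50:41", "13:53:25", "14:15:17", "14:54:49")),
  format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# The first four single events from the question's first data frame
singles <- as.POSIXct(c("15/11/2016 06:40:43", "15/11/2016 12:26:51",
                        "15/11/2016 14:52:53", "20/11/2016 08:12:19"),
                      format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# For each single event, the number of start times at or before it,
# i.e. the index of the consecutive event that was running at that moment.
# 0 means the single event happened before the first consecutive event began.
idx <- findInterval(as.numeric(singles), as.numeric(starts))
idx
```

Whether an open-ended last interval is what you want depends on your data; if the last event has a known end time, anything later would need to be set to NA by hand.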
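Answer 1 (score: 2)
I can offer you a data.table solution. The only catch is that I had to move the start of the first event in the second data frame to an earlier date, because it begins after the first event in the first data frame.
You will need the additional packages data.table and lubridate.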
library(data.table)
library(lubridate)
dt1 <- data.table(df1)
dt2 <- data.table(df2)
dt1[, Date.Time := as.POSIXct(strptime(paste(Date, Time, sep = " "), "%d/%m/%Y %H:%M:%S"))]
dt2[, Date.Time := as.POSIXct(strptime(paste(Date, Time, sep = " "), "%d/%m/%Y %H:%M:%S"))]
# Create the start and end time columns in the second data.table
dt2[, `:=`(Start.Time = Date.Time
, End.Time = shift(Date.Time, n = 1L, fill = NA, type = "lead"))]
# Change the start date to an earlier one
dt2[Event == 1,`:=`(Start.Time = Start.Time - days(1)) ]
# Merge on multiple conditions and the selection of the relevant columns
dt2[dt1, on=.(Start.Time < Date.Time
, End.Time > Date.Time)
, nomatch = 0L][,.(Ind
, Date
, Time
, Eventx = i.Event
, Eventy = Event)]
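# Output of the last merge. Date and Time here are dt2's columns (the start of
# the matching consecutive event); dt1's own values are available as i.Date and i.Time.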
Ind Date Time Eventx Eventy
1: FAU 15/11/2016 06:56:48 A 1
2: POR 15/11/2016 12:16:58 V 15
3: POR 15/11/2016 14:15:17 B 23