Linking values that occur within the same time window in R

Time: 2018-06-21 07:36:33

Tags: r time data-cleaning

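Problem: I need to add values from one data frame to another, based on the time window in which each row occurs.

I have a data frame that contains single events, like this: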

  Ind       Date     Time Event
1 FAU 15/11/2016 06:40:43     A
2 POR 15/11/2016 12:26:51     V
3 POR 15/11/2016 14:52:53     B
4 MAM 20/11/2016 08:12:19     G
5 SUR 03/12/2016 13:51:18     A
6 SUR 14/12/2016 07:47:06     V

The second data frame contains ongoing, consecutive events, like this:

         Date     Time Event
1  15/11/2016 06:56:48     1
2  15/11/2016 06:59:40     2
3  15/11/2016 07:27:36     3
4  15/11/2016 07:29:10     4
5  15/11/2016 07:34:51     5
6  15/11/2016 07:35:10     6
7  15/11/2016 07:37:19     7
8  15/11/2016 07:39:55     8
9  15/11/2016 07:51:59     9
10 15/11/2016 08:00:13    10
11 15/11/2016 08:08:01    11
12 15/11/2016 08:13:21    12
13 15/11/2016 08:16:21    13
14 15/11/2016 12:14:48    14
15 15/11/2016 12:16:58    15
16 15/11/2016 12:51:22    16
17 15/11/2016 12:52:09    17
18 15/11/2016 13:26:29    18
19 15/11/2016 13:26:55    19
20 15/11/2016 13:34:14    20
21 15/11/2016 13:50:41    21
22 15/11/2016 13:53:25    22
23 15/11/2016 14:15:17    23
24 15/11/2016 14:54:49    24

Question: how can I combine them so that, for each single event, we can see which consecutive event it occurred during? For example:

  Ind       Date     Time Eventx Eventy
1 FAU 15/11/2016 06:40:43      A      1
2 POR 15/11/2016 12:26:51      V     15
3 POR 15/11/2016 14:52:53      B     23

Many thanks.

2 answers:

Answer 0 (score: 2)

This should work (at least on your example):

df1 <- structure(list(Ind = c("FAU", "POR", "POR", "MAM", "SUR", "SUR"
       ), Date = c("15/11/2016", "15/11/2016", "15/11/2016", "20/11/2016", 
       "03/12/2016", "14/12/2016"), Time = c("06:40:43", "12:26:51", 
       "14:52:53", "08:12:19", "13:51:18", "07:47:06"), Event = c("A", 
       "V", "B", "G", "A", "V")), .Names = c("Ind", "Date", "Time", 
       "Event"), class = "data.frame", row.names = c("1", "2", "3", 
       "4", "5", "6"))

df2 <- structure(list(Date = c("15/11/2016", "15/11/2016", "15/11/2016", 
       "15/11/2016", "15/11/2016", "15/11/2016", "15/11/2016", "15/11/2016", 
       "15/11/2016", "15/11/2016", "15/11/2016", "15/11/2016", "15/11/2016", 
       "15/11/2016", "15/11/2016", "15/11/2016", "15/11/2016", "15/11/2016", 
       "15/11/2016", "15/11/2016", "15/11/2016", "15/11/2016", "15/11/2016", 
       "15/11/2016"), Time = c("06:56:48", "06:59:40", "07:27:36", "07:29:10", 
       "07:34:51", "07:35:10", "07:37:19", "07:39:55", "07:51:59", "08:00:13", 
       "08:08:01", "08:13:21", "08:16:21", "12:14:48", "12:16:58", "12:51:22", 
       "12:52:09", "13:26:29", "13:26:55", "13:34:14", "13:50:41", "13:53:25", 
       "14:15:17", "14:54:49"), Event = 1:24), .Names = c("Date", "Time", 
       "Event"), class = "data.frame", row.names = c("1", "2", "3", 
       "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
       "16", "17", "18", "19", "20", "21", "22", "23", "24"))

Create date-time variables with as.POSIXct:

df1$datetime <- as.POSIXct(strptime(paste(df1$Date, df1$Time, sep = " "), "%d/%m/%Y %H:%M:%S"))
df2$datetime <- as.POSIXct(strptime(paste(df2$Date, df2$Time, sep = " "), "%d/%m/%Y %H:%M:%S"))
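A side note on the parsing step: strptime() without a tz argument interprets the strings in the session's local time zone. If that matters for your data, the format and time zone can be passed to as.POSIXct() directly. The following is a minimal sketch, not part of the original answer; the vector names and tz = "UTC" are assumptions made for illustration.

```r
# Illustrative vectors (assumed), in the same format as the Date and Time columns
date_chr <- c("15/11/2016", "15/11/2016")
time_chr <- c("06:40:43", "12:26:51")

# Parse in one step; tz = "UTC" is an assumption, use the zone your data was recorded in
dt <- as.POSIXct(paste(date_chr, time_chr),
                 format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Elapsed seconds between the two parsed timestamps
elapsed <- as.numeric(difftime(dt[2], dt[1], units = "secs"))
```

Fixing the time zone explicitly keeps comparisons between the two data frames reproducible across machines.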

Initialize a new variable count in df1:

df1$count <- NA

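Now we loop over the rows of df1 and, for each row, count the rows of df2 that have the same Date and a datetime earlier than that row's datetime: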

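# seq_len() is safe even if df1 has zero rows (1:nrow(df1) would give c(1, 0))
for(i in seq_len(nrow(df1))){
  # number of df2 events on the same date that started before this df1 event
  df1$count[i] <- sum(df2$datetime[df2$Date == df1$Date[i]] < df1$datetime[i])
}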

Result:

> df1
  Ind       Date     Time Event            datetime count
1 FAU 15/11/2016 06:40:43     A 2016-11-15 06:40:43     0
2 POR 15/11/2016 12:26:51     V 2016-11-15 12:26:51    15
3 POR 15/11/2016 14:52:53     B 2016-11-15 14:52:53    23
4 MAM 20/11/2016 08:12:19     G 2016-11-20 08:12:19     0
5 SUR 03/12/2016 13:51:18     A 2016-12-03 13:51:18     0
6 SUR 14/12/2016 07:47:06     V 2016-12-14 07:47:06     0
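The row-by-row count above can also be sketched without a loop using base R's findInterval(). This is only an illustration under stated assumptions, not the answer's method: the vectors starts and events are a hypothetical subset of the example data, the comparison is "start <= event" rather than the loop's strict "<", the same-date restriction is dropped, and events that fall before the first start or after the last start are marked NA instead of 0.

```r
# Hypothetical subset of the example data (tz = "UTC" is an assumption)
starts <- as.POSIXct(c("2016-11-15 06:56:48", "2016-11-15 12:16:58",
                       "2016-11-15 14:15:17", "2016-11-15 14:54:49"), tz = "UTC")
events <- as.POSIXct(c("2016-11-15 06:40:43", "2016-11-15 12:26:51",
                       "2016-11-15 14:52:53", "2016-11-20 08:12:19"), tz = "UTC")

# For each event, findInterval() gives the number of start times <= the event;
# starts must be sorted in increasing order
idx <- findInterval(as.numeric(events), as.numeric(starts))

# 0 means "before the first start"; length(starts) means "after the last start",
# whose window has no known end, so both are treated as no match here
idx[idx == 0 | idx == length(starts)] <- NA
idx
```

Here idx is the position within starts of the window each event falls into, so it can be used to look up the matching Event value in the second data frame.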

Answer 1 (score: 2)

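I can offer you a data.table solution. The only catch is that I had to move the start of the first event in the second data frame to an earlier date, because it starts after the first event in the first data frame. You will need the additional packages data.table and lubridate.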

library(data.table)
library(lubridate)
dt1 <- data.table(df1)
dt2 <- data.table(df2)

dt1[, Date.Time := as.POSIXct(strptime(paste(Date, Time, sep = " "), "%d/%m/%Y %H:%M:%S"))]
dt2[, Date.Time := as.POSIXct(strptime(paste(Date, Time, sep = " "), "%d/%m/%Y %H:%M:%S"))]


# Create the start and end time columns in the second data.table
dt2[, `:=`(Start.Time = Date.Time
        , End.Time = shift(Date.Time, n = 1L, fill = NA, type = "lead"))]

# Change the start date to an earlier one
dt2[Event == 1,`:=`(Start.Time = Start.Time - days(1)) ]

# Non-equi join on both conditions, then select the relevant columns.
# Date and Time are taken from dt1 (the i. prefix) so that the output shows
# the time of the single event, not the start of the consecutive event
dt2[dt1, on=.(Start.Time < Date.Time
              , End.Time > Date.Time)
              , nomatch = 0L][,.(Ind
                   , Date = i.Date
                   , Time = i.Time
                   , Eventx = i.Event
                   , Eventy = Event)]
# Output of the last merge
   Ind       Date     Time Eventx Eventy
1: FAU 15/11/2016 06:40:43      A      1
2: POR 15/11/2016 12:26:51      V     15
3: POR 15/11/2016 14:52:53      B     23
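To see why the first start time has to be moved, here is a minimal base R sketch of the same strict window test. It is an illustration only: the helper window_of and the three-row subset are hypothetical, times are handled as seconds since the epoch, and one day is taken as 86400 seconds (which matches days(1) when no clock change is involved).

```r
# Hypothetical subset of the example data, as seconds since the epoch
starts <- as.numeric(as.POSIXct(c("2016-11-15 06:56:48", "2016-11-15 06:59:40",
                                  "2016-11-15 07:27:36"), tz = "UTC"))
event  <- as.numeric(as.POSIXct("2016-11-15 06:40:43", tz = "UTC"))

# Each window ends where the next one starts; the last end is unknown (NA)
ends <- c(starts[-1], NA)

# Index of the window that strictly contains t, or NA if there is none
# (hypothetical helper, not part of either answer)
window_of <- function(t, s, e) {
  hit <- which(s < t & t < e)
  if (length(hit) == 1L) hit else NA_integer_
}

# With the original start times no window contains the event
before_shift <- window_of(event, starts, ends)

# Move the first start one day earlier, as the answer does
shifted <- starts
shifted[1] <- shifted[1] - 86400
after_shift <- window_of(event, shifted, ends)
```

Without the shift the 06:40:43 event precedes every window and is dropped by nomatch = 0L; with it, the event falls into the first window, which is what the requested output expects.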