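Comparing a date/time sequence against data to populate a new DF

Time: 2015-12-11 12:04:50

Tags: r dataframe

I have some observation data that includes the date and time at which each observation was recorded. It is basically an incomplete list of date and time stamps.

For example (made-up data):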

data=as.POSIXlt(c("2014-10-24 11:09",
"2014-10-24 11:32",
"2014-10-24 11:34",
"2014-10-24 14:09",
"2014-10-24 14:32",
"2014-10-24 14:34",
"2014-10-24 20:09",
"2014-10-24 21:32",
"2014-10-24 21:34",
"2014-10-24 23:01",
"2014-10-24 23:05",
"2014-10-24 23:58",
"2014-10-25 02:13",
"2014-10-25 02:32",
"2014-10-25 05:26",
"2014-10-25 05:46",
"2014-10-25 18:39",
"2014-10-25 22:49",
"2014-10-25 22:55",
"2014-10-26 01:43",
"2014-10-26 01:56",
"2014-10-26 09:15",
"2014-10-26 10:17",
"2014-10-26 10:34",
"2014-10-26 10:36",
"2014-10-26 11:32",
"2014-10-26 14:05",
"2014-10-26 14:09",
"2014-10-26 17:01",
"2014-10-26 20:41"))

I made another sequence that covers the whole study period (in this example 2014-10-20 00:00 to 2014-10-27 23:59) with a time step of 30 minutes:

start = "2014-10-20 00:00"
end = "2014-10-27 23:59" 
timestep = 1800 #1800 sec = 30 min
timeseq = seq(from = as.POSIXlt(start), to = as.POSIXlt(end), by = timestep)

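Now I want a new data frame that contains this sequence plus a second column holding the number of observations in 'data' that fall within each 30-minute interval.

The result would look something like: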

2014-10-20 00:00    0
2014-10-20 00:30    0
....
2014-10-24 11:00    1
2014-10-24 11:30    2
....
2014-10-26 14:00    2

Hope this makes sense!

Edit: the original data has more than 75,000 records, but the first three days look like this:

Date and Time (UTC)
29-09-14 11:11
29-09-14 11:11
20-10-14 16:43
20-10-14 16:43
20-10-14 16:44
20-10-14 17:16
20-10-14 17:16
20-10-14 17:16
20-10-14 17:16
20-10-14 17:16
24-10-14 14:47
24-10-14 14:52
24-10-14 14:56
24-10-14 14:58
24-10-14 15:39
24-10-14 16:03
24-10-14 16:19
24-10-14 16:43
24-10-14 16:44
24-10-14 16:55
24-10-14 16:58
24-10-14 18:12
24-10-14 18:29
24-10-14 18:42
24-10-14 18:43
24-10-14 19:49
24-10-14 20:03
24-10-14 20:08
24-10-14 21:24
24-10-14 21:25
24-10-14 21:34
24-10-14 21:35
24-10-14 21:45
24-10-14 21:55
24-10-14 21:57
24-10-14 22:01
24-10-14 22:02
24-10-14 22:07
24-10-14 22:08
24-10-14 22:09
24-10-14 22:15
24-10-14 22:16
24-10-14 22:18
24-10-14 22:23
24-10-14 22:33
24-10-14 22:34
24-10-14 22:40
24-10-14 22:41
25-10-14 07:54
25-10-14 07:57
25-10-14 07:58
25-10-14 08:05
25-10-14 08:07
25-10-14 08:08
25-10-14 08:21
25-10-14 08:26
25-10-14 11:33
25-10-14 11:35
25-10-14 11:45
25-10-14 11:56
25-10-14 12:01
25-10-14 12:07
25-10-14 12:08
25-10-14 12:11
25-10-14 12:13
25-10-14 12:15
25-10-14 12:17
25-10-14 12:18
25-10-14 12:24
25-10-14 12:32
25-10-14 12:43
25-10-14 12:50
25-10-14 12:52
25-10-14 12:53
25-10-14 12:56
25-10-14 12:58
25-10-14 13:07
25-10-14 13:08
25-10-14 13:10
25-10-14 13:26
25-10-14 13:28
25-10-14 13:30
25-10-14 13:32
25-10-14 13:35
25-10-14 13:36
25-10-14 13:41
25-10-14 13:54
25-10-14 14:13
25-10-14 14:30
25-10-14 14:32
25-10-14 14:33
25-10-14 14:34
25-10-14 14:35
25-10-14 14:37
25-10-14 14:54
25-10-14 14:57
25-10-14 15:00
25-10-14 15:45
25-10-14 15:49
25-10-14 15:54
25-10-14 15:59
25-10-14 16:02
25-10-14 16:04
25-10-14 16:06
25-10-14 16:10
25-10-14 16:11
25-10-14 16:14
25-10-14 16:16
25-10-14 16:20
25-10-14 16:22
25-10-14 16:24
25-10-14 16:25
25-10-14 16:26
25-10-14 16:28
25-10-14 16:29
25-10-14 16:31
25-10-14 16:33
25-10-14 16:34
25-10-14 16:35
25-10-14 16:37
25-10-14 16:38
25-10-14 16:41
25-10-14 16:42
25-10-14 16:43
25-10-14 16:44
25-10-14 16:46
25-10-14 16:48
25-10-14 16:52
25-10-14 16:54
25-10-14 16:56
25-10-14 16:57
25-10-14 16:59
25-10-14 17:01
25-10-14 17:03
25-10-14 17:04
25-10-14 17:08
25-10-14 17:24
25-10-14 17:25
25-10-14 17:27
25-10-14 17:29
25-10-14 17:34
25-10-14 17:35
25-10-14 17:36
25-10-14 17:37
25-10-14 17:41
25-10-14 17:46
25-10-14 17:51
25-10-14 17:58
25-10-14 18:00
25-10-14 18:01
25-10-14 18:03
25-10-14 18:04
25-10-14 18:13
25-10-14 18:15
25-10-14 18:16
25-10-14 18:18
25-10-14 18:19
25-10-14 18:34
25-10-14 18:41
25-10-14 18:42
25-10-14 18:43
25-10-14 18:44
25-10-14 19:00
25-10-14 19:03
25-10-14 19:08
25-10-14 19:09
25-10-14 19:11
25-10-14 19:12
25-10-14 19:14
25-10-14 19:15
25-10-14 19:30
25-10-14 19:32
25-10-14 19:38
25-10-14 19:41
25-10-14 19:54
25-10-14 20:00
25-10-14 20:01
25-10-14 20:08
25-10-14 20:15
25-10-14 20:18
25-10-14 20:19
25-10-14 20:22
25-10-14 20:29
25-10-14 20:43
25-10-14 21:02
25-10-14 21:06
25-10-14 21:11
25-10-14 21:19
25-10-14 21:22
25-10-14 21:24
25-10-14 21:26
25-10-14 21:28
25-10-14 21:29
25-10-14 21:31
25-10-14 21:34
25-10-14 21:45
25-10-14 21:47
25-10-14 21:48
25-10-14 21:50
25-10-14 21:51
25-10-14 21:53
25-10-14 21:54
25-10-14 21:54
25-10-14 21:55
25-10-14 21:56
25-10-14 21:58
25-10-14 22:03
25-10-14 22:06
25-10-14 22:11
25-10-14 22:13
25-10-14 22:14
25-10-14 22:16
25-10-14 22:20
25-10-14 22:24
25-10-14 22:27
25-10-14 22:29
25-10-14 22:31
25-10-14 22:34
25-10-14 22:36
25-10-14 22:37
25-10-14 22:39
25-10-14 22:42
25-10-14 22:43
25-10-14 22:45
25-10-14 22:46
25-10-14 22:47
25-10-14 22:49
25-10-14 22:51
25-10-14 22:53
25-10-14 22:55
25-10-14 22:56
25-10-14 22:58
25-10-14 23:02
25-10-14 23:06
25-10-14 23:09
25-10-14 23:11
25-10-14 23:13
25-10-14 23:14
25-10-14 23:17
25-10-14 23:19
25-10-14 23:20
25-10-14 23:22
25-10-14 23:24
25-10-14 23:28
25-10-14 23:30
25-10-14 23:33
25-10-14 23:36
25-10-14 23:37
25-10-14 23:39
25-10-14 23:40
25-10-14 23:41
25-10-14 23:43
25-10-14 23:44
25-10-14 23:48
25-10-14 23:54
25-10-14 23:57
25-10-14 23:59
26-10-14 00:01
26-10-14 00:02
26-10-14 00:03
26-10-14 00:07
26-10-14 00:09
26-10-14 00:12
26-10-14 00:14
26-10-14 00:22
26-10-14 00:23
26-10-14 00:26
26-10-14 00:29
26-10-14 00:31
26-10-14 00:34
26-10-14 00:35
26-10-14 00:38
26-10-14 00:43
26-10-14 00:48
26-10-14 00:50
26-10-14 00:59
26-10-14 01:00
26-10-14 01:03
26-10-14 01:04
26-10-14 01:07
26-10-14 01:13
26-10-14 01:25
26-10-14 01:37
26-10-14 01:46
26-10-14 01:54
26-10-14 02:02
26-10-14 02:05
26-10-14 02:06
26-10-14 02:07
26-10-14 02:10
26-10-14 02:12
26-10-14 02:14
26-10-14 02:16
26-10-14 02:21
26-10-14 02:22
26-10-14 02:25
26-10-14 02:26
26-10-14 02:31
26-10-14 02:36
26-10-14 02:39
26-10-14 02:40
26-10-14 02:42
26-10-14 02:45
26-10-14 02:48
26-10-14 02:52
26-10-14 02:53
26-10-14 02:54
26-10-14 02:55
26-10-14 02:57
26-10-14 02:58
26-10-14 03:00
26-10-14 03:03
26-10-14 03:05
26-10-14 03:08
26-10-14 03:12
26-10-14 03:14
26-10-14 03:15
26-10-14 03:16
26-10-14 03:18
26-10-14 03:23
26-10-14 03:25
26-10-14 03:26
26-10-14 03:27
26-10-14 03:29
26-10-14 03:31
26-10-14 03:32
26-10-14 03:35
26-10-14 03:37
26-10-14 03:38
26-10-14 03:40
26-10-14 03:41
26-10-14 03:43
26-10-14 03:46
26-10-14 03:48
26-10-14 03:49
26-10-14 03:50
26-10-14 03:55
26-10-14 03:57
26-10-14 04:03
26-10-14 04:12
26-10-14 04:14
26-10-14 04:16
26-10-14 04:22
26-10-14 04:25
26-10-14 04:26
26-10-14 04:28
26-10-14 04:29
26-10-14 04:31
26-10-14 04:39
26-10-14 04:41
26-10-14 04:46
26-10-14 04:58
26-10-14 05:03
26-10-14 05:05
26-10-14 05:08
26-10-14 05:18
26-10-14 05:19
26-10-14 05:20
26-10-14 05:21
26-10-14 05:26
26-10-14 05:27
26-10-14 05:28
26-10-14 05:29
26-10-14 05:31
26-10-14 05:36
26-10-14 05:38
26-10-14 05:39
26-10-14 05:41
26-10-14 05:42
26-10-14 05:45
26-10-14 05:47
26-10-14 05:50
26-10-14 05:53
26-10-14 05:57
26-10-14 06:01
26-10-14 06:02
26-10-14 06:03
26-10-14 06:04
26-10-14 06:06
26-10-14 06:08
26-10-14 06:09
26-10-14 06:11
26-10-14 06:13
26-10-14 06:15
26-10-14 06:16
26-10-14 06:17
26-10-14 06:21
26-10-14 06:22
26-10-14 06:24
26-10-14 06:28
26-10-14 06:32
26-10-14 06:34
26-10-14 06:36
26-10-14 06:37
26-10-14 06:38

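This is the 'data' DF mentioned above, but in the actual code it is called 'receiver'. Everything else is the same. Here is how I used the code from the answer below: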

timestep = 1800 #sec
start = "2014-10-21 00:00"
end = "2015-10-21 23:59"
#this is the DF that contains two columns needed: date and time
receiver = R125926

timeseq = seq(from = as.POSIXct(start), to = as.POSIXct(end), by = timestep)

#The original date and time were in two different columns, so I have to combine them back into one.
receiver$date2 = as.POSIXct(paste(receiver$date, receiver$time), format="%Y-%m-%d %H:%M:%S")
#isolate the date/time data and put it in its own object (the rest of the receiver data is not needed). This is what the data snippet above looks like.
dateseq = receiver$date2

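library(data.table)
dt.timeseq = data.table(timeseq)
#floor each timestamp to the start of its bin (timestep is already in seconds), then count per bin
dtreceiver = data.table(dateseq)[, dateseq := dateseq - as.numeric(dateseq) %% timestep][,list(count = .N), by=dateseq]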

setkey(dt.timeseq, timeseq)
setkey(dtreceiver, dateseq)

new_dt = dtreceiver[dt.timeseq]
new_dt[is.na(count), count := 0]
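
As a side note on the flooring step, here is a minimal sketch of rounding timestamps down to an arbitrary bin width given in seconds. It assumes the raw strings use the day-month-year layout of the snippet above and are in UTC, as the column header suggests; the names `raw`, `stamps` and `floored` are made up for illustration.

```r
# Hypothetical raw strings in the "dd-mm-yy HH:MM" layout shown in the edit above
raw <- c("24-10-14 14:47", "24-10-14 14:52", "24-10-14 15:39", "25-10-14 07:54")

# Parse explicitly as UTC so the result does not depend on the local time zone
stamps <- as.POSIXct(raw, format = "%d-%m-%y %H:%M", tz = "UTC")

timestep <- 1800  # bin width in seconds (30 min)

# Floor each timestamp to the start of its bin: subtract the remainder
# of the epoch seconds divided by the bin width
floored <- stamps - as.numeric(stamps) %% timestep

format(floored, "%Y-%m-%d %H:%M")
```

Because the remainder is taken modulo the bin width in seconds, the same line should work for other steps (for example 900 for 15 minutes) without further changes.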

1 Answer:

Answer 0: (score: 2)

Something like this might help:

library(data.table)
#convert to data.tables
dt <- data.table(timeseq)
#floor the times down to 30min chunks and count occurrences
#(converted to POSIXct first, since data.table does not store POSIXlt columns)
dtdata <- data.table(data = as.POSIXct(data))[, data := data - as.numeric(data) %% (3600/(60/30))][,
                           list(count = .N), by=data]

#set the correct keys
setkey(dt, timeseq)
setkey(dtdata, data)

#merge and set NAs (unmatched) to zero
new_dt <- dtdata[dt]
new_dt[is.na(count), count := 0]

Output:

> new_dt
                    data count
  1: 2014-10-20 00:00:00     0
  2: 2014-10-20 00:30:00     0
  3: 2014-10-20 01:00:00     0
  4: 2014-10-20 01:30:00     0
  5: 2014-10-20 02:00:00     0
 ---                          
390: 2014-10-27 21:30:00     0
391: 2014-10-27 22:00:00     0
392: 2014-10-27 22:30:00     0
393: 2014-10-27 23:00:00     0
394: 2014-10-27 23:30:00     0

Only some of the rows are shown here; the rows that should have counts are filled in.
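
If you would rather avoid the extra package, the same counting can probably be done in base R. The following is only a sketch under a few assumptions: the timestamps are parsed as UTC (so daylight-saving changes do not shift the bins), only a small subset of the question's made-up data is used, and the names `obs`, `bins` and `result` are hypothetical.

```r
# Example observations (a subset of the question's made-up data), parsed as UTC
obs <- as.POSIXct(c("2014-10-24 11:09", "2014-10-24 11:32", "2014-10-24 11:34",
                    "2014-10-26 14:05", "2014-10-26 14:09"), tz = "UTC")

timestep <- 1800  # 30 minutes, in seconds
bins <- seq(from = as.POSIXct("2014-10-20 00:00", tz = "UTC"),
            to   = as.POSIXct("2014-10-27 23:59", tz = "UTC"),
            by   = timestep)

# findInterval() returns, for each observation, the index of the last bin
# start that is <= the observation (0 if it precedes the first bin)
idx <- findInterval(as.numeric(obs), as.numeric(bins))

# Keep only observations inside the study period, then count hits per bin
inside <- idx >= 1 & obs < bins[length(bins)] + timestep
counts <- tabulate(idx[inside], nbins = length(bins))

result <- data.frame(time = bins, count = counts)
result[result$count > 0, ]
```

For this subset the non-zero rows are 2014-10-24 11:00 (1), 2014-10-24 11:30 (2) and 2014-10-26 14:00 (2), which matches the desired result in the question. Since `tabulate()` already returns zero for empty bins, no separate merge or NA replacement step is needed.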