R: Subset a data frame based on times within a given number of minutes of an observation window

Time: 2013-08-06 23:14:13

Tags: r date time subset

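Suppose I have a data frame with start and end time columns, a measurement column, and a column giving the time of each measurement, like this: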

     start         end    value                time
   9:01:00     9:02:00     30.6  2013-03-25 9:05:00
   9:01:00     9:02:00     30.8  2013-03-25 9:15:00
   9:46:00     9:46:00     28.2  2013-03-25 9:43:00
   9:46:00     9:46:00     28.9  2013-03-25 9:53:00
  10:54:00    10:59:00     13.4 2013-03-25 10:56:00
  10:54:00    10:59:00     13.8 2013-03-25 11:56:00

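How can I subset this data frame so that it keeps only the rows whose time column falls between the start and end times, or within ten minutes before the start time and ten minutes after the end time? I chose ten minutes arbitrarily, and would like to know how to do this for any length of time before and after the start and end times.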

The resulting data frame would look like this:

     start         end    value                time
   9:01:00     9:02:00     30.6  2013-03-25 9:05:00
   9:46:00     9:46:00     28.2  2013-03-25 9:43:00
   9:46:00     9:46:00     28.9  2013-03-25 9:53:00
  10:54:00    10:59:00     13.4 2013-03-25 10:56:00

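Is there a way to do this other than subtracting x minutes from the start column and adding x minutes to the end column, then subsetting on whether the time column falls inside these extended windows?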

So far I have converted the time columns to POSIXlt. Unfortunately, this gives the start and end times today's date.
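The behaviour described here can be reproduced in a few lines. This is only a sketch, not the asker's actual conversion code, and the variable names are made up for illustration: when a string that holds just a clock time is parsed, R fills in the current date, whereas a full timestamp keeps its own date.

```r
# Parse a clock time with no date component; R supplies today's date.
start_only <- as.POSIXct("09:01:00", format = "%H:%M:%S")

# Parse a full timestamp; the date in the string is kept.
measured <- as.POSIXct("2013-03-25 09:05:00", format = "%Y-%m-%d %H:%M:%S")

# Compare calendar dates, all rendered in the local time zone.
start_date    <- format(start_only, "%Y-%m-%d")
measured_date <- format(measured, "%Y-%m-%d")
today         <- format(Sys.time(), "%Y-%m-%d")

print(start_date == today)   # the time-only value landed on today's date
print(measured_date)         # the full timestamp kept 2013-03-25
```

Because the two columns end up on different dates, a direct comparison between them is meaningless until the dates are brought into line.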

Here is the dput output for the first data frame:

structure(list(start = structure(list(sec = c(0, 0, 0, 0, 0, 
0), min = c(1L, 1L, 46L, 46L, 54L, 54L), hour = c(9L, 9L, 9L, 
9L, 10L, 10L), mday = c(7L, 7L, 7L, 7L, 7L, 7L), mon = c(7L, 
7L, 7L, 7L, 7L, 7L), year = c(113L, 113L, 113L, 113L, 113L, 113L
), wday = c(3L, 3L, 3L, 3L, 3L, 3L), yday = c(218L, 218L, 218L, 
218L, 218L, 218L), isdst = c(1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("sec", 
"min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
), class = c("POSIXlt", "POSIXt")), end = structure(list(sec = c(0, 
0, 0, 0, 0, 0), min = c(2L, 2L, 46L, 46L, 59L, 59L), hour = c(9L, 
9L, 9L, 9L, 10L, 10L), mday = c(7L, 7L, 7L, 7L, 7L, 7L), mon = c(7L, 
7L, 7L, 7L, 7L, 7L), year = c(113L, 113L, 113L, 113L, 113L, 113L
), wday = c(3L, 3L, 3L, 3L, 3L, 3L), yday = c(218L, 218L, 218L, 
218L, 218L, 218L), isdst = c(1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("sec", 
"min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
), class = c("POSIXlt", "POSIXt")), value = c(30.6, 30.8, 28.2, 
28.9, 13.4, 13.8), time = structure(list(sec = c(0, 0, 0, 0, 
0, 0), min = c(5L, 15L, 43L, 53L, 56L, 56L), hour = c(9L, 9L, 
9L, 9L, 10L, 11L), mday = c(25L, 25L, 25L, 25L, 25L, 25L), mon = c(2L, 
2L, 2L, 2L, 2L, 2L), year = c(113L, 113L, 113L, 113L, 113L, 113L
), wday = c(1L, 1L, 1L, 1L, 1L, 1L), yday = c(83L, 83L, 83L, 
83L, 83L, 83L), isdst = c(1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("sec", 
"min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
), class = c("POSIXlt", "POSIXt"))), .Names = c("start", "end", 
"value", "time"), row.names = c(NA, -6L), class = "data.frame")

Here is the dput output for the second data frame:

structure(list(start = structure(list(sec = c(0, 0, 0, 0), min = c(1L, 
46L, 46L, 54L), hour = c(9L, 9L, 9L, 10L), mday = c(7L, 7L, 7L, 
7L), mon = c(7L, 7L, 7L, 7L), year = c(113L, 113L, 113L, 113L
), wday = c(3L, 3L, 3L, 3L), yday = c(218L, 218L, 218L, 218L), 
    isdst = c(1L, 1L, 1L, 1L)), .Names = c("sec", "min", "hour", 
"mday", "mon", "year", "wday", "yday", "isdst"), class = c("POSIXlt", 
"POSIXt")), end = structure(list(sec = c(0, 0, 0, 0), min = c(2L, 
46L, 46L, 59L), hour = c(9L, 9L, 9L, 10L), mday = c(7L, 7L, 7L, 
7L), mon = c(7L, 7L, 7L, 7L), year = c(113L, 113L, 113L, 113L
), wday = c(3L, 3L, 3L, 3L), yday = c(218L, 218L, 218L, 218L), 
    isdst = c(1L, 1L, 1L, 1L)), .Names = c("sec", "min", "hour", 
"mday", "mon", "year", "wday", "yday", "isdst"), class = c("POSIXlt", 
"POSIXt")), value = c(30.6, 28.2, 28.9, 13.4), time = structure(list(
    sec = c(0, 0, 0, 0), min = c(5L, 43L, 53L, 56L), hour = c(9L, 
    9L, 9L, 10L), mday = c(25L, 25L, 25L, 25L), mon = c(2L, 2L, 
    2L, 2L), year = c(113L, 113L, 113L, 113L), wday = c(1L, 1L, 
    1L, 1L), yday = c(83L, 83L, 83L, 83L), isdst = c(1L, 1L, 
    1L, 1L)), .Names = c("sec", "min", "hour", "mday", "mon", 
"year", "wday", "yday", "isdst"), class = c("POSIXlt", "POSIXt"
))), .Names = c("start", "end", "value", "time"), row.names = c(NA, 
-4L), class = "data.frame")

2 Answers:

Answer 0: (score: 2)

The data is no fun to recreate, but the answer should be as simple as:

data[with(data, time > start - 10*60 & time < end + 10*60),]

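This assumes that the start, end and time objects are actually comparable (i.e. they share the same year and day). Otherwise, just convert the substring that holds the clock time into a POSIX object.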

Update: OK, since your dates are off, you need to recreate them so they are "in sync", for example:

data$start <- as.POSIXct(substr(data$start,12,19), format="%H:%M:%S")
data$end <- as.POSIXct(substr(data$end,12,19), format="%H:%M:%S")
data$time <- as.POSIXct(substr(data$time,12,19), format="%H:%M:%S")

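Now the one-liner above gives you what you want. You should probably be careful about how the POSIX objects are created from your raw data. Also, for most applications POSIXct is probably preferable to POSIXlt, which stores the times as a list of components (sec, min, hour, and so on). That can make some operations down the line awkward (or slower).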
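The difference between the two classes is easy to inspect. The following is a minimal sketch with illustrative variable names; UTC is used only to keep the output reproducible.

```r
# One timestamp in both representations.
lt <- as.POSIXlt("2013-03-25 09:05:00", tz = "UTC")
ct <- as.POSIXct(lt)

# POSIXlt: a list of broken-down fields (sec, min, hour, mday, ...).
lt_is_list <- is.list(unclass(lt))
lt_minute  <- lt$min

# POSIXct: a single number, seconds since 1970-01-01 UTC.
ct_is_numeric <- is.numeric(unclass(ct))

# Adding seconds to a POSIXct value shifts it by that amount.
later <- ct + 10 * 60

print(lt_is_list)
print(ct_is_numeric)
print(format(later, "%H:%M:%S"))
```

Since a POSIXct column is a plain numeric vector underneath, it behaves like any other atomic column when a data frame is subset or merged.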

Answer 1: (score: 1)

Building on @EliGurarie's answer:

#dat <- ....see original question

Convert the times to a POSIX representation on the same date as the measurements and do the arithmetic:

datestem <- as.character(as.Date(dat$time))
dat$start <- as.POSIXct(paste(datestem,format(dat$start,"%H:%M:%S")))
dat$end <- as.POSIXct(paste(datestem,format(dat$end,"%H:%M:%S")))

dat[
     with(
      dat,
      # keep rows measured later than (start - 10 min) and earlier than (end + 10 min)
      difftime(time,start,units="mins") > -10 &
      difftime(time,end,units="mins") < 10
     ),
   ]
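Since the question asks about an arbitrary number of minutes, the test can be wrapped in a small function that takes the padding as a parameter. This is a sketch under stated assumptions: the function name `within_window`, the argument `pad_mins` and the helper `to_ct` are hypothetical, and the sample data is rebuilt here as POSIXct with every timestamp placed on 2013-03-25 in UTC.

```r
# Helper (illustrative): build POSIXct values on a single fixed date.
to_ct <- function(clock) {
  as.POSIXct(paste("2013-03-25", clock), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
}

# Sample data from the question, with start/end/time on the same date.
dat <- data.frame(
  start = to_ct(c("09:01:00", "09:01:00", "09:46:00", "09:46:00", "10:54:00", "10:54:00")),
  end   = to_ct(c("09:02:00", "09:02:00", "09:46:00", "09:46:00", "10:59:00", "10:59:00")),
  value = c(30.6, 30.8, 28.2, 28.9, 13.4, 13.8),
  time  = to_ct(c("09:05:00", "09:15:00", "09:43:00", "09:53:00", "10:56:00", "11:56:00"))
)

# Keep rows whose time lies strictly inside (start - pad, end + pad).
within_window <- function(df, pad_mins = 10) {
  pad_secs <- pad_mins * 60   # POSIXct arithmetic is in seconds
  keep <- df$time > df$start - pad_secs & df$time < df$end + pad_secs
  df[keep, ]
}

res10 <- within_window(dat, 10)   # the ten-minute window from the question
res5  <- within_window(dat, 5)    # a narrower window drops the 9:53 row

print(res10$value)
print(res5$value)
```

With a padding of ten minutes this returns the four rows shown in the question's expected result.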

Alternatively, using some rounding and a few intermediate variables:

min10 <- 10/(60*24)
ds <- difftime(dat$time,dat$start,units="days")
ds <- ds - round(ds) 
de <- difftime(dat$time,dat$end,units="days")
de <- de - round(de) 

dat[ds > -min10 & de < min10,]
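The arithmetic behind the rounding approach can be checked on a single pair of values. This sketch uses made-up variable names and UTC timestamps (to avoid daylight-saving shifts), with the dates taken from the question's data: 2013-08-07 for the start and 2013-03-25 for the measurement.

```r
# Ten minutes expressed as a fraction of a day.
min10 <- 10 / (60 * 24)

# A start time and a measurement time that fall on different dates.
start_ct <- as.POSIXct("2013-08-07 09:01:00", tz = "UTC")
time_ct  <- as.POSIXct("2013-03-25 09:05:00", tz = "UTC")

# Difference in days: a whole number of days plus the clock-time offset.
d <- as.numeric(difftime(time_ct, start_ct, units = "days"))

# Removing the nearest whole number of days leaves only the clock-time offset.
offset_days <- d - round(d)
offset_mins <- offset_days * 24 * 60

print(round(d))                   # whole days between the two dates
print(round(offset_mins, 6))      # clock-time offset in minutes
print(abs(offset_days) < min10)   # inside the ten-minute padding?
```

The rounding step only isolates the clock-time offset when that offset is under twelve hours; beyond that, `round()` snaps to the wrong day.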