Filtering a time series by hour

Time: 2020-04-24 11:06:50

Tags: r filter time-series lubridate

I have a data series that looks like this:

2020-01-02 09:30:00 1 gdss
2020-01-02 10:00:00 2 jojo
2020-01-02 10:30:00 3 hutr 
2020-01-02 11:00:00 2 uff
2020-01-02 11:30:00 4 wwe
2020-01-02 12:00:00 1 vev
2020-01-02 12:30:00 2 wow

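It has more columns, but they are not important. The full set, however, covers more than ten years of 30-minute data.

I want to filter certain hours of every day, but I cannot get it right. I am using lubridate.

For example, to get this interval: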

2020-01-02 10:30:00 3 hutr 
2020-01-02 11:00:00 2 uff
2020-01-02 11:30:00 4 wwe
2020-01-02 12:00:00 1 vev

I tried the following:

with(load_dataset, load_dataset[ (hour(load_dataset$Date) == 10 & minute(load_dataset$Date) == 30) | (hour(load_dataset$Date) == 12 & minute(load_dataset$Date) < 30), ])

This returns only the first and last rows of the interval.

with(load_dataset, load_dataset[(hour(load_dataset$Date) == 10 & minute(load_dataset$Date) == 30) & (hour(load_dataset$Date) == 12 & minute(load_dataset$Date) < 30), ])

This returns zero rows.

with(load_dataset, load_dataset[(hour(load_dataset$Date) >= 10 & minute(load_dataset$Date) == 30) & (hour(load_dataset$Date) <= 12 & minute(load_dataset$Date) <= 30), ])

This returns only the rows at 30 minutes past the hour:

2020-01-02 10:30:00 3 hutr
2020-01-02 11:30:00 4 wwe

How can I keep every row in the dataset that falls between 10:30 and 12:00 (12:00 included) on each day?

2 Answers:

Answer 0: (score: 2)

You can coerce the time to "numeric" and then check whether it falls within 1030:1200.

load_dataset[as.numeric(strftime(load_dataset$date, "%H%M")) %in% 1030:1200, ]
#                  date V3   V4
# 3 2020-01-02 10:30:00  3 hutr
# 4 2020-01-02 11:00:00  2  uff
# 5 2020-01-02 11:30:00  4  wwe
# 6 2020-01-02 12:00:00  1  vev

Note: this solution assumes that the date column is in "POSIXct" format; if it is not yet, run this first:

load_dataset$date <- as.POSIXct(load_dataset$date)
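A variation on the same idea, sketched here under assumptions rather than taken from the answer: instead of the HHMM number, compute minutes since midnight and compare against a closed range. The names `ts`, `mins`, and `keep` are illustrative, and the timestamps are built in UTC so that the result does not depend on the session time zone.

```r
# Illustrative timestamps in UTC (hypothetical sample, 30-minute steps).
ts <- seq(as.POSIXct("2020-01-02 09:30:00", tz = "UTC"),
          by = "30 min", length.out = 7)

# Minutes since midnight for every timestamp.
mins <- as.integer(format(ts, "%H", tz = "UTC")) * 60L +
        as.integer(format(ts, "%M", tz = "UTC"))

# Keep 10:30 (630 minutes) through 12:00 (720 minutes), both ends included.
keep <- mins >= 630L & mins <= 720L
ts[keep]
```

With lubridate loaded, `hour(ts)` and `minute(ts)` could stand in for the two `format()` calls.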

The same principle also works for "real" time series objects such as "xts":

load_dataset.xts[
  as.numeric(strftime(as.POSIXct(attr(load_dataset.xts, "index"), 
                                 origin="1970-01-01"), "%H%M")) %in% 1030:1200, ]
#                     V3  V4    
# 2020-01-02 10:30:00 "3" "hutr"
# 2020-01-02 11:00:00 "2" "uff" 
# 2020-01-02 11:30:00 "4" "wwe" 
# 2020-01-02 12:00:00 "1" "vev" 

Data:

load_dataset <- structure(list(date = structure(c(1577953800, 1577955600, 1577957400, 
1577959200, 1577961000, 1577962800, 1577964600), class = c("POSIXct", 
"POSIXt"), tzone = ""), V3 = c(1L, 2L, 3L, 2L, 4L, 1L, 2L), V4 = c("gdss", 
"jojo", "hutr", "uff", "wwe", "vev", "wow")), row.names = c(NA, 
-7L), class = "data.frame")

load_dataset.xts <- structure(c("1", "2", "3", "2", "4", "1", "2", "gdss", "jojo", 
"hutr", "uff", "wwe", "vev", "wow"), .Dim = c(7L, 2L), .Dimnames = list(
    NULL, c("V3", "V4")), index = structure(c(1577953800, 1577955600, 
1577957400, 1577959200, 1577961000, 1577962800, 1577964600), tzone = "", tclass = c("POSIXct", 
"POSIXt")), class = c("xts", "zoo"))
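Because the sample data above stores `tzone = ""`, the hour that `strftime` extracts depends on the time zone of the R session. A hedged sketch of pinning the zone explicitly follows; the zones "UTC" and "Europe/Berlin" are assumptions chosen for illustration, not something stated in the answer.

```r
# One instant, stored as seconds since the epoch (taken from the sample data).
x <- as.POSIXct(1577957400, origin = "1970-01-01", tz = "UTC")

# The same instant formats to different clock times in different zones.
format(x, "%H%M", tz = "UTC")
format(x, "%H%M", tz = "Europe/Berlin")

# Filtering with an explicit zone makes the result reproducible.
in_window <- as.numeric(format(x, "%H%M", tz = "Europe/Berlin")) %in% 1030:1200
in_window
```

Passing `tz` in this way is one option; setting the column's time zone once when the data is read would be another.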

Answer 1: (score: 1)

I think what you want to do is:

subset(transform(df, hour = as.integer(format(datetime, "%H")), 
                     minute = as.integer(format(datetime, "%M"))), 
      (hour == 10 & minute >= 30) | hour == 11 | (hour == 12 & minute == 0))


#  V3   V4            datetime hour minute
#3  3 hutr 2020-01-02 10:30:00   10     30
#4  2  uff 2020-01-02 11:00:00   11      0
#5  4  wwe 2020-01-02 11:30:00   11     30
#6  1  vev 2020-01-02 12:00:00   12      0
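To see why the condition joins three cases with `|`, it may help to compare it with a condition that ties `minute == 30` to every row, as in the question's third attempt. This is a small sketch on made-up vectors `h` and `m` (hypothetical names), not part of the original answer.

```r
# Hour and minute for 09:30, 10:00, ..., 12:30 (illustrative values).
h <- c(9L, 10L, 10L, 11L, 11L, 12L, 12L)
m <- c(30L, 0L, 30L, 0L, 30L, 0L, 30L)

# Requiring minute == 30 on every row drops the on-the-hour rows
# (11:00 and 12:00), because the two halves are joined with &.
too_strict <- (h >= 10 & m == 30) & (h <= 12 & m <= 30)

# Three cases joined with |: the tail of hour 10, all of hour 11,
# and 12:00 exactly.
wanted <- (h == 10 & m >= 30) | h == 11 | (h == 12 & m == 0)

which(too_strict)
which(wanted)
```

Each case covers one piece of the window, so a row only has to match one of them.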

With dplyr and lubridate, this can be done as follows:

library(dplyr)
library(lubridate)

df %>%
  mutate(hour = hour(datetime), minute = minute(datetime)) %>%
  filter((hour == 10 & minute >= 30) | hour == 11 | (hour == 12 & minute == 0))

Data:

df <-  structure(list(V3 = c(1L, 2L, 3L, 2L, 4L, 1L, 2L), V4 = structure(c(1L, 
3L, 2L, 4L, 7L, 5L, 6L), .Label = c("gdss", "hutr", "jojo", "uff", 
"vev", "wow", "wwe"), class = "factor"), datetime = structure(c(1577957400, 
1577959200, 1577961000, 1577962800, 1577964600, 1577966400, 1577968200
), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA, 
-7L), class = "data.frame")