I have a data series that looks like this:
2020-01-02 09:30:00 1 gdss
2020-01-02 10:00:00 2 jojo
2020-01-02 10:30:00 3 hutr
2020-01-02 11:00:00 2 uff
2020-01-02 11:30:00 4 wwe
2020-01-02 12:00:00 1 vev
2020-01-02 12:30:00 2 wow
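It has more columns, but they are not important here. The full set, however, holds more than ten years of 30-minute data.
I want to filter certain hours of each day, but I can't get it right. I am using lubridate.
For example, to get this interval: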
2020-01-02 10:30:00 3 hutr
2020-01-02 11:00:00 2 uff
2020-01-02 11:30:00 4 wwe
2020-01-02 12:00:00 1 vev
I tried the following:
with(load_dataset, load_dataset[ (hour(load_dataset$Date) == 10 & minute(load_dataset$Date) == 30) | (hour(load_dataset$Date) == 12 & minute(load_dataset$Date) < 30), ])
This returns only the first and last rows of the interval.
with(load_dataset, load_dataset[(hour(load_dataset$Date) == 10 & minute(load_dataset$Date) == 30) & (hour(load_dataset$Date) == 12 & minute(load_dataset$Date) < 30), ])
This returns zero rows.
with(load_dataset, load_dataset[(hour(load_dataset$Date) >= 10 & minute(load_dataset$Date) == 30) & (hour(load_dataset$Date) <= 12 & minute(load_dataset$Date) <= 30), ])
This returns only the rows that fall on the half hour:
2020-01-02 10:30:00 3 hutr
2020-01-02 11:30:00 4 wwe
How can I filter every row in the dataset that falls between 10:30 and 12:00 (12:00 included) on each day?
Answer 0: (score: 2)
You can coerce the time of day to "numeric" and then check whether it falls within 1030:1200.
load_dataset[as.numeric(strftime(load_dataset$date, "%H%M")) %in% 1030:1200, ]
# date V3 V4
# 3 2020-01-02 10:30:00 3 hutr
# 4 2020-01-02 11:00:00 2 uff
# 5 2020-01-02 11:30:00 4 wwe
# 6 2020-01-02 12:00:00 1 vev
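The same HHMM idea can be wrapped in a small reusable function. The following is only a sketch: the name filter_time_window, its arguments, and the two-day sample data are hypothetical and not part of the original answer, and UTC is assumed as the time zone.

```r
# Hypothetical helper (not from the original answer): keep rows whose
# time of day lies in [start, end], both given as HHMM integers.
filter_time_window <- function(data, column, start, end, tz = "UTC") {
  # format() renders each timestamp as "HHMM" in the requested time zone
  hhmm <- as.integer(format(data[[column]], "%H%M", tz = tz))
  data[hhmm >= start & hhmm <= end, , drop = FALSE]
}

# Assumed sample: two days of 30-minute data, 09:30 to 12:30 each day
stamps <- c(
  seq(as.POSIXct("2020-01-02 09:30:00", tz = "UTC"), by = "30 min", length.out = 7),
  seq(as.POSIXct("2020-01-03 09:30:00", tz = "UTC"), by = "30 min", length.out = 7)
)
sample_data <- data.frame(date = stamps, V3 = seq_along(stamps))

# Rows from 10:30 through 12:00 (inclusive) on every day
result <- filter_time_window(sample_data, "date", 1030, 1200)
print(result)
```

Comparing with >= and <= rather than %in% avoids building the integer vector 1030:1200, which also contains values such as 1075 that never occur as a time of day; either form selects the same rows.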
Note: this solution assumes that the date column is of class "POSIXct"; if it is not yet, run this beforehand:
load_dataset$date <- as.POSIXct(load_dataset$date)
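Because the extracted hour depends on the time zone used to render the timestamp, it may help to state the zone explicitly when converting. A minimal sketch, assuming the timestamps are stored as text in the layout shown in the question and that UTC is the intended zone; the names raw and parsed are placeholders.

```r
# Assumed input: timestamps stored as character strings
raw <- c("2020-01-02 09:30:00", "2020-01-02 10:00:00", "2020-01-02 10:30:00")

# Parse with an explicit format and time zone (UTC is an assumption here)
parsed <- as.POSIXct(raw, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Render the time of day in the same zone before comparing
hhmm <- as.integer(format(parsed, "%H%M", tz = "UTC"))
print(hhmm)
```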
The same principle also applies to "real" time series objects such as "xts".
load_dataset.xts[
as.numeric(strftime(as.POSIXct(attr(load_dataset.xts, "index"),
origin="1970-01-01"), "%H%M")) %in% 1030:1200, ]
# V3 V4
# 2020-01-02 10:30:00 "3" "hutr"
# 2020-01-02 11:00:00 "2" "uff"
# 2020-01-02 11:30:00 "4" "wwe"
# 2020-01-02 12:00:00 "1" "vev"
Data:
load_dataset <- structure(list(date = structure(c(1577953800, 1577955600, 1577957400,
1577959200, 1577961000, 1577962800, 1577964600), class = c("POSIXct",
"POSIXt"), tzone = ""), V3 = c(1L, 2L, 3L, 2L, 4L, 1L, 2L), V4 = c("gdss",
"jojo", "hutr", "uff", "wwe", "vev", "wow")), row.names = c(NA,
-7L), class = "data.frame")
load_dataset.xts <- structure(c("1", "2", "3", "2", "4", "1", "2", "gdss", "jojo",
"hutr", "uff", "wwe", "vev", "wow"), .Dim = c(7L, 2L), .Dimnames = list(
NULL, c("V3", "V4")), index = structure(c(1577953800, 1577955600,
1577957400, 1577959200, 1577961000, 1577962800, 1577964600), tzone = "", tclass = c("POSIXct",
"POSIXt")), class = c("xts", "zoo"))
Answer 1: (score: 1)
I think what you want to do is:
subset(transform(df, hour = as.integer(format(datetime, "%H")),
minute = as.integer(format(datetime, "%M"))),
(hour == 10 & minute >= 30) | hour == 11 | hour == 12 & minute == 0)
# V3 V4 datetime hour minute
#3 3 hutr 2020-01-02 10:30:00 10 30
#4 2 uff 2020-01-02 11:00:00 11 0
#5 4 wwe 2020-01-02 11:30:00 11 30
#6 1 vev 2020-01-02 12:00:00 12 0
With dplyr and lubridate, this can be done as follows:
library(dplyr)
library(lubridate)
df %>%
mutate(hour = hour(datetime), minute = minute(datetime)) %>%
filter((hour == 10 & minute >= 30) | hour == 11 | hour == 12 & minute == 0)
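As an alternative to spelling out each hour, the hour and minute can be folded into a single "minutes since midnight" value, so the window becomes one range check. This is a sketch rather than part of the original answer; the column name minute_of_day and the sample data frame are assumptions made for illustration.

```r
library(dplyr)
library(lubridate)

# Assumed sample mirroring the question: 30-minute data from 09:30 to 12:30
sample_df <- data.frame(
  datetime = seq(as.POSIXct("2020-01-02 09:30:00", tz = "UTC"),
                 by = "30 min", length.out = 7),
  V3 = c(1L, 2L, 3L, 2L, 4L, 1L, 2L)
)

window <- sample_df %>%
  # minute_of_day is a hypothetical helper column: 10:30 -> 630, 12:00 -> 720
  mutate(minute_of_day = hour(datetime) * 60 + minute(datetime)) %>%
  # between() is inclusive on both ends, so 12:00 is kept
  filter(between(minute_of_day, 10 * 60 + 30, 12 * 60))

print(window)
```

With this form, changing the window only means changing the two bounds, whatever hours they fall in.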
Data
df <- structure(list(V3 = c(1L, 2L, 3L, 2L, 4L, 1L, 2L), V4 = structure(c(1L,
3L, 2L, 4L, 7L, 5L, 6L), .Label = c("gdss", "hutr", "jojo", "uff",
"vev", "wow", "wwe"), class = "factor"), datetime = structure(c(1577957400,
1577959200, 1577961000, 1577962800, 1577964600, 1577966400, 1577968200
), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA,
-7L), class = "data.frame")