I get an error in R when I try to loop over time values. Here is a subset of my data frame (the full data frame has 120,000 rows).
time value mean group
1 2017-01-01 12:00:00 0.507 0.5106533 NA
2 2017-01-01 12:05:00 0.526 0.5106533 NA
3 2017-01-01 12:10:00 0.489 0.5106533 NA
4 2017-01-01 12:15:00 0.598 0.5106533 NA
5 2017-01-01 12:20:00 0.564 0.5106533 NA
6 2017-01-01 12:25:00 0.536 0.5106533 NA
Let's say I want to create groups based on time periods and get a result like this:
time value mean group
1 2017-01-01 12:00:00 0.507 0.5106533 A
2 2017-01-01 12:05:00 0.526 0.5106533 A
3 2017-01-01 12:10:00 0.489 0.5106533 B
4 2017-01-01 12:15:00 0.598 0.5106533 B
5 2017-01-01 12:20:00 0.564 0.5106533 C
6 2017-01-01 12:25:00 0.536 0.5106533 C
I tried the following code:
for (i in 1:length(merged.data$group)){
if (merged.data[as.POSIXlt(i)$time >= "2017-05-15 12:00:00 GMT" &
as.POSIXlt(i)$time <= "2017-05-29 12:00:00 GMT",]){
merged.data$group == "A"}
else if (merged.data[as.POSIXlt(i)$time >= "2017-08-11 12:00:00" &
as.POSIXlt(i)$time <= "2017-11-29 16:00:00",]){
merged.data$group == "B"}
else if (merged.data[as.POSIXlt(i)$time >= "2018-01-05 12:00:00" &
as.POSIXlt(i)$time <= "2018-02-16 16:00:00",]){
merged.data$group == "C"}
}
I get the following error:
Error in as.POSIXlt.numeric(i) : 'origin' must be supplied
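I don't understand this; I thought POSIXlt got rid of the origin problem? Admittedly, my understanding of time handling in R is somewhat muddled, and I run into coding difficulties every time I need to work with times or dates.
So I hope someone can help me. Don't hesitate to tell me if I'm unclear, or if more or better information is needed to answer my question.
Thanks in advance, Stack Overflowers!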
Answer 0 (score: 0)
A data.table approach...
Sample data
library( data.table )
dt <- fread("time value mean
2017-01-01T12:00:00 0.507 0.5106533
2017-01-01T12:05:00 0.526 0.5106533
2017-01-01T12:10:00 0.489 0.5106533
2017-01-01T12:15:00 0.598 0.5106533
2017-01-01T12:20:00 0.564 0.5106533
2017-01-01T12:25:00 0.536 0.5106533 ", header = TRUE)
dt[, time := as.POSIXct( time, format = "%Y-%m-%dT%H:%M:%S" )]
Code
library( data.table )
library( lubridate )
dt[, group := LETTERS[.GRP], by = lubridate::floor_date( time, "10 mins" ) ]
# time value mean group
# 1: 2017-01-01 12:00:00 0.507 0.5106533 A
# 2: 2017-01-01 12:05:00 0.526 0.5106533 A
# 3: 2017-01-01 12:10:00 0.489 0.5106533 B
# 4: 2017-01-01 12:15:00 0.598 0.5106533 B
# 5: 2017-01-01 12:20:00 0.564 0.5106533 C
# 6: 2017-01-01 12:25:00 0.536 0.5106533 C
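If lubridate is not available, the same 10-minute bucketing can be sketched in base R. This is a minimal illustration, not part of the original answer: it assumes the timestamps are POSIXct in GMT, and the names `bucket` and `group` are introduced here for the example only.

```r
# Sample timestamps from the question, parsed as POSIXct in GMT
time <- as.POSIXct(c("2017-01-01 12:00:00", "2017-01-01 12:05:00",
                     "2017-01-01 12:10:00", "2017-01-01 12:15:00",
                     "2017-01-01 12:20:00", "2017-01-01 12:25:00"),
                   tz = "GMT")

# POSIXct is stored as seconds since the epoch, so integer division
# by 600 assigns each timestamp to a 10-minute bucket
bucket <- floor(as.numeric(time) / 600)

# Number the buckets in order of first appearance and map them to letters
group <- LETTERS[match(bucket, unique(bucket))]
group
```

Note that `LETTERS` holds only 26 values, so with either approach any bucket beyond the 26th gets `NA` as its label; on a 120,000-row data set a numeric group id is probably the safer choice.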
Based on the sample data and code provided, use foverlaps:
library( data.table )
#create lookup-table with periods and group-names
periods.dt <- data.table(
start = as.POSIXct( c( "2017-05-15 12:00:00", "2017-08-11 12:00:00", "2018-01-05 12:00:00" ), tz = "GMT" ),
stop = as.POSIXct( c( "2017-08-11 12:00:00", "2018-01-05 12:00:00", "2018-02-16 16:00:00"), tz = "GMT" ),
group = LETTERS[1:3] )
#set keys
setkey( periods.dt, start, stop )
#create sample data
dt <- fread("time value mean
2017-01-01T12:00:00 0.507 0.5106533
2017-01-01T12:05:00 0.526 0.5106533
2017-01-01T12:10:00 0.489 0.5106533
2017-01-01T12:15:00 0.598 0.5106533
2017-01-01T12:20:00 0.564 0.5106533
2017-01-01T12:25:00 0.536 0.5106533 ", header = TRUE)
dt[, time := as.POSIXct( time, format = "%Y-%m-%dT%H:%M:%S", tz = "GMT" )]
#create dummies to join on
dt[, `:=`( start = time, stop = time )]
#perform overlap join, no match --> NA
foverlaps( dt, periods.dt, type = "within", nomatch = NA)[, c("time", "value","mean","group"), with = FALSE]
# time value mean group
# 1: 2017-01-01 12:00:00 0.507 0.5106533 <NA>
# 2: 2017-01-01 12:05:00 0.526 0.5106533 <NA>
# 3: 2017-01-01 12:10:00 0.489 0.5106533 <NA>
# 4: 2017-01-01 12:15:00 0.598 0.5106533 <NA>
# 5: 2017-01-01 12:20:00 0.564 0.5106533 <NA>
# 6: 2017-01-01 12:25:00 0.536 0.5106533 <NA>
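As for the error in the question: `i` in the loop is an integer index, not a timestamp, so `as.POSIXlt(i)` dispatches to `as.POSIXlt.numeric`, which in that R version needs an `origin` to turn a number into a date-time. The loop is also unnecessary, because comparisons on a POSIXct column are vectorized. Below is a minimal base R sketch of that idea, using the three periods from the question's code; the timestamps and values in `merged.data` are made up for illustration, and `periods` and `inside` are names introduced here.

```r
# Hypothetical data: one timestamp per period plus one outside all periods
merged.data <- data.frame(
  time  = as.POSIXct(c("2017-05-20 12:00:00", "2017-09-01 08:30:00",
                       "2018-01-10 13:00:00", "2017-01-01 12:00:00"),
                     tz = "GMT"),
  value = c(0.507, 0.526, 0.489, 0.598)
)

# Lookup table with the period boundaries taken from the question
periods <- data.frame(
  start = as.POSIXct(c("2017-05-15 12:00:00", "2017-08-11 12:00:00",
                       "2018-01-05 12:00:00"), tz = "GMT"),
  stop  = as.POSIXct(c("2017-05-29 12:00:00", "2017-11-29 16:00:00",
                       "2018-02-16 16:00:00"), tz = "GMT"),
  group = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# Start with NA everywhere, then label the rows that fall inside each period
merged.data$group <- NA_character_
for (k in seq_len(nrow(periods))) {
  inside <- merged.data$time >= periods$start[k] &
            merged.data$time <= periods$stop[k]
  merged.data$group[inside] <- periods$group[k]
}
merged.data
```

The loop here runs once per period (three iterations), not once per row, so it stays fast on 120,000 rows. Rows outside every period keep `NA`, matching the `nomatch = NA` behaviour of the foverlaps join.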
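Answer 1 (score: 0)
Thanks for your answer. I realized that dates alone are enough for me, because my data set has large gaps. With a simple ifelse, I found something that works: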
merged.data$group <- ifelse(merged.data$date >= "2017-05-15" & merged.data$date <= "2017-05-29", 1,
  ifelse(merged.data$date >= "2017-08-11" & merged.data$date <= "2017-11-29", 2,
  ifelse(merged.data$date >= "2018-01-05" & merged.data$date <= "2018-02-16", 3, NA)))
This does not work with the POSIXlt object I have, but the solution provided by Wimpel seems to work (I'm having trouble with data.table, but that's another story!)
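One possible way to apply the same date-based ifelse to a POSIXlt object is to convert it to Date first and compare against explicit Date values. This is only a sketch under assumptions: the timestamps below are made up, and `times`, `dates`, and `group` are names introduced for the example.

```r
# Hypothetical POSIXlt timestamps in GMT
times <- as.POSIXlt(c("2017-05-20 09:00:00", "2017-09-01 18:30:00",
                      "2018-02-01 07:15:00", "2017-01-01 12:00:00"),
                    tz = "GMT")

# Drop the time-of-day part so only calendar dates are compared
dates <- as.Date(times)

# Same nested ifelse as above, with Date boundaries instead of strings
group <- ifelse(dates >= as.Date("2017-05-15") & dates <= as.Date("2017-05-29"), 1,
         ifelse(dates >= as.Date("2017-08-11") & dates <= as.Date("2017-11-29"), 2,
         ifelse(dates >= as.Date("2018-01-05") & dates <= as.Date("2018-02-16"), 3, NA)))
group
```

Comparing dates rather than date-times means a whole boundary day counts as inside the period, which differs slightly from the 12:00 and 16:00 cut-offs in the original question.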
Thanks again, this forum really helps a lot!