Filter/subset a data frame by multiple time periods in R

Time: 2018-01-14 03:09:24

Tags: r filter subset

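I need to filter or subset a large data frame (100,000+ rows) using multiple time periods. Essentially, I need to remove the rows of the data frame whose date/time falls between a start date/time and an end date/time listed in a time period table. I found this post, which describes my problem (Subsetting data by multiple date ranges - R), but I can't seem to get the code to work. Any suggestions are appreciated. See the sample data below.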

time period table:
start,end
7/26/2017 14:05,7/26/2017 16:05
8/24/2017 13:40,8/24/2017 15:40
6/29/2017 20:45,6/30/2017 0:41

dataframe:
time,temp.c,lux,serial.num
6/29/2017 20:40,33.63,0,20168779
6/29/2017 20:40,33.11,0,20168780
6/29/2017 20:50,20.42,602.8,20148333
6/29/2017 20:50,20.32,721.2,20148334
6/29/2017 20:50,19.75,3788.9,20148335
7/26/2017 16:00,22.9,183,20168779
7/26/2017 16:00,23.29,1237.9,20168780
7/26/2017 16:10,23.38,1173.3,20148333
7/26/2017 16:10,23.67,839.6,20148334
8/24/2017 15:40,24.06,387.5,20168780
8/24/2017 15:50,23.58,0,20148332

1 answer:

Answer 0 (score: 0)

Data:

time_period_table <- data.frame(start=c('7/26/2017 14:05', '8/24/2017 13:40', '6/29/2017 20:45'), end=c('7/26/2017 16:05', '8/24/2017 15:40', '6/30/2017 0:41'))
time_period_table$start <- as.POSIXct(time_period_table$start, format="%m/%d/%Y %H:%M")
time_period_table$end <- as.POSIXct(time_period_table$end, format="%m/%d/%Y %H:%M")

time_period_table


Keep the rows of a data frame whose start and end columns both fall between a given start date/time and end date/time:

keep_by_date_time <- function(data_frame, start_date, start_time, end_date, end_time) {
    # Build the window bounds as "YYYY-MM-DD HH:MM" strings
    from <- paste(start_date, start_time)
    to <- paste(end_date, end_time)
    # data_frame must have POSIXct columns named start and end
    sub_frame <- subset(data_frame, start >= as.POSIXct(from) & end <= as.POSIXct(to))
    return(sub_frame)
}

Usage:

# return rows with date and time between July 26, 2017 at 1:05pm and July 26, 2017 at 5:05pm:

keep_by_date_time(time_period_table, '2017-07-26', '13:05', '2017-07-26', '17:05')


Remove the rows of a data frame whose start and end columns both fall between a given start date/time and end date/time:

remove_by_date_time <- function(data_frame, start_date, start_time, end_date, end_time) {
    from <- paste(start_date, start_time)
    to <- paste(end_date, end_time)
    # Logical mask over data_frame itself (not the global table);
    # a mask stays correct even when no row matches
    rem_row <- data_frame$start >= as.POSIXct(from) & data_frame$end <= as.POSIXct(to)
    rem_frame <- data_frame[!rem_row, ]
    return(rem_frame)
}

Usage:

# remove rows with date and time between August 20, 2017 at 1:05pm and August 26, 2017 at 5:05pm:

remove_by_date_time(time_period_table, '2017-08-20', '13:05', '2017-08-26', '17:05')
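
The functions above filter a table that has start and end columns against a single window. The question asks for something slightly different: dropping measurement rows whose time falls inside any of several periods. The following is a minimal base R sketch of that, not a definitive implementation. The names readings, periods and in_any_period are hypothetical, the UTC time zone is an assumption, and so is treating both endpoints of each period as inclusive.

```r
# Time periods to exclude (sample data from the question).
periods <- data.frame(
  start = c("7/26/2017 14:05", "8/24/2017 13:40", "6/29/2017 20:45"),
  end   = c("7/26/2017 16:05", "8/24/2017 15:40", "6/30/2017 0:41"),
  stringsAsFactors = FALSE
)

# Measurements (sample data from the question).
readings <- data.frame(
  time = c("6/29/2017 20:40", "6/29/2017 20:40", "6/29/2017 20:50",
           "6/29/2017 20:50", "6/29/2017 20:50", "7/26/2017 16:00",
           "7/26/2017 16:00", "7/26/2017 16:10", "7/26/2017 16:10",
           "8/24/2017 15:40", "8/24/2017 15:50"),
  temp.c = c(33.63, 33.11, 20.42, 20.32, 19.75, 22.9,
             23.29, 23.38, 23.67, 24.06, 23.58),
  lux = c(0, 0, 602.8, 721.2, 3788.9, 183,
          1237.9, 1173.3, 839.6, 387.5, 0),
  serial.num = c(20168779, 20168780, 20148333, 20148334, 20148335,
                 20168779, 20168780, 20148333, 20148334, 20168780,
                 20148332),
  stringsAsFactors = FALSE
)

# Parse every timestamp with the same format and time zone.
# tz = "UTC" is an assumption; use the zone the loggers recorded in.
fmt <- "%m/%d/%Y %H:%M"
periods$start <- as.POSIXct(periods$start, format = fmt, tz = "UTC")
periods$end   <- as.POSIXct(periods$end,   format = fmt, tz = "UTC")
readings$time <- as.POSIXct(readings$time, format = fmt, tz = "UTC")

# Hypothetical helper: TRUE for each time that lies inside at least one
# period. Endpoints are treated as inclusive (an assumption).
in_any_period <- function(times, starts, ends) {
  inside <- rep(FALSE, length(times))
  for (i in seq_along(starts)) {
    inside <- inside | (times >= starts[i] & times <= ends[i])
  }
  inside
}

drop <- in_any_period(readings$time, periods$start, periods$end)
filtered <- readings[!drop, ]  # rows outside every period
kept <- readings[drop, ]       # rows inside some period
```

The loop runs once per period rather than once per row, so it should stay cheap for 100,000+ rows as long as the period table is short. With inclusive endpoints, the 8/24/2017 15:40 reading is dropped because it coincides with the end of the second period; change <= to < if the end should be exclusive.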
