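I need to filter (subset) a large data frame (100,000+ rows) using multiple time periods. Essentially, I need to remove every row of the data frame whose date/time falls between the start date/time and end date/time of any period in the time-period table. I found a post that describes my problem (Subsetting data by multiple date ranges - R), but I can't get the code to work. Any suggestions are appreciated. Sample data below.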
time period table:
start,end
7/26/2017 14:05,7/26/2017 16:05
8/24/2017 13:40,8/24/2017 15:40
6/29/2017 20:45,6/30/2017 0:41
dataframe:
time,temp.c,lux,serial.num
6/29/2017 20:40,33.63,0,20168779
6/29/2017 20:40,33.11,0,20168780
6/29/2017 20:50,20.42,602.8,20148333
6/29/2017 20:50,20.32,721.2,20148334
6/29/2017 20:50,19.75,3788.9,20148335
7/26/2017 16:00,22.9,183,20168779
7/26/2017 16:00,23.29,1237.9,20168780
7/26/2017 16:10,23.38,1173.3,20148333
7/26/2017 16:10,23.67,839.6,20148334
8/24/2017 15:40,24.06,387.5,20168780
8/24/2017 15:50,23.58,0,20148332
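To make the example reproducible, the two sample tables above can be read straight from text with `read.csv(text = ...)`. This is a minimal sketch: the object names `periods` and `readings` and the choice of UTC as the time zone are assumptions for illustration, not part of the question.

```r
# Sample tables from the question, read from inline CSV text
periods <- read.csv(text = "start,end
7/26/2017 14:05,7/26/2017 16:05
8/24/2017 13:40,8/24/2017 15:40
6/29/2017 20:45,6/30/2017 0:41", stringsAsFactors = FALSE)

readings <- read.csv(text = "time,temp.c,lux,serial.num
6/29/2017 20:40,33.63,0,20168779
6/29/2017 20:40,33.11,0,20168780
6/29/2017 20:50,20.42,602.8,20148333
6/29/2017 20:50,20.32,721.2,20148334
6/29/2017 20:50,19.75,3788.9,20148335
7/26/2017 16:00,22.9,183,20168779
7/26/2017 16:00,23.29,1237.9,20168780
7/26/2017 16:10,23.38,1173.3,20148333
7/26/2017 16:10,23.67,839.6,20148334
8/24/2017 15:40,24.06,387.5,20168780
8/24/2017 15:50,23.58,0,20148332", stringsAsFactors = FALSE)

# Convert the month/day/year hour:minute strings to POSIXct (UTC assumed)
fmt <- "%m/%d/%Y %H:%M"
periods$start <- as.POSIXct(periods$start, format = fmt, tz = "UTC")
periods$end   <- as.POSIXct(periods$end,   format = fmt, tz = "UTC")
readings$time <- as.POSIXct(readings$time, format = fmt, tz = "UTC")

str(readings)
```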
Answer:
Data:
time_period_table <- data.frame(start = c('7/26/2017 14:05', '8/24/2017 13:40', '6/29/2017 20:45'),
                                end = c('7/26/2017 16:05', '8/24/2017 15:40', '6/30/2017 0:41'),
                                stringsAsFactors = FALSE)
# Parse month/day/year hour:minute strings in the local time zone
time_period_table$start <- as.POSIXct(time_period_table$start, format = "%m/%d/%Y %H:%M")
time_period_table$end <- as.POSIXct(time_period_table$end, format = "%m/%d/%Y %H:%M")
time_period_table
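Parsing with an explicit `format` is what makes the columns comparable as times; if a string does not match the format, `as.POSIXct()` returns `NA` without raising an error. A quick sanity check, sketched here with UTC as an assumed time zone and a hypothetical `minutes` column, is to confirm there are no `NA`s and that the period lengths look plausible:

```r
# Rebuild the period table with the same values, parsed in UTC
periods <- data.frame(
  start = c("7/26/2017 14:05", "8/24/2017 13:40", "6/29/2017 20:45"),
  end   = c("7/26/2017 16:05", "8/24/2017 15:40", "6/30/2017 0:41"),
  stringsAsFactors = FALSE
)
fmt <- "%m/%d/%Y %H:%M"
periods$start <- as.POSIXct(periods$start, format = fmt, tz = "UTC")
periods$end   <- as.POSIXct(periods$end,   format = fmt, tz = "UTC")

# A string that does not match `fmt` becomes NA silently, so check for that
stopifnot(!anyNA(periods$start), !anyNA(periods$end))

# Once parsed, the columns support ordinary time arithmetic
periods$minutes <- as.numeric(difftime(periods$end, periods$start, units = "mins"))
periods$minutes
# [1] 120 120 236
```

The third period crosses midnight (20:45 to 0:41 the next day), which is handled correctly because the dates are part of the parsed values.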
Keep the rows whose period (both `start` and `end`) lies between a given start date/time and end date/time:
keep_by_date_time <- function(data_frame, start_date, start_time, end_date, end_time) {
  from <- as.POSIXct(paste(start_date, start_time))
  to <- as.POSIXct(paste(end_date, end_time))
  # Keep rows whose whole period lies inside [from, to]
  subset(data_frame, start >= from & end <= to)
}
Usage:
# return rows with date and time between July 26, 2017 at 1:05pm and July 26, 2017 at 5:05pm:
keep_by_date_time(time_period_table, '2017-07-26', '13:05', '2017-07-26', '17:05')
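The filter inside `keep_by_date_time` tests containment: a period is kept only if both its start and its end lie inside the window. If periods that merely overlap the window should also count, the condition is different. A small sketch comparing the two, using a hypothetical window of July 26, 2017, 15:00 to 17:05 and UTC parsing (both assumptions for illustration):

```r
fmt <- "%m/%d/%Y %H:%M"
to_time <- function(x) as.POSIXct(x, format = fmt, tz = "UTC")

periods <- data.frame(
  start = to_time(c("7/26/2017 14:05", "8/24/2017 13:40", "6/29/2017 20:45")),
  end   = to_time(c("7/26/2017 16:05", "8/24/2017 15:40", "6/30/2017 0:41"))
)
# Hypothetical window
from <- to_time("7/26/2017 15:00")
to   <- to_time("7/26/2017 17:05")

# Containment (the test keep_by_date_time uses): whole period inside window
contained <- periods$start >= from & periods$end <= to
# Overlap: period and window share at least one instant
overlaps <- periods$start <= to & periods$end >= from

contained
# [1] FALSE FALSE FALSE
overlaps
# [1]  TRUE FALSE FALSE
```

The July 26 period (14:05 to 16:05) starts before the window, so it fails the containment test but passes the overlap test.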
Remove the rows whose period (both `start` and `end`) lies between a given start date/time and end date/time:
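remove_by_date_time <- function(data_frame, start_date, start_time, end_date, end_time) {
  from <- as.POSIXct(paste(start_date, start_time))
  to <- as.POSIXct(paste(end_date, end_time))
  # Test the data_frame argument, not the global time_period_table
  to_remove <- data_frame$start >= from & data_frame$end <= to
  # Logical indexing: unlike data_frame[-which(...), ], this does not drop
  # every row when nothing matches; rows with NA times are kept
  data_frame[!(to_remove %in% TRUE), ]
}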
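A caveat with the common `df[-which(cond), ]` idiom for dropping rows: when `which()` finds no match it returns `integer(0)`, and indexing with its negation selects no rows at all rather than all of them. A small demonstration on a made-up two-column data frame:

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)

# No row satisfies the condition, so which() returns integer(0)
rem_row <- which(df$x > 10)
length(rem_row)
# [1] 0

# Negating an empty index selects nothing: every row is lost
nrow(df[-rem_row, ])
# [1] 0

# A logical mask keeps every row when nothing matches
nrow(df[!(df$x > 10), ])
# [1] 3
```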
Usage:
# remove rows with date and time between August 20, 2017 at 1:05pm and August 26, 2017 at 5:05pm:
remove_by_date_time(time_period_table, '2017-08-20', '13:05', '2017-08-26', '17:05')
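Both functions above filter the period table itself. The question asks for something slightly different: dropping every row of the measurement data whose `time` falls inside any of the periods. A minimal vectorized sketch of that, assuming UTC parsing, bounds that are inclusive at both ends, and a hypothetical helper named `in_any_period()`:

```r
fmt <- "%m/%d/%Y %H:%M"
to_time <- function(x) as.POSIXct(x, format = fmt, tz = "UTC")

periods <- data.frame(
  start = to_time(c("7/26/2017 14:05", "8/24/2017 13:40", "6/29/2017 20:45")),
  end   = to_time(c("7/26/2017 16:05", "8/24/2017 15:40", "6/30/2017 0:41"))
)

readings <- data.frame(
  time = to_time(c("6/29/2017 20:40", "6/29/2017 20:40", "6/29/2017 20:50",
                   "6/29/2017 20:50", "6/29/2017 20:50", "7/26/2017 16:00",
                   "7/26/2017 16:00", "7/26/2017 16:10", "7/26/2017 16:10",
                   "8/24/2017 15:40", "8/24/2017 15:50")),
  temp.c = c(33.63, 33.11, 20.42, 20.32, 19.75, 22.9, 23.29, 23.38, 23.67,
             24.06, 23.58),
  serial.num = c(20168779, 20168780, 20148333, 20148334, 20148335, 20168779,
                 20168780, 20148333, 20148334, 20168780, 20148332)
)

# Hypothetical helper: TRUE for each time lying in at least one
# [start, end] interval (both ends inclusive)
in_any_period <- function(times, starts, ends) {
  t <- as.numeric(times)
  # n x k logical matrix: [i, j] is TRUE when time i lies in period j
  hits <- outer(t, as.numeric(starts), ">=") & outer(t, as.numeric(ends), "<=")
  rowSums(hits) > 0
}

filtered <- readings[!in_any_period(readings$time, periods$start, periods$end), ]
filtered
```

`outer()` builds an n × k logical matrix (n readings, k periods), which stays modest for 100,000 rows and a handful of periods. With many periods, an interval-overlap join such as `data.table::foverlaps()` avoids materializing that matrix.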