I have a vector of dates, e.g.
dates <- c('2013-01-01', '2013-04-02', '2013-06-10', '2013-09-30')
and a data frame that contains a date column, e.g.
df <- data.frame(
'date' = c('2013-01-04', '2013-01-22', '2013-10-01', '2013-10-10'),
'a' = c(1,2,3,4),
'b' = c('a', 'b', 'c', 'd')
)
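I would like to subset the data frame so that it only includes rows where the date is less than 5 days after any of the dates in the `dates` vector.
That is, the initial data frame looks like this: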
date a b
2013-01-04 1 a
2013-01-22 2 b
2013-10-01 3 c
2013-10-10 4 d
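After the query I would be left with only the first and third rows (since 2013-01-04 is within 5 days of 2013-01-01, and 2013-10-01 is within 5 days of 2013-09-30).
Does anyone know the best way to do this?
Thanks in advance.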
Answer 0: (score: 5)
This is easy (and very fast) to do with a data.table rolling join:
library(data.table)
dt = data.table(df)
# convert to Date (or IDate) to have numbers instead of strings for dates
# also set the key for dates for the join
dt[, date := as.Date(date)]
dates = data.table(date = as.Date(dates), key = 'date')
# join with a roll of 5 days, throwing out dates that don't match
dates[dt, roll = 5, nomatch = 0]
# date a b
#1: 2013-01-04 1 a
#2: 2013-10-01 3 c
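To make the idea behind the rolling join more concrete, here is a minimal base R sketch of the same logic: for each row, find the most recent reference date on or before it and keep the row if the gap is at most 5 days. It is not how data.table implements `roll`, and the names `ref`, `idx`, `gap`, `keep` and `rolled` are only illustrative. It also assumes that a gap of exactly 5 days should count as a match.

```r
dates <- as.Date(c('2013-01-01', '2013-04-02', '2013-06-10', '2013-09-30'))
df <- data.frame(
  date = as.Date(c('2013-01-04', '2013-01-22', '2013-10-01', '2013-10-10')),
  a = c(1, 2, 3, 4),
  b = c('a', 'b', 'c', 'd')
)

# work on day counts so the arithmetic is plain numeric
ref <- sort(as.numeric(dates))
x <- as.numeric(df$date)

# index of the latest reference date <= each row's date (0 if there is none)
idx <- findInterval(x, ref)

# days elapsed since that reference date; NA when no earlier date exists
gap <- rep(NA_real_, length(x))
gap[idx > 0] <- x[idx > 0] - ref[idx[idx > 0]]

# keep rows that fall within 5 days after a reference date
keep <- !is.na(gap) & gap <= 5
rolled <- df[keep, ]
```

Because `findInterval` does a binary search on the sorted reference dates, this avoids comparing every row against every date.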
Answer 1: (score: 4)
# Rows Selected: Iterate over each row in the DF,
# and check if its `date` value is within 5 from any value in the `dates` vector
rows <- sapply(df$date, function(x) any( abs(x-dates) <= 5))
# Use that result to subset your data.frame
df[rows, ]
# date a b
# 1 2013-01-04 1 a
# 3 2013-10-01 3 c
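Importantly, make sure that your date values are actual `Date` values and not `character` strings that look like dates (do this before running the code above):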
dates <- as.Date(dates)
df$date <- as.Date(df$date)
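Note that `abs()` also accepts rows that fall up to 5 days before a date in `dates`, whereas the question asks only for rows after. A one-directional variant might look like the sketch below. The helper `within_after` and its `days` argument are made-up names for illustration. It uses `<=` as in the code above, so change it to `<` if exactly 5 days should be excluded.

```r
dates <- as.Date(c('2013-01-01', '2013-04-02', '2013-06-10', '2013-09-30'))
df <- data.frame(
  date = as.Date(c('2013-01-04', '2013-01-22', '2013-10-01', '2013-10-10')),
  a = c(1, 2, 3, 4),
  b = c('a', 'b', 'c', 'd')
)

# TRUE for each element of d that lies 0 to `days` days after some date in ref
within_after <- function(d, ref, days = 5) {
  # matrix of (d - ref) in days: one row per element of d, one column per ref
  diffs <- outer(as.numeric(d), as.numeric(ref), "-")
  rowSums(diffs >= 0 & diffs <= days) > 0
}

after_only <- df[within_after(df$date, dates), ]

# a date 2 days BEFORE 2013-04-02: kept by the abs() check, dropped here
early <- as.Date('2013-03-31')
abs_check <- any(abs(as.numeric(early) - as.numeric(dates)) <= 5)
one_sided <- within_after(early, dates)
```

For the sample data both checks select rows 1 and 3. They differ only for dates such as `early` that precede a reference date.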
Answer 2: (score: 0)
First make sure that `df$date` (and `dates`) are of class `Date`. Then:
df[df$date %in% sapply(dates, function(x) x:(x+5)),]
date a b
1 2013-01-04 1 a
3 2013-10-01 3 c
For some reason I feel this might be a more proper way to do it:
df[df$date %in% mapply(`:`, from=dates, to=dates+5),]
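Both expressions match a `Date` column against plain numbers produced by `:`. How `%in%` treats classed objects such as `Date` may depend on the R version. I have not verified this, so treat it as an assumption. A variant that compares day counts on both sides avoids the question entirely. The names `window` and `robust` below are only illustrative.

```r
dates <- as.Date(c('2013-01-01', '2013-04-02', '2013-06-10', '2013-09-30'))
df <- data.frame(
  date = as.Date(c('2013-01-04', '2013-01-22', '2013-10-01', '2013-10-10')),
  a = c(1, 2, 3, 4),
  b = c('a', 'b', 'c', 'd')
)

# every day number from each reference date up to 5 days later
window <- unlist(lapply(as.numeric(dates), function(x) x:(x + 5)))

# compare plain day counts on both sides of %in%
robust <- df[as.numeric(df$date) %in% window, ]
```

This only works for whole-day dates. With date-times, a range comparison would be the safer choice.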