First, I store all the holiday dates separately, and I flag which dates in the dataset are holidays like this:
publicHolidays <- as.Date(c("2019-01-01", "2019-01-15", "2019-01-26", "2019-03-04", "2019-03-21", "2019-04-06"))
sampledata <- data.frame(
  sid = 1:5,
  DOJ = c("21/03/2019", "26/1/2019", "1/03/2019", "12/03/2019", "1/1/2019"),
  stringsAsFactors = FALSE
)
sampledata$isholiday <- as.numeric(as.Date(sampledata$DOJ,'%d/%m/%Y') %in% publicHolidays)
#sampledata$isholiday
str(sampledata)
Next, for each date I want to find how many days away the nearest holiday is (either before or after). How can I do this?
Answer 0 (score: 3)
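Rather than comparing every date against every holiday, it is more efficient to take advantage of sort order; data.table does this with a rolling join: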
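library(data.table)
setDT(sampledata)                                 # convert to a data.table in place
sampledata[ , DOJ := as.IDate(DOJ, '%d/%m/%Y')]   # parse the day/month/year strings
setkey(sampledata, DOJ)                           # sort (and key) by date
holidays = data.table(date = as.IDate(publicHolidays))
holidays[ , I := .I]                              # remember each holiday's row index
setkey(holidays)                                  # key by all columns, date first
sampledata[ , nearest_holiday := {
  # rolling join: for each DOJ, get the index of the closest holiday date
  idx = holidays[copy(.SD), I, roll = 'nearest']
  holidays$date[idx]
}]
sampledata[]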
# sid DOJ nearest_holiday
# 1: 5 2019-01-01 2019-01-01
# 2: 2 2019-01-26 2019-01-26
# 3: 3 2019-03-01 2019-03-04
# 4: 4 2019-03-12 2019-03-04
# 5: 1 2019-03-21 2019-03-21
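With that column in place, the distance is a simple subtraction (positive means the nearest holiday comes after the date, negative means it comes before):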
sampledata[ , days_to_nearest := nearest_holiday - DOJ][]
# sid DOJ nearest_holiday days_to_nearest
# 1: 5 2019-01-01 2019-01-01 0
# 2: 2 2019-01-26 2019-01-26 0
# 3: 3 2019-03-01 2019-03-04 3
# 4: 4 2019-03-12 2019-03-04 -8
# 5: 1 2019-03-21 2019-03-21 0
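The rolling join is fast because it relies on the holidays being sorted. As a rough illustration of the same idea without data.table, here is a minimal base R sketch that uses findInterval() to binary-search each date among the sorted holidays and then picks the closer neighbour. The helper name nearest_holiday_sorted is hypothetical, it assumes there is at least one holiday, and ties go to the earlier holiday (data.table's own tie-breaking is not assumed to match).

```r
# Hypothetical helper: a base R analogue of a "nearest" rolling join.
# Assumes `holidays` contains at least one non-NA date.
nearest_holiday_sorted <- function(dates, holidays) {
  h <- sort(as.numeric(holidays))   # holidays as sorted day counts
  x <- as.numeric(dates)
  i <- findInterval(x, h)           # index of the last holiday <= x (0 if none)
  # closest holiday on or before x (-Inf if there is none)
  before <- ifelse(i >= 1, h[pmax(i, 1)], -Inf)
  # closest holiday strictly after x (Inf if there is none)
  after <- ifelse(i < length(h), h[pmin(i + 1, length(h))], Inf)
  # keep whichever neighbour is closer; ties go to the earlier holiday
  nearest <- ifelse(x - before <= after - x, before, after)
  as.Date(nearest, origin = "1970-01-01")
}

publicHolidays <- as.Date(c("2019-01-01", "2019-01-15", "2019-01-26",
                            "2019-03-04", "2019-03-21", "2019-04-06"))
doj <- as.Date(c("21/03/2019", "26/1/2019", "1/03/2019", "12/03/2019", "1/1/2019"),
               "%d/%m/%Y")

nearest <- nearest_holiday_sorted(doj, publicHolidays)
days_to_nearest <- as.numeric(nearest - doj)   # positive: holiday comes after DOJ
print(data.frame(DOJ = doj, nearest_holiday = nearest, days_to_nearest))
```

Each date costs one binary search over the holidays instead of a comparison with every holiday, which is the same reason the rolling join scales well.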
Answer 1 (score: 2)
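A base R approach with sapply: for each DOJ, take the min of the absolute differences between it and publicHolidays (this gives the unsigned distance in days):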
sampledata$nearest_holiday <- sapply(as.Date(sampledata$DOJ, "%d/%m/%Y"),
function(x) min(abs(x - publicHolidays)))
sampledata
# sid DOJ isholiday nearest_holiday
#1 1 21/03/2019 1 0
#2 2 26/1/2019 1 0
#3 3 1/03/2019 0 3
#4 4 12/03/2019 0 8
#5 5 1/1/2019 1 0
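If you also want the direction of the distance (as in the data.table answer) and prefer to avoid the per-element sapply loop, one possible vectorized base R sketch builds the full date-by-holiday difference matrix with outer() and picks the closest column per row with max.col(). This is a brute-force comparison of every date with every holiday, so it is only meant for modest numbers of holidays; the variable names are illustrative.

```r
publicHolidays <- as.Date(c("2019-01-01", "2019-01-15", "2019-01-26",
                            "2019-03-04", "2019-03-21", "2019-04-06"))
doj <- as.Date(c("21/03/2019", "26/1/2019", "1/03/2019", "12/03/2019", "1/1/2019"),
               "%d/%m/%Y")

# diffs[i, j] = holiday j minus date i, in days (rows: dates, columns: holidays)
diffs <- outer(as.numeric(doj), as.numeric(publicHolidays),
               function(d, h) h - d)

# column index of the closest holiday for each date (first one wins on ties)
closest <- max.col(-abs(diffs), ties.method = "first")

days_to_nearest <- diffs[cbind(seq_along(doj), closest)]  # signed distance
abs_days <- abs(days_to_nearest)                           # unsigned, like the sapply result
print(data.frame(DOJ = doj, days_to_nearest, abs_days))
```

The sign follows the same convention as nearest_holiday - DOJ above: positive when the closest holiday comes after the date.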
If you want to use this in a dplyr chain, the same logic translates directly:
library(dplyr)
library(lubridate)
library(purrr)
sampledata %>%
mutate(nearest_holiday = map_dbl(dmy(DOJ), ~min(abs(. - publicHolidays))))