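I have two date columns, and I need to create a new column containing, for each row, the number of non-holiday, non-weekend days between the two dates.
Using isHoliday from the timeDate package gives the correct output row by row, but when I apply the same approach to the whole columns as vectors I get the error below. I understand the error, but how can I pass vectors as input and get the output I want?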
> library(timeDate)
> d1
sendDate postingDate
1 2014-07-03 2014-07-03
2 2014-07-03 2014-07-03
3 2014-07-03 2014-07-03
4 2014-07-03 2014-07-03
5 2014-07-03 2014-07-07
6 2014-07-03 2014-07-07
> d1$numBankDays <- sum(!isHoliday(timeSequence(d1$sendDate, d1$postingDate, 'day')))
Error in seq.timeDate(from = from, to = to, by = by) :
'from' must be of length 1
Looping over each row like this computes the values I need, but I don't want to loop over millions of rows. Is there a proper solution?
> for (i in 1:nrow(d1)) {d1$numBankDays[i] <- sum(!isHoliday(timeSequence(d1$sendDate[i], d1$postingDate[i], 'day')))}
> d1
sendDate postingDate numBankDays
1 2014-07-03 2014-07-03 1
2 2014-07-03 2014-07-03 1
3 2014-07-03 2014-07-03 1
4 2014-07-03 2014-07-03 1
5 2014-07-03 2014-07-07 3
6 2014-07-03 2014-07-07 3
Answer (score: 2)
Use apply:
d1$days <- apply(d1, 1, function(x){sum(!isHoliday(timeSequence(x[1], x[2], 'day')))})
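Because apply() first converts d1 to a character matrix, each row reaches the function as strings. A minimal alternative sketch, not part of the original answer, uses mapply to walk the two Date columns in parallel so the dates keep their class. It assumes both columns are of class Date. The `holidays` vector and the `count_bank_days` helper are hypothetical, for illustration only, and the weekday test uses base R's as.POSIXlt()$wday instead of timeDate.

```r
# Hypothetical holiday vector for illustration only (two 2014 NYSE holidays);
# in practice, build it once for every year in the data, e.g. from timeDate::holidayNYSE().
holidays <- as.Date(c("2014-07-04", "2014-12-25"))

d1 <- data.frame(
  sendDate    = as.Date(c("2014-07-03", "2014-07-03")),
  postingDate = as.Date(c("2014-07-03", "2014-07-07"))
)

# Hypothetical helper: number of Monday-to-Friday dates in [from, to],
# both ends included, that are not in `holidays`.
count_bank_days <- function(from, to) {
  x <- seq(from, to, by = "day")
  wd <- as.POSIXlt(x)$wday            # 0 = Sunday, ..., 6 = Saturday
  sum(wd %in% 1:5 & !(x %in% holidays))
}

# mapply walks the two Date columns in parallel, so the values stay Dates
# instead of being turned into strings as apply() does with a data frame.
d1$numBankDays <- mapply(count_bank_days, d1$sendDate, d1$postingDate)
d1$numBankDays
```

With 2014-07-04 in the holiday list, the 2014-07-03 to 2014-07-07 row counts 2 days rather than the 3 shown in the question. This is consistent with the answer's later note that isHoliday's default holiday list covers only the current year.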
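Edit: It seems that isHoliday by default only generates holidays for the current year, and it regenerates them on every call, which is slow. The comparison inside isHoliday is also slow.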
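Let's build our own holiday list with the same function that isHoliday calls every time, so we only need to call it once (make sure the years cover the whole date range of your data):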
allholidays <- as.character(as.Date(holidayNYSE(2014:2015), format = "%Y-%m-%d"))
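Rather than hard-coding 2014:2015, the year range can be derived from the data itself. A minimal base-R sketch, using made-up example dates that cross a year boundary; `yrs` is a hypothetical name:

```r
# Hypothetical example data where one row spans a year boundary
d1 <- data.frame(
  sendDate    = as.Date(c("2014-07-03", "2014-12-30")),
  postingDate = as.Date(c("2014-07-07", "2015-01-02"))
)

# Every calendar year from the earliest sendDate to the latest postingDate
yrs <- seq(as.integer(format(min(d1$sendDate), "%Y")),
           as.integer(format(max(d1$postingDate), "%Y")))
print(yrs)

# With timeDate loaded, this vector could replace the hard-coded years:
# allholidays <- as.character(as.Date(holidayNYSE(yrs)))
```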
Now let's write a better function:
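isworkdayfunction <- function(row){
  # row is one row of d1 as a character vector: row[1] = sendDate, row[2] = postingDate
  x <- seq(from = as.Date(row[1]), to = as.Date(row[2]), by = "day")
  # keep weekdays, then drop those whose "YYYY-MM-DD" string is in allholidays
  sum(!(as.character(x[isWeekday(x)]) %in% allholidays))
}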
Now we can use apply:
d1$numBankDays <- apply(d1, 1, isworkdayfunction)
Finally, let's benchmark the three versions:
library(microbenchmark)
microbenchmark(original=for (i in 1:nrow(d1)) {d1$numBankDays[i] <- sum(!isHoliday(timeSequence(d1$sendDate[i], d1$postingDate[i], 'day')))},
apply1 = apply(d1, 1, function(x){sum(!isHoliday(timeSequence(x[1], x[2], 'day')))}),
newapply = apply(d1,1,isworkdayfunction)
)
Unit: milliseconds
expr min lq mean median uq max neval
original 261.73945 267.584458 272.775199 270.54949 276.327679 305.155272 100
apply1 265.33750 269.710072 278.228613 272.45411 277.532853 446.030608 100
newapply 3.21943 3.334436 3.432978 3.38762 3.426595 6.440394 100
So it is now roughly 80 times faster (median 270.5 ms vs. 3.4 ms).
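Even the faster apply version still builds one date sequence per row. For millions of rows, a fully vectorized alternative (a different technique from the original answer, not included in the benchmark above) is a prefix sum over a shared calendar. Mark each day in the data's overall range as a bank day or not, take cumulative counts, and answer every row with two lookups. The sketch assumes Date columns, sendDate <= postingDate in every row, and a hypothetical `holidays` vector covering the whole range. `count_bank_days_vec` is a hypothetical name.

```r
# Hypothetical holiday vector for illustration only (two 2014 NYSE holidays);
# it must cover every date between the earliest sendDate and latest postingDate.
holidays <- as.Date(c("2014-07-04", "2014-12-25"))

d1 <- data.frame(
  sendDate    = as.Date(c("2014-07-03", "2014-07-03", "2014-07-07")),
  postingDate = as.Date(c("2014-07-03", "2014-07-07", "2014-07-10"))
)

# Vectorized count of bank days in [from, to], both ends included.
# Assumes from <= to in every row.
count_bank_days_vec <- function(from, to, holidays) {
  # One shared calendar covering every date that appears in the data
  cal <- seq(min(from), max(to), by = "day")
  is_bank <- as.POSIXlt(cal)$wday %in% 1:5 & !(cal %in% holidays)
  # cum[k + 1] = number of bank days in cal[1..k]; the leading 0 handles k = 0
  cum <- c(0L, cumsum(is_bank))
  i_from <- as.integer(from - cal[1]) + 1L   # position of each `from` in cal
  i_to   <- as.integer(to - cal[1]) + 1L     # position of each `to` in cal
  # bank days in cal[i_from..i_to] = cum[i_to + 1] - cum[i_from]
  cum[i_to + 1L] - cum[i_from]
}

d1$numBankDays <- count_bank_days_vec(d1$sendDate, d1$postingDate, holidays)
d1
```

The work and memory here scale with the number of days in the overall date span, not with the number of rows, so the per-row cost is two index lookups and a subtraction.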