Converting daily data to weekly data while handling holidays

Asked: 2017-10-04 17:22:47

Tags: r data.table

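I have a data.table of daily data. From it I want to extract weekly data points, taken every Wednesday. If a Wednesday is a holiday, i.e. it is missing from the data.table, the next available data point should be used instead. Here is an MWE: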

library(data.table)
df <- data.table(date=as.Date(c("2012-06-25","2012-06-26","2012-06-27","2012-06-28","2012-06-29","2012-07-02","2012-07-03","2012-07-05","2012-07-06","2012-07-09","2012-07-10","2012-07-11","2012-07-12","2012-07-13","2012-07-16","2012-07-17","2012-07-18","2012-07-19","2012-07-20")))
df[,weekday:=strftime(date,'%u')]

which gives:

         date  weekday
 1: 2012-06-25       1
 2: 2012-06-26       2
 3: 2012-06-27       3
 4: 2012-06-28       4
 5: 2012-06-29       5
 6: 2012-07-02       1
 7: 2012-07-03       2
 8: 2012-07-05       4 #here the 4th of July was skipped
 9: 2012-07-06       5
10: 2012-07-09       1
11: 2012-07-10       2
12: 2012-07-11       3
13: 2012-07-12       4
14: 2012-07-13       5
15: 2012-07-16       1
16: 2012-07-17       2
17: 2012-07-18       3
18: 2012-07-19       4
19: 2012-07-20       5

The result I want in this case would be:

     date  weekday
2012-06-27       3
2012-07-05       4
2012-07-11       3
2012-07-18       3

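Is there a more efficient way to get this than looping week by week and checking whether the Wednesday data point is present in the data? I feel there must be a better way, so any suggestions are highly appreciated!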

Working solution (following Imo's suggestion):

df[,weekday:=wday(date)] #faster way to get weekdays, careful: numbers increased by 1 vs strftime
df[,numweek:=floor(as.numeric(date-date[1])/7+1)] #get continuous week numbers extending over end of years
df[df[,.I[which.min(abs(weekday-4.25))],by=.(numweek)]$V1] #gets result

1 Answer:

Answer 0 (score: 1)

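Here is a method that subsets the data.table by row number. For each week it finds the position (via .I) of the weekday closest to 3, using which.min(abs(as.integer(weekday)-3.25)). The 3.25 target means that when Wednesday is missing, 4 (Thursday) wins over 2 (Tuesday).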

df[df[, .I[which.min(abs(as.integer(weekday)-3.25))], by=week(date)]$V1]
         date weekday
1: 2012-06-27       3
2: 2012-07-05       4
3: 2012-07-11       3
4: 2012-07-18       3

Note that if your real data spans multiple years, you will need to use by=.(week(date), year(date))
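As a possible alternative to grouping by week and year, the sketch below builds one continuous week index that keeps counting across New Year. The column name week_id and the anchor date (Monday 1970-01-05) are illustrative choices, not part of the original answer. Unlike numweek in the question, the index does not rely on the first row being a Monday.

```r
library(data.table)

# Same example dates as in the question (2012-07-04 is missing).
df <- data.table(date = as.Date(c(
  "2012-06-25", "2012-06-26", "2012-06-27", "2012-06-28", "2012-06-29",
  "2012-07-02", "2012-07-03", "2012-07-05", "2012-07-06",
  "2012-07-09", "2012-07-10", "2012-07-11", "2012-07-12", "2012-07-13",
  "2012-07-16", "2012-07-17", "2012-07-18", "2012-07-19", "2012-07-20"
)))

# Hypothetical helper column: whole weeks elapsed since a fixed Monday,
# so each Monday-to-Sunday block gets one id, also across year boundaries.
monday_anchor <- as.Date("1970-01-05")
df[, week_id := as.integer(date - monday_anchor) %/% 7L]

# wday(): Sunday = 1, ..., Wednesday = 4, Thursday = 5.
df[, weekday := wday(date)]

# Per week, keep the row closest to Wednesday; the 4.25 target breaks the
# Tuesday/Thursday tie in favour of Thursday.
weekly <- df[df[, .I[which.min(abs(weekday - 4.25))], by = week_id]$V1]
print(weekly)
```

On the example data this selects the same four dates as the answer above.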

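Also note that the data.table function wday returns the day of the week directly as an integer. For Monday through Saturday it is 1 larger than the character value returned by strftime(date,'%u'), so you will need to adjust the target if you want to use it directly.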
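The two numbering schemes can be compared with a small sketch. The three dates are picked only for illustration (a Monday, a Wednesday and a Sunday), and the variable names are made up for this example.

```r
library(data.table)

d <- as.Date(c("2012-07-02", "2012-07-04", "2012-07-08"))  # Mon, Wed, Sun

iso_day <- as.integer(strftime(d, "%u"))  # Monday = 1, ..., Sunday = 7
dt_day  <- wday(d)                        # Sunday = 1, ..., Saturday = 7

# Map the '%u' numbering onto the wday() numbering; the modulo handles
# Sunday, where "add 1" alone would give 8 instead of 1.
converted <- iso_day %% 7L + 1L

print(data.table(date = d, iso_day, dt_day, converted))
```

For weekday-only data such as the example in the question, the Sunday case never occurs and shifting the target from 3.25 to 4.25 is enough.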

Using only data.table functions, you could do:

df[, weekday := wday(date)]
df[df[, .I[which.min(abs(weekday-4.25))], by=week(date)]$V1]
         date weekday
1: 2012-06-27       4
2: 2012-07-05       5
3: 2012-07-11       4
4: 2012-07-18       4

Note that the dates match those above.
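A different technique that may fit the "next available data point" requirement more literally is a rolling join. This is only a sketch and not the method from the answer above; the names first_wed, wednesdays and weekly_roll are made up for illustration. With the closest-weekday approach, a week missing both Wednesday and Thursday would fall back to Tuesday, whereas a rolling join with roll = -Inf always moves forward in time.

```r
library(data.table)

# Same example dates as in the question (2012-07-04 is missing).
df <- data.table(date = as.Date(c(
  "2012-06-25", "2012-06-26", "2012-06-27", "2012-06-28", "2012-06-29",
  "2012-07-02", "2012-07-03", "2012-07-05", "2012-07-06",
  "2012-07-09", "2012-07-10", "2012-07-11", "2012-07-12", "2012-07-13",
  "2012-07-16", "2012-07-17", "2012-07-18", "2012-07-19", "2012-07-20"
)))

# Every calendar Wednesday within the data range (wday(): Wednesday = 4).
first_wed  <- min(df$date) + (4L - wday(min(df$date))) %% 7L
wednesdays <- seq(first_wed, max(df$date), by = "week")

# roll = -Inf carries the next observation backward, so a missing Wednesday
# matches the first later date; which = TRUE returns row numbers of df.
idx <- df[.(date = wednesdays), on = "date", roll = -Inf, which = TRUE]

# Drop Wednesdays with no later observation, and duplicates in case two
# Wednesdays roll forward to the same row.
idx <- unique(idx[!is.na(idx)])

weekly_roll <- df[idx]
print(weekly_roll)
```

Taking row numbers first and then subsetting keeps the actual observation dates in the result, rather than the Wednesday dates that were looked up.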