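I have a data.table containing daily data. From it I want to extract weekly data points taken every Wednesday. If a Wednesday is a holiday, i.e. it is missing from the data.table, the next available data point should be used instead. Here is a MWE: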
library(data.table)
df <- data.table(date=as.Date(c("2012-06-25","2012-06-26","2012-06-27","2012-06-28","2012-06-29","2012-07-02","2012-07-03","2012-07-05","2012-07-06","2012-07-09","2012-07-10","2012-07-11","2012-07-12","2012-07-13","2012-07-16","2012-07-17","2012-07-18","2012-07-19","2012-07-20")))
df[,weekday:=strftime(date,'%u')]
with the output:
date weekday
1: 2012-06-25 1
2: 2012-06-26 2
3: 2012-06-27 3
4: 2012-06-28 4
5: 2012-06-29 5
6: 2012-07-02 1
7: 2012-07-03 2
8: 2012-07-05 4 #here the 4th of July was skipped
9: 2012-07-06 5
10: 2012-07-09 1
11: 2012-07-10 2
12: 2012-07-11 3
13: 2012-07-12 4
14: 2012-07-13 5
15: 2012-07-16 1
16: 2012-07-17 2
17: 2012-07-18 3
18: 2012-07-19 4
19: 2012-07-20 5
The result I want in this case would be:
date weekday
2012-06-27 3
2012-07-05 4
2012-07-11 3
2012-07-18 3
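Is there a more efficient way to get this than looping week by week and checking whether the Wednesday data point is present in the data? I feel there must be a better way, so any suggestions would be highly appreciated!
Working solution (following Imo's suggestion):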
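df[, weekday := wday(date)]  # faster than strftime; note wday() is 1 higher on weekdays (Sunday = 1, so Wednesday = 4)
df[, numweek := floor(as.numeric(date - date[1]) / 7 + 1)]  # continuous week numbers that keep counting across year ends (7-day blocks from the first date, here a Monday)
df[df[, .I[which.min(abs(weekday - 4.25))], by = .(numweek)]$V1]  # row closest to Wednesday per week, preferring Thursday over Tuesday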
Answer 0 (score: 1)
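Here is a data.table approach that, for each week, finds the row position (using .I) of the weekday closest to 3, preferring 4 over 2 when 3 is missing (using which.min(abs(as.integer(weekday)-3.25))), and then subsets df on those positions.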
df[df[, .I[which.min(abs(as.integer(weekday)-3.25))], by=week(date)]$V1]
date weekday
1: 2012-06-27 3
2: 2012-07-05 4
3: 2012-07-11 3
4: 2012-07-18 3
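To see why the offset is 3.25 rather than 3, here is a small worked sketch on a hypothetical week without a Wednesday (the vector wd is made up for illustration and uses the strftime '%u' numbering):

```r
# Hypothetical week in which Wednesday (3) is missing
wd <- c(1, 2, 4, 5)

# Distances to 3.25: Thursday (4) is closer than Tuesday (2)
dist_offset <- abs(wd - 3.25)            # 2.25 1.25 0.75 1.75
pick_offset <- wd[which.min(dist_offset)]

# Distances to 3: Tuesday and Thursday tie, and which.min() returns the first minimum
dist_plain <- abs(wd - 3)                # 2 1 1 2
pick_plain <- wd[which.min(dist_plain)]

pick_offset  # 4 (Thursday)
pick_plain   # 2 (Tuesday)
```

One thing to keep in mind: if both Wednesday and Thursday are missing, Tuesday (distance 1.25) is closer to 3.25 than Friday (distance 1.75), so this trick would pick the earlier day rather than the next available one.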
Note that if your real data spans several years, you will need to use by=.(week(date), year(date)).
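One caveat with grouping by week and year: week() restarts at 1 every January, so a calendar week that straddles New Year ends up in two groups. A minimal sketch of one possible alternative, assuming base R's cut(date, "week") (which labels each date with the Monday starting its week) is acceptable as a grouping key. The data below is hypothetical and not from the question:

```r
library(data.table)

# Hypothetical data: business days around New Year, with two holidays removed
dt <- data.table(date = seq(as.Date("2012-12-24"), as.Date("2013-01-11"), by = "day"))
dt <- dt[!wday(date) %in% c(1L, 7L)]                         # drop Sunday (1) and Saturday (7)
dt <- dt[!date %in% as.Date(c("2012-12-25", "2013-01-01"))]  # drop the holidays
dt[, weekday := wday(date)]

# Group by the Monday that starts each week, so the key is unaffected by the year change
idx <- dt[, .I[which.min(abs(weekday - 4.25))], by = .(wk = cut(date, "week"))]$V1
res_cut <- dt[idx]
res_cut
```

In this sketch the week of Monday 2012-12-31 stays in one group, so it contributes a single row (2013-01-02) instead of one row per year.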
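Also note that the data.table function wday returns the day of the week directly as an integer. On weekdays it is 1 greater than the character value returned by strftime with '%u' (wday counts from Sunday = 1, '%u' from Monday = 1), so you need to adjust if you want to use it directly.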
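A quick sketch of the difference between the two numberings, using one date from the question. The conversion formula at the end is an illustration, not part of the original answer:

```r
library(data.table)

d <- as.Date("2012-06-27")        # a Wednesday in the question's data

iso_chr <- strftime(d, "%u")      # character, Monday = 1 ... Sunday = 7
dt_int  <- wday(d)                # integer, Sunday = 1 ... Saturday = 7

# Map data.table's numbering back onto the Monday = 1 convention
iso_from_wday <- (dt_int + 5L) %% 7L + 1L

iso_chr        # "3"
dt_int         # 4
iso_from_wday  # 3
```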
Using only functions from data.table, you can do
df[, weekday := wday(date)]
df[df[, .I[which.min(abs(weekday-4.25))], by=week(date)]$V1]
date weekday
1: 2012-06-27 4
2: 2012-07-05 5
3: 2012-07-11 4
4: 2012-07-18 4
Note that the dates match those above.
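As a different technique from the one in this answer, a rolling join can express "Wednesday, or else the next available day" directly. The following is a minimal sketch under a few assumptions: the Wednesday grid wed and the helper column obs_date are introduced here for illustration, and the question's dates are rebuilt as business days minus the 4th of July.

```r
library(data.table)

# Rebuild the question's dates: business days without the 4th of July
df <- data.table(date = seq(as.Date("2012-06-25"), as.Date("2012-07-20"), by = "day"))
df <- df[!wday(date) %in% c(1L, 7L)]      # drop Sunday (1) and Saturday (7)
df <- df[date != as.Date("2012-07-04")]   # drop the holiday
df[, obs_date := date]                    # copy: after the join, 'date' holds the Wednesday

# Every Wednesday in the range of interest
wed <- data.table(date = seq(as.Date("2012-06-27"), as.Date("2012-07-18"), by = "week"))

# roll = -Inf carries the next observation backwards (NOCB),
# so a missing Wednesday is matched to the first later date in df
res_roll <- df[wed, on = "date", roll = -Inf]
res_roll
```

Here obs_date holds the dates actually picked. Unlike the per-week which.min approach, this never falls back to an earlier day, but it will reach into the following week if the rest of the week is missing, and it yields NA for a Wednesday with no later observation at all.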