Subsetting CSV data by pentad dates using R

Time: 2016-12-15 08:21:47

Tags: r csv subset

I want to subset the following CSV file by pentad dates (non-overlapping five-day averaging periods). For example:

1. January 1 to January 5
2. January 6 to January 10
...
73. December 27 to December 31

Here is the complete list of pentad dates:

List of Pentad dates
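
The pentad boundaries can also be rebuilt directly in R. The following is a minimal sketch, assuming a non-leap year (1983 is used only because it matches the sample data); the data frame name `pentads` and its columns are illustrative and do not come from the linked list.

```r
# Day of year on which each of the 73 pentads starts: 1, 6, ..., 361.
start_doy <- seq(1, 361, by = 5)

# as.Date() with a numeric offset counts days from `origin`,
# so day-of-year n is n - 1 days after January 1.
pentads <- data.frame(
  pentad = seq_along(start_doy),
  start  = as.Date(start_doy - 1, origin = "1983-01-01"),
  end    = as.Date(start_doy + 3, origin = "1983-01-01")
)

# Inspect the pentad that straddles January and February, and the last one.
pentads[c(7, 73), ]
```

In this table pentad 7 runs from January 31 to February 4 and pentad 73 from December 27 to December 31, which matches the examples above.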

The Complete Data

Sample data

SN,CY,Y,M,D,H,lat,lon,cat
198305,5,1983,8,5,0,9.1,140.7,"TD"
198305,5,1983,8,5,6,9.3,140.5,"TD"
198305,5,1983,8,5,12,9.6,139.9,"TD"
198305,5,1983,8,5,18,9.9,139.4,"TS"
198305,5,1983,8,6,0,10.2,138.8,"TS"
198305,5,1983,8,6,6,11,138.1,"TS"
198305,5,1983,8,6,12,11.8,137.3,"TS"
198305,5,1983,8,6,18,12.4,136.4,"Cat1"
198305,5,1983,8,7,0,12.8,135.8,"Cat1"
198305,5,1983,8,7,6,13.6,134.7,"Cat1"
198305,5,1983,8,7,12,14.4,133.9,"Cat2"
198305,5,1983,8,7,18,15,133.5,"Cat4"
198305,5,1983,8,8,0,15.8,132.8,"Cat4"
198305,5,1983,8,8,6,16.3,132.4,"Cat4"
198305,5,1983,8,8,12,17.1,132,"Cat5"
198305,5,1983,8,8,18,17.4,131.4,"Cat5"
198305,5,1983,8,9,0,17.8,130.8,"Cat5"
198305,5,1983,8,9,6,18.1,130.7,"Cat4"
198305,5,1983,8,9,12,18.7,130.3,"Cat4"
198305,5,1983,8,9,18,18.9,130.4,"Cat4"

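SN is the unique identifier, Y is the year, M is the month, D is the day, and H is the hour. If a unique identifier falls in one pentad, it should not be included again in the next subset.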

I have already tried this for August (based on a previous post):

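# First and last day of each five-day block within the month
d1  <- c(1, 6, 11, 16, 21, 26)
d2  <- c(5, 10, 15, 20, 25, 30)
# One subset of the August rows per block
res <- Map(function(x, y) subset(df1, M == 8 & D >= x & D <= y), d1, d2)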

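But I run into trouble with the mapping from pentad 7 (P7) onward, because that pentad runs from January 31 to February 4 and so spans two months.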

Can anyone suggest a way to do this in R? I would appreciate any help.

1 answer:

Answer 0: (score: 0)

library(stringr)
df$Date = paste(df$Y, str_pad(df$M,2,'left','0'), str_pad(df$D,2,'left','0'), sep='-')
# POSIXlt's yday is 0-based (0 to 365), so add 1 to get the day of year (1 to 366)
df$yday = as.POSIXlt(df$Date)$yday + 1
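
The same day-of-year column can also be derived without stringr. This is a minimal base-R sketch, assuming the Y, M and D columns from the sample data; the data frame `toy` and its values are made up for illustration.

```r
# Hypothetical three-row data frame with the question's Y, M, D columns.
toy <- data.frame(Y = c(1983, 1983, 1983), M = c(1, 1, 8), D = c(1, 31, 5))

# ISOdate() builds a date-time from numeric parts, so no zero-padding is needed.
toy$Date <- as.Date(ISOdate(toy$Y, toy$M, toy$D))

# "%j" formats the day of year as a 1-based string ("001" to "366").
toy$yday <- as.integer(format(toy$Date, "%j"))
```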

Now it is trivial:

df$pentad = ceiling(df$yday/5)
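
Two points from the question may still need handling after this one-liner: in a leap year, day 366 would produce a 74th pentad, and a storm (SN) whose track crosses a pentad boundary would land in two subsets. The sketch below shows one possible way to deal with both. It rests on two assumptions that are not part of the answer above: February 29 is absorbed into pentad 12, and each SN is assigned to the pentad of its first record. The names `toy`, `first_pentad` and `subsets` are illustrative.

```r
# Hypothetical rows in the question's layout; both storms cross a pentad boundary.
toy <- data.frame(
  SN = c(198305, 198305, 198401, 198401),
  Y  = c(1983, 1983, 1984, 1984),
  M  = c(8, 8, 2, 3),
  D  = c(5, 9, 29, 2)
)

# 1-based day of year, as in the answer above.
toy$yday <- as.POSIXlt(as.Date(ISOdate(toy$Y, toy$M, toy$D)))$yday + 1

# Assumption: in leap years, shift every day from Feb 29 (day 60) onward back
# by one, so that Feb 29 joins pentad 12 and Dec 31 stays in pentad 73.
leap <- (toy$Y %% 4 == 0 & toy$Y %% 100 != 0) | toy$Y %% 400 == 0
adj  <- toy$yday - as.integer(leap & toy$yday >= 60)
toy$pentad <- ceiling(adj / 5)

# Assumption: a storm belongs to the pentad of its first record
# (rows are expected to be in time order within each SN).
toy$first_pentad <- ave(toy$pentad, toy$SN, FUN = function(p) p[1])

# One data frame per pentad, with each SN appearing in exactly one of them.
subsets <- split(toy, toy$first_pentad)
```

Taking the first record rather than the minimum pentad keeps a storm that runs from late December into January in pentad 73 instead of pentad 1.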