So I have a dataset that looks like this, but without weekends:
X1 X2
3798 2009-12-29 0
3799 2009-12-30 0
3800 2009-12-31 0
3802 2010-01-02 0
3803 2010-01-03 2.1
3804 2010-01-04 0
3805 2010-01-05 0
3806 2010-01-06 0
3807 2010-01-07 0
3808 2010-01-08 0
3809 2010-01-09 0
3810 2010-01-10 6.8
3811 2010-01-12 0
3812 2010-01-13 0
3813 2010-01-14 17.7
3814 2010-01-16 0
3815 2010-01-17 0
3816 2010-01-18 1.5
3817 2010-01-19 0
3818 2010-01-20 0
3819 2010-01-21 0
3820 2010-01-22 0
3821 2010-01-23 0
3822 2010-01-24 0
3823 2010-01-25 0
3824 2010-01-26 0
3825 2010-01-27 4.5
3826 2010-01-28 0
3827 2010-01-29 0
3828 2010-01-31 0
3829 2010-02-01 0
3830 2010-02-03 0
3831 2010-02-04 0
3832 2010-02-05 0
3833 2010-02-07 0
3834 2010-02-08 0
3835 2010-02-09 1.2
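I would like to get the 5-day average around the 15th of each month. If the 15th falls on a weekend and is not in the dataset, I want the 5-day average around the nearest date instead (the 14th or the 16th). Is this possible?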
So this is the expected output:
X1 X2 5-day average
1 2009-12-14 2
2 2010-01-15 3
3 2010-02-15 4
4 2010-03-16 2
5 2010-04-15 1
6 2010-05-14 7
Answer (score: 1)
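Getting rolling averages is very easy with the rollapply function from the zoo package. You can then extract the ones you need (i.e., those around the 15th of each month).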
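To make the rolling step concrete, this is what rollapply with width 5 and fill = NA returns on a short made-up vector. The default alignment is centered, so each value is the mean of that observation plus the two on either side, and the positions without a full window are padded with NA. Note that the window counts rows (observations), not calendar days, which matters here because the data has gaps.

```r
library(zoo)

# Toy vector, purely for illustration
x <- c(1, 2, 3, 4, 5, 6, 7)

# Centered rolling mean over 5 consecutive observations;
# fill = NA pads the two positions at each end where the window is incomplete
r5 <- rollapply(x, 5, mean, fill = NA)
print(r5)  # NA NA 3 4 5 NA NA
```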
# packages used
library(data.table)
library(zoo)
# data preparation
df <- read.table(text=' X1 X2
3798 2009-12-29 0
3799 2009-12-30 0
3800 2009-12-31 0
3802 2010-01-02 0
3803 2010-01-03 2.1
3804 2010-01-04 0
3805 2010-01-05 0
3806 2010-01-06 0
3807 2010-01-07 0
3808 2010-01-08 0
3809 2010-01-09 0
3810 2010-01-10 6.8
3811 2010-01-12 0
3812 2010-01-13 0
3813 2010-01-14 17.7
3814 2010-01-16 0
3815 2010-01-17 0
3816 2010-01-18 1.5
3817 2010-01-19 0
3818 2010-01-20 0
3819 2010-01-21 0
3820 2010-01-22 0
3821 2010-01-23 0
3822 2010-01-24 0
3823 2010-01-25 0
3824 2010-01-26 0
3825 2010-01-27 4.5
3826 2010-01-28 0
3827 2010-01-29 0
3828 2010-01-31 0
3829 2010-02-01 0
3830 2010-02-03 0
3831 2010-02-04 0
3832 2010-02-05 0
3833 2010-02-07 0
3834 2010-02-08 0
3835 2010-02-09 1.2', header=TRUE)
setDT(df)
# convert the date strings to Date, modifying df in place
df[, X1 := as.Date(X1)]
setkey(df, X1)
# centered rolling mean over 5 consecutive rows (NA where the window is incomplete)
df[, rmean := rollapply(X2, 5, mean, fill = NA)]
# for each month, flag the row(s) whose day of month is closest to the 15th
dt <- df[, list(day15 = abs(mday(X1) - 15) == min(abs(mday(X1) - 15)), X1, rmean),
         by = list(year = year(X1), month = month(X1))]
dt[day15 == TRUE]
# if both the 14th and the 16th qualify, keep only the first (earlier) one per month
dt[day15 == TRUE, .SD[1], by = list(month, year)]
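A possible one-step alternative to the day15 filter, sketched on a few January rows copied from the question (2010-01-15 is missing there). which.min returns the position of the first minimum, so each month yields exactly one row and, on a tie between the 14th and the 16th, the earlier date wins. .I holds the row numbers of df, so the grouped call returns the indices of the chosen rows, which are then used to subset df. This is a sketch of the same idea, not a change to the answer's method.

```r
library(data.table)
library(zoo)

# A few rows from the question's data (2010-01-15 is absent)
df <- data.table(
  X1 = as.Date(c("2010-01-10", "2010-01-12", "2010-01-13", "2010-01-14",
                 "2010-01-16", "2010-01-17", "2010-01-18", "2010-01-19")),
  X2 = c(6.8, 0, 0, 17.7, 0, 0, 1.5, 0)
)
setkey(df, X1)

# Rolling mean over the whole series first, so windows can cross the chosen date
df[, rmean := rollapply(X2, 5, mean, fill = NA)]

# Row index of the date closest to the 15th in each year/month;
# which.min keeps the first minimum, so the 14th beats the 16th on a tie
idx <- df[, .I[which.min(abs(mday(X1) - 15))], by = .(year(X1), month(X1))]$V1
res <- df[idx]
print(res)
```

On these rows the result is the single row for 2010-01-14, whose rolling mean is the average of the 12th, 13th, 14th, 16th and 17th. Applied to the full sample from the question, the December and February picks (the 29th and the 9th) would have NA averages, because they sit at the very ends of the series.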