When averaging with `aggregate()`, the first data point is not taken into account in R timeSeries; how do I use this function correctly?

Asked: 2015-05-20 07:29:38

Tags: r datetime time-series

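I want to compute daily averages of hourly electricity prices from the NordPool market. I am using the aggregate() method from the timeSeries package to build daily means of the hourly data, which I have converted to a timeSeries object. Here is the dput() of the first 72 hours: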

    > dput(tstSeries)
    new("timeSeries"
    , .Data = structure(c(31.05, 30.47, 28.92, 27.88, 26.96, 27.84, 28.79,
    28.63, 28.44, 28.3, 30.65, 31.55, 32.16, 32.45, 32.63, 33.65,
    34.9, 36.22, 36.65, 36.37, 35.49, 34.41, 34.66, 32.55, 33.15,
    32.66, 31.83, 31.47, 32.56, 34.36, 36.28, 38.39, 39.09, 38.33,
    38.42, 38.25, 37.96, 37.89, 37.88, 38.78, 39.83, 39.91, 39.32,
    38.49, 37.46, 36.94, 36.37, 34.59, 33.11, 32.22, 31.46, 31.67,
    32.05, 33.67, 34.93, 35.82, 36.38, 36.52, 36.71, 36.6, 36.51,
    36.4, 36.42, 36.58, 36.94, 36.94, 36.81, 36.43, 35.91, 35.45,
    34.77, 32.09), .Dim = c(72L, 1L), .Dimnames = list(NULL, "TS.1"))
    , units = "TS.1"
    , positions = c(1356998400, 1357002000, 1357005600, 1357009200, 1357012800,
    1357016400, 1357020000, 1357023600, 1357027200, 1357030800, 1357034400,
    1357038000, 1357041600, 1357045200, 1357048800, 1357052400, 1357056000,
    1357059600, 1357063200, 1357066800, 1357070400, 1357074000, 1357077600,
    1357081200, 1357084800, 1357088400, 1357092000, 1357095600, 1357099200,
    1357102800, 1357106400, 1357110000, 1357113600, 1357117200, 1357120800,
    1357124400, 1357128000, 1357131600, 1357135200, 1357138800, 1357142400,
    1357146000, 1357149600, 1357153200, 1357156800, 1357160400, 1357164000,
    1357167600, 1357171200, 1357174800, 1357178400, 1357182000, 1357185600,
    1357189200, 1357192800, 1357196400, 1357200000, 1357203600, 1357207200,
    1357210800, 1357214400, 1357218000, 1357221600, 1357225200, 1357228800,
    1357232400, 1357236000, 1357239600, 1357243200, 1357246800, 1357250400,
    1357254000)
    , format = "%Y-%m-%d %H:%M:%S"
    , FinCenter = "GMT"
    , recordIDs = structure(list(), .Names = character(0), row.names = integer(0), class = "data.frame")
    , title = "Time Series Object"
    , documentation = "Wed May 20 11:02:09 2015"
    )

To compute the averages, I do the following:

    library(timeSeries)

    ## daily averaging
    bydaily = timeSequence(from = start(tstSeries), to = end(tstSeries), by = "day")
    tstSeries.daily = aggregate(tstSeries, by = bydaily, FUN = mean)

The output I get is:

    > tstSeries.daily
    GMT
                   TS.1
    2013-01-01 31.05000
    2013-01-02 31.82167
    2013-01-03 36.67375

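Here the first daily average is simply the original data point! I ran the same calculation in Excel and confirmed that the first data point is left out of the averaging: the value reported for 2013-01-02 is the mean of 2013-01-01 01:00 through 2013-01-02 00:00.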
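The arithmetic can be checked in base R without the timeSeries package. The sketch below is only an illustration: the vector name `prices` and the three result names are made up here, and the index ranges encode the assumption that each break in `by` closes an interval that excludes the previous break and includes the current one, which is what the reported numbers suggest.

```r
# Hourly prices copied from the dput() in the question
# (72 values, the first one at 2013-01-01 00:00 GMT).
prices <- c(31.05, 30.47, 28.92, 27.88, 26.96, 27.84, 28.79, 28.63,
            28.44, 28.3, 30.65, 31.55, 32.16, 32.45, 32.63, 33.65,
            34.9, 36.22, 36.65, 36.37, 35.49, 34.41, 34.66, 32.55,
            33.15, 32.66, 31.83, 31.47, 32.56, 34.36, 36.28, 38.39,
            39.09, 38.33, 38.42, 38.25, 37.96, 37.89, 37.88, 38.78,
            39.83, 39.91, 39.32, 38.49, 37.46, 36.94, 36.37, 34.59,
            33.11, 32.22, 31.46, 31.67, 32.05, 33.67, 34.93, 35.82,
            36.38, 36.52, 36.71, 36.6, 36.51, 36.4, 36.42, 36.58,
            36.94, 36.94, 36.81, 36.43, 35.91, 35.45, 34.77, 32.09)

# Assumed grouping: every midnight break collects the points after the
# previous break up to and including the break itself.
up_to_jan01_0000 <- mean(prices[1])      # only the 2013-01-01 00:00 point
up_to_jan02_0000 <- mean(prices[2:25])   # 2013-01-01 01:00 .. 2013-01-02 00:00
up_to_jan03_0000 <- mean(prices[26:49])  # 2013-01-02 01:00 .. 2013-01-03 00:00

interval_means <- round(c(up_to_jan01_0000, up_to_jan02_0000, up_to_jan03_0000), 5)
print(interval_means)
# 31.05000 31.82167 36.67375
```

These three values match the `aggregate()` output, so the midnight breaks appear to act as the right-hand ends of the averaging intervals rather than as the start of each day. The 23 points after 2013-01-03 00:00 fall beyond the last break and do not show up in the result.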

I have looked at several examples that show how to use aggregate(), but none of them raises this issue. Has anyone seen this happen and found a workaround?

1 Answer:

Answer 0: (score: 0)

Here is a solution that returns the desired output. It relies on the apply.rolling function from the PerformanceAnalytics package.

    library(PerformanceAnalytics)

    # Mean of each consecutive 24-hour window.
    tstSeries.daily <- apply.rolling(tstSeries, width = 24, by = 24, FUN = "mean")
    # Remove the rows that contain NA.
    tstSeries.daily <- tstSeries.daily[complete.cases(tstSeries.daily), ]
    # Drop the time part of the index.
    rownames(tstSeries.daily) <- as.Date(rownames(tstSeries.daily))
    print(tstSeries.daily)

    GMT
                  calcs
    2013-01-01 31.73417
    2013-01-02 36.67542
    2013-01-03 35.09958
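For comparison, the same calendar-day means can be obtained with base R alone by grouping on the date part of each timestamp. This is a minimal sketch rather than a timeSeries-based fix: the names `prices`, `stamps`, `day` and `daily` are illustrative, and the timestamps are rebuilt on the assumption that the series is strictly hourly from the first position in the dput() (1356998400, i.e. 2013-01-01 00:00:00 GMT).

```r
# Hourly prices copied from the dput() in the question (72 values).
prices <- c(31.05, 30.47, 28.92, 27.88, 26.96, 27.84, 28.79, 28.63,
            28.44, 28.3, 30.65, 31.55, 32.16, 32.45, 32.63, 33.65,
            34.9, 36.22, 36.65, 36.37, 35.49, 34.41, 34.66, 32.55,
            33.15, 32.66, 31.83, 31.47, 32.56, 34.36, 36.28, 38.39,
            39.09, 38.33, 38.42, 38.25, 37.96, 37.89, 37.88, 38.78,
            39.83, 39.91, 39.32, 38.49, 37.46, 36.94, 36.37, 34.59,
            33.11, 32.22, 31.46, 31.67, 32.05, 33.67, 34.93, 35.82,
            36.38, 36.52, 36.71, 36.6, 36.51, 36.4, 36.42, 36.58,
            36.94, 36.94, 36.81, 36.43, 35.91, 35.45, 34.77, 32.09)

# Rebuild the hourly GMT timestamps from the first position (assumption:
# one observation every 3600 seconds, as in the dput()).
stamps <- as.POSIXct(1356998400 + 3600 * (seq_along(prices) - 1),
                     origin = "1970-01-01", tz = "GMT")

# Group by calendar date, so 00:00 .. 23:00 of a day share one label.
day <- format(stamps, "%Y-%m-%d", tz = "GMT")
daily <- tapply(prices, day, mean)

print(round(daily, 5))
# 2013-01-01 2013-01-02 2013-01-03
#   31.73417   36.67542   35.09958
```

Grouping on the formatted date keeps the 00:00 observation in the same day as the 23 hours that follow it, which reproduces the values shown in the answer.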