Merging aggregated data in R (again)

Time: 2013-07-25 01:55:01

Tags: r merge aggregate zoo

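Following up on my previous question with the same title: I have a long time series of hourly data that I want to aggregate in various ways. I want to aggregate by hour of the day, but also by combinations such as day of week and hour (i.e. Sunday 1am, Sunday 2am, etc.). Another example is weekend or weekday by hour.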

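The example below shows the two aggregations I have done. That is as far as I have got, so I end up with two zoo objects. What I want to do next is merge the aggregates back into the original data so that I can compare the errors of the aggregates. This is where I am stuck.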

Note that I am not using the solution from the previous question, because I want flexibility in how I aggregate.

Here is a snippet showing what I have tried so far. Any help would be appreciated.

library(zoo)
Lines <- "Index,light.kw
2013-06-14 13:00:00,3.436
2013-06-14 13:15:00,3.327
2013-06-14 13:30:00,3.319
2013-06-14 13:45:00,3.308
2013-06-14 14:00:00,3.458
2013-06-14 14:15:00,3.452
2013-06-14 14:30:00,3.445
2013-06-14 14:45:00,3.469
2013-06-14 15:00:00,3.468
2013-06-14 15:15:00,3.427
2013-06-14 15:30:00,3.168
2013-06-14 15:45:00,2.383
2013-06-15 13:00:00,0.555
2013-06-15 13:15:00,0.555
2013-06-15 13:30:00,0.555
2013-06-15 13:45:00,0.555
2013-06-15 14:00:00,0.555
2013-06-15 14:15:00,0.555
2013-06-15 14:30:00,0.555
2013-06-15 14:45:00,0.719
2013-06-15 15:00:00,0.976
2013-06-15 15:15:00,0.981
2013-06-15 15:30:00,1.116
2013-06-15 15:45:00,0.59"
con <- textConnection(Lines)
z <- read.zoo(con, header=TRUE, sep=",",
     format="%Y-%m-%d %H:%M:%S", FUN=as.POSIXct)
close(con)

index.hourly = format(index(z), "%H")
z.hourly = aggregate(z, index.hourly, mean)
z.hourly
merge(z,z.hourly)

index.dayhour = format(index(z), "%w %H")
z.dayhour = aggregate(z, index.dayhour, mean)
z.dayhour
merge(z,z.dayhour)
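
The weekend-or-weekday by hour grouping mentioned above can be built the same way as the other two keys. The following is only a minimal sketch under some assumptions: `z.demo`, `day.type` and `index.daytype.hour` are hypothetical names, the four rows are copied from the sample data, and the timestamps are read as GMT for reproducibility.

```r
library(zoo)

# Four rows from the sample data: 2013-06-14 is a Friday, 2013-06-15 a Saturday.
# Assumption: timestamps are interpreted as GMT.
idx <- as.POSIXct(c("2013-06-14 13:00:00", "2013-06-14 13:15:00",
                    "2013-06-15 13:00:00", "2013-06-15 13:15:00"), tz = "GMT")
z.demo <- zoo(c(3.436, 3.327, 0.555, 0.555), idx)

# %w is the weekday as 0-6 with Sunday = 0, so "0" and "6" mark the weekend
day.type <- ifelse(format(index(z.demo), "%w") %in% c("0", "6"),
                   "weekend", "weekday")

# Combine day type and hour of day into one grouping key
index.daytype.hour <- paste(day.type, format(index(z.demo), "%H"))

# Aggregate exactly as with the hourly and day-hour keys
z.daytype.hour <- aggregate(z.demo, index.daytype.hour, mean)
z.daytype.hour
```

As with `index.hourly` and `index.dayhour`, the aggregate is indexed by the key strings rather than by timestamps.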

1 Answer:

Answer 0: (score: 0)

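Following DWin's suggestion above, here is the solution I found. Note that merging on an intermediate column, as DWin suggested, does not work in zoo, so the solution converts the zoo objects back to data frames and does the merge there. Here it is: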

library(zoo)
Lines <- "Index,light.kw
2013-06-14 13:00:00,3.436
2013-06-14 13:15:00,3.327
2013-06-14 13:30:00,3.319
2013-06-14 13:45:00,3.308
2013-06-14 14:00:00,3.458
2013-06-14 14:15:00,3.452
2013-06-14 14:30:00,3.445
2013-06-14 14:45:00,3.469
2013-06-14 15:00:00,3.468
2013-06-14 15:15:00,3.427
2013-06-14 15:30:00,3.168
2013-06-14 15:45:00,2.383
2013-06-15 13:00:00,0.555
2013-06-15 13:15:00,0.555
2013-06-15 13:30:00,0.555
2013-06-15 13:45:00,0.555
2013-06-15 14:00:00,0.555
2013-06-15 14:15:00,0.555
2013-06-15 14:30:00,0.555
2013-06-15 14:45:00,0.719
2013-06-15 15:00:00,0.976
2013-06-15 15:15:00,0.981
2013-06-15 15:30:00,1.116
2013-06-15 15:45:00,0.59"
con <- textConnection(Lines)
z <- read.zoo(con, header=TRUE, sep=",",
     format="%Y-%m-%d %H:%M:%S", FUN=as.POSIXct)
close(con)

# make the index for aggregation
index.hourly <- format(index(z), "%H")
# make the aggregate
z.hourly = aggregate(z, index.hourly, mean, na.rm=TRUE)

# make a data frame from the original zoo,
# but the data frame must include the index.hourly
# so that later we can merge the data frame based
# on this index.
# First, make a zoo object of the index and then
# merge this with the original zoo.
z.index.hourly = zoo(index.hourly,index(z))
z.with.index = merge(z,z.index.hourly)
# make a dataframe of the last zoo
df1 = as.data.frame(z.with.index)
# add the index of the df1 (which is the timestamp) as a column
# as we will need the timestamp to rebuild the zoo object.
df1$Index = row.names(df1)

# make a dataframe of the aggregate zoo
df2 = as.data.frame(z.hourly)
df2$Index = row.names(df2)

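# merge the two data frames on the hour key
df3 = merge(df1,df2,by.x="z.index.hourly",by.y="Index",all.x=TRUE)
df3 = df3[order(df3$Index),]
# merging z with the character key above may have turned z into
# character (or factor), so convert it back to numeric
df3$z = as.numeric(as.character(df3$z))
summary(df3)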

# make a zoo object containing the original data and the aggregate
z.merged.agg = zoo(df3[,c(2,4)],as.POSIXct(df3$Index, tz="GMT"))
z.merged.agg
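
The same pairing of each observation with its group mean can probably be sketched without the round trip through data frames, by looking up each row's key in the index of the aggregate. This is an illustrative alternative, not the method from the answer above: `z.small`, `key`, `pos`, `agg.value` and `z.both` are hypothetical names, the five values are made up, and the timestamps are assumed to be GMT.

```r
library(zoo)

# A shortened series; the values are made up for illustration.
# Assumption: timestamps are interpreted as GMT.
idx <- as.POSIXct(c("2013-06-14 13:00:00", "2013-06-14 13:30:00",
                    "2013-06-14 14:00:00", "2013-06-15 13:00:00",
                    "2013-06-15 14:00:00"), tz = "GMT")
z.small <- zoo(c(3.4, 3.2, 3.5, 0.6, 0.7), idx)

# Any grouping key can be used here; hour of day mirrors the answer
key <- format(index(z.small), "%H")
z.agg <- aggregate(z.small, key, mean, na.rm = TRUE)

# For each observation, find the position of its key in the aggregate's index
pos <- match(key, index(z.agg))
agg.value <- as.numeric(coredata(z.agg))[pos]

# Original values and their group means side by side, on the original timestamps
z.both <- zoo(cbind(light.kw = coredata(z.small), hourly.mean = agg.value),
              index(z.small))
z.both
```

Because both columns are numeric and the original index is reused, this sketch does not need to re-parse timestamps from row names. Swapping in a different `key` (for example one built with `"%w %H"`) should give the other aggregations with no further changes.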