Calculating the median of a time series for every 8 hours

Time: 2016-04-21 18:58:03

Tags: r time time-series mean

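I am new to R, and I have to calculate the mean of a time series that covers 5 years of hourly data for ozone and other variables.

My df looks like: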

structure(list(date = structure(c(1L, 1L, 1L, 1L), .Label = "01.01.2010", class = "factor"), 
day.of = c(1L, 1L, 1L, 1L), time = structure(1:4, .Label = c("00:00", 
"01:00", "02:00", "03:00"), class = "factor"), SVF_Ray = c(1L, 
1L, 1L, 1L), Gmax = c(0, 0, 0, 0), Ta = c(-1.3, -1.2, -1.2, 
-1.2), Tmrt = c(-19.3, -12.1, -12, -12.1), PET = c(-10.4, 
-8.7, -8.7, -8.7), PT = c(-11.3, -9.3, -9.3, -9.3), Ozon = c(61.35, 
62.65, 63.4, 63.85), rDatum = structure(c(14610, 14610, 14610, 
14610), class = "Date"), year = c(2010, 2010, 2010, 2010), 
month = c(1, 1, 1, 1), day = c(1, 1, 1, 1), hour = c(0, 1, 
2, 3)), .Names = c("date", "day.of", "time", "SVF_Ray", "Gmax", 
"Ta", "Tmrt", "PET", "PT", "Ozon", "rDatum", "year", "month", 
"day", "hour"), row.names = c(NA, 4L), class = "data.frame")

I want to calculate the mean of ozone for every 8 hours, that is, a series of 4 means for each day. I have already prepared my date column:

Datum_Ozon$rDatum <- as.Date(Datum_Ozon$date, format="%d.%m.%Y")

Datum_Ozon$hour <- as.numeric(unlist(strsplit(as.character(Datum_Ozon$time), ":"))[seq(1, 2 * length(Datum_Ozon$time), 2)])

with the hour formatted as numeric.

But I don't know how to get any further toward my goal. Thanks in advance!

3 answers:

Answer 0 (score: 0)

Here is a basic example that uses dplyr pipes rather than the plyr approach, together with ifelse(). Everything here is self-contained:

library(dplyr)

## OP data
df <- 
structure(list(date = structure(c(1L, 1L, 1L, 1L), .Label = "01.01.2010", class = "factor"), 
day.of = c(1L, 1L, 1L, 1L), time = structure(1:4, .Label = c("00:00", 
"01:00", "02:00", "03:00"), class = "factor"), SVF_Ray = c(1L, 
1L, 1L, 1L), Gmax = c(0, 0, 0, 0), Ta = c(-1.3, -1.2, -1.2, 
-1.2), Tmrt = c(-19.3, -12.1, -12, -12.1), PET = c(-10.4, 
-8.7, -8.7, -8.7), PT = c(-11.3, -9.3, -9.3, -9.3), Ozon = c(61.35, 
62.65, 63.4, 63.85), rDatum = structure(c(14610, 14610, 14610, 
14610), class = "Date"), year = c(2010, 2010, 2010, 2010), 
month = c(1, 1, 1, 1), day = c(1, 1, 1, 1), hour = c(0, 1, 
2, 3)), .Names = c("date", "day.of", "time", "SVF_Ray", "Gmax", 
"Ta", "Tmrt", "PET", "PT", "Ozon", "rDatum", "year", "month", 
"day", "hour"), row.names = c(NA, 4L), class = "data.frame")

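df %>%
  # label each hour with the 8-hour block of the day it falls in
  mutate(DayChunk = ifelse(hour %in% 0:7, "FirstThird",
                    ifelse(hour %in% 8:15, "SecondThird", "ThirdThird"))) %>%
  # the OP's date column is named `date` (lower case)
  group_by(date, DayChunk) %>%
  summarise(MedOzon = median(Ozon))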
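
The same chunking can also be derived arithmetically: integer division of the hour by 8 gives 0, 1, or 2 for the three 8-hour blocks of a day. The following is a minimal base R sketch of that idea, not the answerer's code; the data frame `toy`, its values, and the column name `chunk` are made up for illustration.

```r
# Toy data: one day of hourly values (hypothetical numbers, not the OP's data)
toy <- data.frame(
  rDatum = as.Date("2010-01-01"),
  hour   = 0:23,
  Ozon   = c(rep(10, 8), rep(20, 8), rep(30, 8))
)

# Integer division maps hours 0-7 to 0, 8-15 to 1, and 16-23 to 2
toy$chunk <- toy$hour %/% 8

# Median ozone per date and 8-hour block
chunk_median <- aggregate(Ozon ~ rDatum + chunk, data = toy, FUN = median)
print(chunk_median)
```

This avoids the nested ifelse() calls, at the cost of numeric block labels instead of descriptive ones.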

Answer 1 (score: 0)

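If your data is regular and complete (i.e., there is one record for every hour), the following base R code should do the trick:

# Get the number of 8 hour intervals
intervalCnt <- nrow(df) / 8L

# add a grouping vector to your data
df$group <- rep(1:intervalCnt, each=8)

# get the median for each interval, keep year var around for later
# (var stands for the variable of interest, e.g. Ozon)
intervalMedian <- aggregate(var~group + day + month + year, data=df, FUN=median)

Note that this solution relies on the assumption that the data has a regular structure, i.e., there is one record for every hour. If the measure of interest has missing values (NA), simply adding na.rm to the aggregate call will return the statistic of interest:

# get the median for each interval
intervalMedian <- aggregate(var~group + day + month + year, data=df, FUN=median, na.rm=TRUE)

If you have an hour-of-day variable, here is a simple way to check the regularity of your data:

table(df$hourOfDay)

The result of this function is a frequency count for each hour. The counts should be equal. Another thing to check is that the first observation falls in the hour that follows the hour of the last observation, i.e., if the time of observation 1 == "00:00", then the time of the last observation should be 23:00.

To get a yearly mean of the 8-hour values, you can use aggregate again on intervalMedian.

Including the group, day, month, and year variables in the intervalMedian data.frame allows many different aggregations. For example, with a small tweak, you can get the mean of the variable over the 5-year period for each period-day-month.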
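
To make the steps in this answer concrete, here is a small sketch on made-up data. It is one possible reading of the aggregations described above, not the answerer's original code: the data frame `toy`, its values, and the helper column `period` are assumptions, and `Ozon` stands in for the placeholder `var`. The rows are assumed to be sorted by time.

```r
# The same day in two different years, hourly (made-up values)
toy <- data.frame(
  year      = rep(2010:2011, each = 24),
  month     = 1,
  day       = 1,
  hourOfDay = rep(0:23, times = 2),
  Ozon      = 1:48
)

# Regularity check: every hour of the day should occur equally often
hourCounts <- table(toy$hourOfDay)
stopifnot(length(unique(hourCounts)) == 1)

# One group id per consecutive run of 8 rows
intervalCnt <- nrow(toy) / 8L
toy$group <- rep(seq_len(intervalCnt), each = 8)

# Median per 8-hour interval, keeping the calendar variables
intervalMedian <- aggregate(Ozon ~ group + day + month + year, data = toy, FUN = median)

# Position of each interval within its day (1, 2, 3), then the mean of the
# interval medians for each period-day-month across all years
intervalMedian$period <- (intervalMedian$group - 1) %% 3 + 1
periodMean <- aggregate(Ozon ~ period + day + month, data = intervalMedian, FUN = mean)
print(periodMean)
```

Because the grouping vector is built purely from row positions, a single missing hour shifts every later interval, which is why the regularity check comes first.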

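Answer 2 (score: 0)

Look up the function seq.POSIXt. It has options for specifying the start and stop of the interval, and it is designed for creating time sequences. For your problem: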

myseq<-seq(ISOdate(2010,01,01, 00, 00, 00, tz="GMT"), to=ISOdate(2016,01,05), by = "8 hour")

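Use the ISOdate function to set the start and stop times. If you plan to work with times a lot, I recommend studying the strptime function and the POSIXlt/POSIXct time classes. Now that the breaks are defined, and assuming your data frame (Datum_Ozon) has a column named "datetime", use "cut" to group/subset the data.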

Datum_Ozon$datetime<-as.POSIXct(paste(as.character(Datum_Ozon$date),
     as.character(Datum_Ozon$time)), format="%d.%m.%Y %H:%M", tz="GMT" )

library(dplyr)
summarize(group_by(Datum_Ozon, cut(Datum_Ozon$datetime, myseq)), mean(Ozon))
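
As a small self-contained illustration of how cut assigns timestamps to 8-hour bins, here is a base R sketch on made-up data. The data frame `toy`, its values, and the names `breaks`, `bin`, and `bin_mean` are hypothetical; aggregate is used here in place of dplyr so that the example needs no extra packages.

```r
# One day of hourly timestamps in GMT (made-up values)
datetime <- seq(ISOdate(2010, 1, 1, 0, 0, 0, tz = "GMT"),
                by = "1 hour", length.out = 24)
toy <- data.frame(datetime = datetime,
                  Ozon = c(rep(10, 8), rep(20, 8), rep(30, 8)))

# Breaks every 8 hours; the fourth break closes the last bin
breaks <- seq(ISOdate(2010, 1, 1, 0, 0, 0, tz = "GMT"),
              by = "8 hours", length.out = 4)

# cut() labels each row with the start of its bin; by default the bins
# include their left endpoint and exclude their right endpoint
toy$bin <- cut(toy$datetime, breaks)

# Mean ozone per 8-hour bin
bin_mean <- aggregate(Ozon ~ bin, data = toy, FUN = mean)
print(bin_mean)
```

Timestamps that fall outside the range of the breaks get NA as their bin, so the break sequence should cover the whole observation period.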