I am new to R and have to calculate means for a time series that covers 5 years, with hourly data such as ozone.
My df looks like this:
structure(list(date = structure(c(1L, 1L, 1L, 1L), .Label = "01.01.2010", class = "factor"),
day.of = c(1L, 1L, 1L, 1L), time = structure(1:4, .Label = c("00:00",
"01:00", "02:00", "03:00"), class = "factor"), SVF_Ray = c(1L,
1L, 1L, 1L), Gmax = c(0, 0, 0, 0), Ta = c(-1.3, -1.2, -1.2,
-1.2), Tmrt = c(-19.3, -12.1, -12, -12.1), PET = c(-10.4,
-8.7, -8.7, -8.7), PT = c(-11.3, -9.3, -9.3, -9.3), Ozon = c(61.35,
62.65, 63.4, 63.85), rDatum = structure(c(14610, 14610, 14610,
14610), class = "Date"), year = c(2010, 2010, 2010, 2010),
month = c(1, 1, 1, 1), day = c(1, 1, 1, 1), hour = c(0, 1,
2, 3)), .Names = c("date", "day.of", "time", "SVF_Ray", "Gmax",
"Ta", "Tmrt", "PET", "PT", "Ozon", "rDatum", "year", "month",
"day", "hour"), row.names = c(NA, 4L), class = "data.frame")
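I want to calculate the mean ozone for every 8 hours, i.e. a series of 3 means per day. I have already prepared my date and hour columns:
Datum_Ozon$rDatum <- as.Date(Datum_Ozon$date, format="%d.%m.%Y")
# Take the part before the ":" of each "HH:MM" string and convert it to a number
Datum_Ozon$hour <- as.numeric(unlist(strsplit(as.character(Datum_Ozon$time), ":"))[seq(1, 2 * length(Datum_Ozon$time), 2)])
so the hour is stored as a number.
But I don't know how to go on from here to reach my goal. Thanks in advance!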
Answer 0 (score: 0)
Here is a basic example that uses dplyr pipes rather than a plyr approach, together with ifelse(). Everything here is self-contained:
library(dplyr)
## OP data
df <-
structure(list(date = structure(c(1L, 1L, 1L, 1L), .Label = "01.01.2010", class = "factor"),
day.of = c(1L, 1L, 1L, 1L), time = structure(1:4, .Label = c("00:00",
"01:00", "02:00", "03:00"), class = "factor"), SVF_Ray = c(1L,
1L, 1L, 1L), Gmax = c(0, 0, 0, 0), Ta = c(-1.3, -1.2, -1.2,
-1.2), Tmrt = c(-19.3, -12.1, -12, -12.1), PET = c(-10.4,
-8.7, -8.7, -8.7), PT = c(-11.3, -9.3, -9.3, -9.3), Ozon = c(61.35,
62.65, 63.4, 63.85), rDatum = structure(c(14610, 14610, 14610,
14610), class = "Date"), year = c(2010, 2010, 2010, 2010),
month = c(1, 1, 1, 1), day = c(1, 1, 1, 1), hour = c(0, 1,
2, 3)), .Names = c("date", "day.of", "time", "SVF_Ray", "Gmax",
"Ta", "Tmrt", "PET", "PT", "Ozon", "rDatum", "year", "month",
"day", "hour"), row.names = c(NA, 4L), class = "data.frame")
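df %>%
  # Label each hour with the 8-hour block of the day it falls in
  mutate(DayChunk = ifelse(hour %in% 0:7, "FirstThird",
                    ifelse(hour %in% 8:15, "SecondThird", "ThirdThird"))) %>%
  # Group by calendar day (the rDatum Date column) and block
  group_by(rDatum, DayChunk) %>%
  # median() as in this answer; use mean(Ozon) for the arithmetic mean
  summarise(MedOzon = median(Ozon))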
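As a compact alternative to the nested ifelse() above, the 8-hour block can also be derived with integer division. The following is only a minimal base R sketch on made-up data: the `toy` data frame, its values and the `block` column name are assumptions for illustration, not part of the question's data.

```r
# Hypothetical toy data: two full days of hourly ozone values
toy <- data.frame(
  rDatum = rep(as.Date(c("2010-01-01", "2010-01-02")), each = 24),
  hour   = rep(0:23, times = 2),
  Ozon   = c(rep(c(10, 20, 30), each = 8), rep(c(40, 50, 60), each = 8))
)

# hour %/% 8 maps hours 0-7 to 0, 8-15 to 1 and 16-23 to 2
toy$block <- toy$hour %/% 8

# Mean ozone per calendar day and 8-hour block (base R, no packages needed)
block_means <- aggregate(Ozon ~ block + rDatum, data = toy, FUN = mean)
print(block_means)
```

Because the block index is computed from the hour itself, this variant does not depend on the rows being sorted or on every hour being present.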
Answer 1 (score: 0)
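If your data are regular and complete (i.e. there is one record for every hour), the following base R code should do the trick:
# Get the number of 8 hour intervals
intervalCnt <- nrow(df) / 8L
# add a grouping vector to your data
df$group <- rep(1:intervalCnt, each=8)
# get the median for each interval, keep year var around for later
# (var stands for the measure of interest, e.g. Ozon)
intervalMedian <- aggregate(var~group + day + month + year, data=df, FUN=median)
Note that this solution relies on the assumption that the data have a regular structure, i.e. one record for every hour. If the measure of interest has missing values (NA), simply adding na.rm to the aggregate call will still return the statistic of interest:
# get the median for each interval
intervalMedian <- aggregate(var~group + day + month + year, data=df, FUN=median, na.rm=TRUE)
If you have an hour-of-day variable, here is a simple way to check the regularity of the data:
table(df$hourOfDay)
The result of this function is a frequency count for each hour. The counts should all be equal. Another thing to check is that the first observation starts one hour after the time of day of the last observation, i.e. if the first observation is at "00:00", the last observation should be at 23:00.
To get a picture of the 8-hour averages by year, you can use aggregate again on intervalMedian.
Including the group, day, month and year variables in the intervalMedian data.frame allows for many different aggregations. For example, with a minor tweak it is possible to get the average of the variable over the 5-year period for each interval-day-month combination.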
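The code for the per-year and the 5-year aggregations mentioned in this answer is not shown, so the following is only a sketch of what they might look like, not the answerer's own code. It uses FUN = mean because the question asks for means. The toy data, the `period` column and the names `intervalMean`, `yearMean` and `periodMean` are assumptions made for illustration.

```r
# Hypothetical regular hourly data: 1-2 January of 2010 and 2011
# (expand.grid varies hour fastest, so rows are in chronological order)
df <- expand.grid(hour = 0:23, day = 1:2, month = 1, year = 2010:2011)
df$var <- seq_len(nrow(df))   # made-up measurements

# One group id per consecutive 8-hour interval, as in the answer
df$group <- rep(seq_len(nrow(df) / 8), each = 8)
intervalMean <- aggregate(var ~ group + day + month + year, data = df, FUN = mean)

# Position of each interval within its day (1, 2 or 3)
intervalMean$period <- (intervalMean$group - 1) %% 3 + 1

# Average of the 8-hour means for each year
yearMean <- aggregate(var ~ year, data = intervalMean, FUN = mean)

# Average across all years for each period-day-month combination
periodMean <- aggregate(var ~ period + day + month, data = intervalMean, FUN = mean)
```

The `period` index is what makes the second aggregation possible: `group` alone is unique over the whole series, so it cannot be used to line up the same 8-hour block across different years.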
Answer 2 (score: 0)
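Look up the function seq.POSIXt. It has options to specify the start and the end of the interval, and it is designed for creating time sequences. For your problem:
myseq <- seq(ISOdate(2010, 1, 1, 0, 0, 0, tz="GMT"), to=ISOdate(2016, 1, 5), by="8 hour")
The ISOdate function is used to set the start and stop times. If you are going to work with times a lot, I recommend studying the function strptime and the POSIXlt/POSIXct time classes. Now that the breaks are defined, add a column named "datetime" to your data frame (Datum_Ozon) and then use "cut" to group/subset the data.
# Combine the date and time strings into one POSIXct timestamp
Datum_Ozon$datetime <- as.POSIXct(paste(as.character(Datum_Ozon$date),
                                        as.character(Datum_Ozon$time)),
                                  format="%d.%m.%Y %H:%M", tz="GMT")
library(dplyr)
# Assign each timestamp to its 8-hour interval and average Ozon per interval
summarize(group_by(Datum_Ozon, interval = cut(datetime, myseq)), mean(Ozon))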
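To try the seq/cut approach without the original data, here is a self-contained base R sketch. It is an illustration under assumptions: the `toy` data frame, its values and the names `breaks` and `interval_means` are made up, and aggregate() is swapped in for the dplyr call above so that no package is required.

```r
# Hypothetical data: 48 hourly ozone readings starting 2010-01-01 00:00 GMT
datetime <- seq(ISOdate(2010, 1, 1, 0, 0, 0, tz = "GMT"),
                by = "1 hour", length.out = 48)
toy <- data.frame(datetime = datetime,
                  Ozon = rep(c(10, 20, 30, 40, 50, 60), each = 8))

# Breaks every 8 hours; the last break must lie beyond the last observation
breaks <- seq(ISOdate(2010, 1, 1, 0, 0, 0, tz = "GMT"),
              by = "8 hours", length.out = 7)

# cut() assigns each timestamp to the interval [break, next break)
toy$interval <- cut(toy$datetime, breaks = breaks)

# Mean ozone per 8-hour interval
interval_means <- aggregate(Ozon ~ interval, data = toy, FUN = mean)
print(interval_means)
```

Timestamps that fall outside the range of the breaks get NA as their interval, so the break sequence should cover the whole period of the data.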