Average duplicates by ID and date

Time: 2016-08-26 04:15:33

Tags: r

I have data stored in long format:

    datetime <- c("2016-04-03 00:00:00", "2016-04-03 00:15:00", "2016-04-03 00:30:00",
                  "2016-04-03 00:00:00", "2016-04-03 00:15:00", "2016-04-03 00:30:00")
    temp <- c(46.7, 47.8, 46.2, 72.0, 71.1, 70.6)
    humid <- c(63.8, 60.3, 64.4, 24.1, 24.2, 24.3)
    id <- c("RSOSW_16", "RSOSW_16", "RSOSW_16", "RSOSE_16", "RSOSE_16", "RSOSE_16")
    df <- data.frame(datetime, temp, humid, id)

I would like to identify duplicates by id and average them, taking the mean of the values under those ids for each datetime. Here, I have two duplicates. The result I want has one row per datetime, with temp and humid averaged across the duplicated ids.

I would also like to keep several other columns of information in the data frame. Thanks in advance.

3 answers:

Answer 0: (score: 3)

If you want to ignore ID (this will match your expected result):

    # cbind() coerces every column to character, so convert the numeric columns back
    df       <- as.data.frame(cbind(datetime, temp, humid, id))
    df$temp  <- as.numeric(as.character(df$temp))
    df$humid <- as.numeric(as.character(df$humid))

    # mean of temp and humid for each datetime, across ids
    aggregate(df[, colnames(df) %in% c("temp", "humid")], by = list(df$datetime), mean)
    #               Group.1  temp humid
    # 1 2016-04-03 00:00:00 59.35 43.95
    # 2 2016-04-03 00:15:00 59.45 42.25
    # 3 2016-04-03 00:30:00 58.40 44.35

If you want to distinguish by ID (this will not match your expected result, because that result did not separate RSOSW_16 and RSOSE_16):

    # mean of temp and humid for each (id, datetime) pair
    aggregate(df[, colnames(df) %in% c("temp", "humid")], by = list(df$id, df$datetime), mean)
    #    Group.1             Group.2 temp humid
    # 1 RSOSE_16 2016-04-03 00:00:00 72.0  24.1
    # 2 RSOSW_16 2016-04-03 00:00:00 46.7  63.8
    # 3 RSOSE_16 2016-04-03 00:15:00 71.1  24.2
    # 4 RSOSW_16 2016-04-03 00:15:00 47.8  60.3
    # 5 RSOSE_16 2016-04-03 00:30:00 70.6  24.3
    # 6 RSOSW_16 2016-04-03 00:30:00 46.2  64.4
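
The six rows come back unchanged because each (id, datetime) pair occurs only once in the sample data, so every group mean is the mean of a single value. A quick way to check which key actually contains duplicates is sketched below with base R's `anyDuplicated()` on the question's data; the names `dup_pairs`, `dup_times`, and `rows_per_time` are illustrative only.

```r
# Sample data from the question
datetime <- c("2016-04-03 00:00:00", "2016-04-03 00:15:00", "2016-04-03 00:30:00",
              "2016-04-03 00:00:00", "2016-04-03 00:15:00", "2016-04-03 00:30:00")
temp  <- c(46.7, 47.8, 46.2, 72.0, 71.1, 70.6)
humid <- c(63.8, 60.3, 64.4, 24.1, 24.2, 24.3)
id    <- c("RSOSW_16", "RSOSW_16", "RSOSW_16", "RSOSE_16", "RSOSE_16", "RSOSE_16")
df <- data.frame(datetime, temp, humid, id)

# 0 means no (id, datetime) pair is repeated, so grouping by both averages nothing
dup_pairs <- anyDuplicated(df[, c("id", "datetime")])

# a positive value is the index of the first repeated datetime,
# so grouping by datetime alone does merge rows
dup_times <- anyDuplicated(df$datetime)

# number of rows that share each datetime
rows_per_time <- table(df$datetime)
```

If `dup_pairs` is 0 while `dup_times` is positive, grouping by datetime alone is the variant that actually averages anything.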

Per @thelatemail's comment, the formula interface may be tidier:

    aggregate(cbind(temp, humid) ~ datetime, data = df, mean)

or just

    aggregate(df[, c("temp", "humid")], df["datetime"], mean)
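
The question also asks to keep other columns in the result. One possible approach is sketched below, under the assumption that a non-numeric column such as `id` should be collapsed into one comma-separated label per datetime. That choice, and the names `avg`, `ids`, and `res`, are illustrative and not part of the original answer.

```r
# Sample data from the question
datetime <- c("2016-04-03 00:00:00", "2016-04-03 00:15:00", "2016-04-03 00:30:00",
              "2016-04-03 00:00:00", "2016-04-03 00:15:00", "2016-04-03 00:30:00")
temp  <- c(46.7, 47.8, 46.2, 72.0, 71.1, 70.6)
humid <- c(63.8, 60.3, 64.4, 24.1, 24.2, 24.3)
id    <- c("RSOSW_16", "RSOSW_16", "RSOSW_16", "RSOSE_16", "RSOSE_16", "RSOSE_16")
df <- data.frame(datetime, temp, humid, id, stringsAsFactors = FALSE)

# numeric columns: mean per datetime
avg <- aggregate(df[, c("temp", "humid")], df["datetime"], mean)

# non-numeric column (assumed handling): collapse the distinct ids per datetime
ids <- aggregate(df["id"], df["datetime"],
                 function(x) paste(sort(unique(x)), collapse = ","))

# join the two summaries back together on the grouping key
res <- merge(avg, ids, by = "datetime")
res
```

If the extra columns are constant within each datetime, taking the first value per group instead of pasting would be a simpler summary function.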

Answer 1: (score: 1)

Using the data.table package, we can compute the means grouped by datetime and id:

    library(data.table)

    dt <- data.table(datetime, temp, humid, id)
    # mean of temp and humid for each (datetime, id) pair
    dt[, .(temp = mean(temp), humid = mean(humid)), by = .(datetime, id)]
    #              datetime       id temp humid
    #1: 2016-04-03 00:00:00 RSOSW_16 46.7  63.8
    #2: 2016-04-03 00:15:00 RSOSW_16 47.8  60.3
    #3: 2016-04-03 00:30:00 RSOSW_16 46.2  64.4
    #4: 2016-04-03 00:00:00 RSOSE_16 72.0  24.1
    #5: 2016-04-03 00:15:00 RSOSE_16 71.1  24.2
    #6: 2016-04-03 00:30:00 RSOSE_16 70.6  24.3

Or, grouping by datetime only, we get

    dt[, .(temp = mean(temp), humid = mean(humid)), by = .(datetime)]
    #              datetime  temp humid
    #1: 2016-04-03 00:00:00 59.35 43.95
    #2: 2016-04-03 00:15:00 59.45 42.25
    #3: 2016-04-03 00:30:00 58.40 44.35
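
When there are more measurement columns than `temp` and `humid`, naming each one inside `.()` gets tedious. A minimal sketch of the same grouping using data.table's `.SD` and `.SDcols` follows; the vector name `num_cols` and the result name `avg` are illustrative, not from the original answer.

```r
library(data.table)

# Sample data from the question
datetime <- c("2016-04-03 00:00:00", "2016-04-03 00:15:00", "2016-04-03 00:30:00",
              "2016-04-03 00:00:00", "2016-04-03 00:15:00", "2016-04-03 00:30:00")
temp  <- c(46.7, 47.8, 46.2, 72.0, 71.1, 70.6)
humid <- c(63.8, 60.3, 64.4, 24.1, 24.2, 24.3)
id    <- c("RSOSW_16", "RSOSW_16", "RSOSW_16", "RSOSE_16", "RSOSE_16", "RSOSE_16")
dt <- data.table(datetime, temp, humid, id)

# columns to average; extend this vector as more measurement columns appear
num_cols <- c("temp", "humid")

# lapply(.SD, mean) applies mean() to every column listed in .SDcols, per group
avg <- dt[, lapply(.SD, mean), by = .(datetime), .SDcols = num_cols]
avg
```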

Answer 2: (score: 0)

We can use summarise_each from dplyr:

    library(dplyr)
    as.data.frame(df, stringsAsFactors = FALSE) %>%
        group_by(datetime) %>%
        summarise_each(funs(mean(as.numeric(.))), temp, humid)
    #             datetime  temp humid
    #                <chr> <dbl> <dbl>
    #1 2016-04-03 00:00:00 59.35 43.95
    #2 2016-04-03 00:15:00 59.45 42.25
    #3 2016-04-03 00:30:00 58.40 44.35
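
Current dplyr releases deprecate both `summarise_each()` and `funs()`. A rough equivalent using `across()`, available from dplyr 1.0.0, might look like the sketch below. It assumes `temp` and `humid` are already numeric, as in the question's `data.frame()` call, so the `as.numeric()` conversion is dropped; the name `res` is illustrative.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  datetime = c("2016-04-03 00:00:00", "2016-04-03 00:15:00", "2016-04-03 00:30:00",
               "2016-04-03 00:00:00", "2016-04-03 00:15:00", "2016-04-03 00:30:00"),
  temp  = c(46.7, 47.8, 46.2, 72.0, 71.1, 70.6),
  humid = c(63.8, 60.3, 64.4, 24.1, 24.2, 24.3),
  id    = c("RSOSW_16", "RSOSW_16", "RSOSW_16", "RSOSE_16", "RSOSE_16", "RSOSE_16"),
  stringsAsFactors = FALSE
)

# across() applies mean() to each selected column within every datetime group
res <- df %>%
  group_by(datetime) %>%
  summarise(across(c(temp, humid), mean), .groups = "drop")
res
```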