R: Averaging 15-minute data to hourly values for two columns in a data frame

Time: 2016-08-17 21:48:13

Tags: r

My data looks like this (I removed several other columns for simplicity):

 Index   Date        Time Humid Temp    id
    93 4/3/16 12:00:00 AM  63.8 46.7 RSOSW
    94 4/3/16 12:15:00 AM  60.3 47.8 RSOSW
    95 4/3/16 12:30:00 AM  64.4 46.2 RSOSW
    96 4/3/16 12:45:00 AM  60.4 46.8 RSOSW
    97 4/3/16  1:00:00 AM  61.3 46.6 RSOSW
    98 4/3/16  1:15:00 AM  68.5 44.3 RSOSW
    99 4/3/16  1:30:00 AM  70.5 43.4 RSOSW
   100 4/3/16  1:45:00 AM  75.1 41.8 RSOSW
   101 4/3/16  2:00:00 AM  74.9 41.3 RSOSW
   102 4/3/16  2:15:00 AM  73.6 41.1 RSOSW
   103 4/3/16  2:30:00 AM  72.8 41.2 RSOSW
   104 4/3/16  2:45:00 AM  71.1 41.2 RSOSW
    93 4/3/16 12:00:00 AM  64.9 47.8 RSOSE
    94 4/3/16 12:15:00 AM  61.2 48.9 RSOSE
    95 4/3/16 12:30:00 AM  63.3 45.3 RSOSE
    96 4/3/16 12:45:00 AM  62.6 42.3 RSOSE
    97 4/3/16  1:00:00 AM  60.9 49.9 RSOSE
    98 4/3/16  1:15:00 AM  67.3 45.3 RSOSE
    99 4/3/16  1:30:00 AM  72.1 42.1 RSOSE
   100 4/3/16  1:45:00 AM  79.0 40.5 RSOSE
   101 4/3/16  2:00:00 AM  73.4 42.3 RSOSE
   102 4/3/16  2:15:00 AM  73.6 40.1 RSOSE
   103 4/3/16  2:30:00 AM  71.9 46.5 RSOSE
   104 4/3/16  2:45:00 AM  70.6 45.4 RSOSE

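I want the hourly average temperature and humidity for each id. The result I am looking for is shown below. (I would also like to keep, in each record, the other columns that I removed here for simplicity.)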

  Date  Hour  Humid   Temp    id
4/3/16    00 62.225 46.875 RSOSW
4/3/16    01  68.85 44.025 RSOSW
4/3/16    02   73.1   41.2 RSOSW
4/3/16    00     63 46.075 RSOSE
4/3/16    01 69.825  44.45 RSOSE
4/3/16    02 72.375 43.575 RSOSE

Update

  Index   Date        Time Humid Temp  serialnum       id         farm location
     93 4/3/16 12:00:00 AM  63.8 46.7 1310014696 RSOSW_16 River School  Outside
     94 4/3/16 12:15:00 AM  60.3 47.8 1310014696 RSOSW_16 River School  Outside
     95 4/3/16 12:30:00 AM  64.4 46.2 1310014696 RSOSW_16 River School  Outside
     96 4/3/16 12:45:00 AM  60.4 46.8 1310014696 RSOSW_16 River School  Outside
     97 4/3/16  1:00:00 AM  61.3 46.6 1310014696 RSOSW_16 River School  Outside
     98 4/3/16  1:15:00 AM  68.5 44.3 1310014696 RSOSW_16 River School  Outside

serialnum, id, farm and location are all character columns.

Thanks in advance.

1 Answer:

Answer 0: (score: 3)

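library(dplyr)      # provides %>%, mutate(), group_by(), summarise_at()
library(lubridate)  # provides mdy_hms() and hour()

# Parse the combined date-time strings (column 2, "datetime") into POSIXct
df$datetime <- mdy_hms(df$datetime)

# Average Humid and Temp for each id within each hour
df %>% mutate(hour = hour(datetime)) %>% 
  group_by(id, hour) %>% summarise_at(vars(Humid, Temp), mean)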

This gives

Source: local data frame [6 x 4]
Groups: id [?]

      id  hour  Humid   Temp
  <fctr> <int>  <dbl>  <dbl>
1  RSOSE     0 63.000 46.075
2  RSOSE     1 69.825 44.450
3  RSOSE     2 72.375 43.575
4  RSOSW     0 62.225 46.875
5  RSOSW     1 68.850 44.025
6  RSOSW     2 73.100 41.200
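
One caveat: grouping by hour alone pools the same hour of different days together, while the desired output in the question also has a Date column. A minimal sketch of grouping by date and hour follows. The two-day sample `df2` and the result name `hourly` are made up for illustration, and `across()` with `.groups` assumes dplyr 1.0 or later.

```r
library(dplyr)
library(lubridate)

# Hypothetical two-day sample; column names follow the cleaned data in this answer
df2 <- data.frame(
  datetime = mdy_hms(c("4/3/16 12:00:00 AM", "4/3/16 12:15:00 AM",
                       "4/4/16 12:00:00 AM", "4/4/16 12:15:00 AM")),
  Humid = c(60, 62, 70, 72),
  Temp  = c(46, 48, 40, 42),
  id    = "RSOSW"
)

# Derive the calendar date and the hour, then average within each id/date/hour
hourly <- df2 %>%
  mutate(Date = as_date(datetime), Hour = hour(datetime)) %>%
  group_by(id, Date, Hour) %>%
  summarise(across(c(Humid, Temp), mean), .groups = "drop")

hourly
```

With this sample, hour 0 of 4/3 and hour 0 of 4/4 stay in separate rows instead of being averaged together.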

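If you want to keep all columns and rows as they are and replace the values with the computed means, you can use mutate_at instead: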

df %>% mutate(hour = hour(datetime)) %>% 
  group_by(id, hour) %>% mutate_at(vars(Humid, Temp), mean) %>% head

which results in

Source: local data frame [6 x 6]
Groups: id, hour [2]

  Index            datetime  Humid   Temp     id  hour
  <int>              <time>  <dbl>  <dbl> <fctr> <int>
1    93 2016-04-03 00:00:00 62.225 46.875  RSOSW     0
2    94 2016-04-03 00:15:00 62.225 46.875  RSOSW     0
3    95 2016-04-03 00:30:00 62.225 46.875  RSOSW     0
4    96 2016-04-03 00:45:00 62.225 46.875  RSOSW     0
5    97 2016-04-03 01:00:00 68.850 44.025  RSOSW     1
6    98 2016-04-03 01:15:00 68.850 44.025  RSOSW     1
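
If you want one row per hour but still want to keep the character columns from the update (serialnum, farm, location), one option is to add them to the grouping. This is only a sketch and assumes those columns are constant within each id; the sample `df3` is built from the six rows shown in the update, and `across()` with `.groups` assumes dplyr 1.0 or later.

```r
library(dplyr)
library(lubridate)

# Six rows copied from the question's update
df3 <- data.frame(
  Index     = 93:98,
  datetime  = mdy_hms(paste("4/3/16", c("12:00:00 AM", "12:15:00 AM", "12:30:00 AM",
                                        "12:45:00 AM", "1:00:00 AM", "1:15:00 AM"))),
  Humid     = c(63.8, 60.3, 64.4, 60.4, 61.3, 68.5),
  Temp      = c(46.7, 47.8, 46.2, 46.8, 46.6, 44.3),
  serialnum = "1310014696",
  id        = "RSOSW_16",
  farm      = "River School",
  location  = "Outside",
  stringsAsFactors = FALSE
)

# Grouping by the descriptive columns carries them through to the summary
hourly_meta <- df3 %>%
  mutate(hour = hour(datetime)) %>%
  group_by(id, serialnum, farm, location, hour) %>%
  summarise(across(c(Humid, Temp), mean), .groups = "drop")

hourly_meta
```

If a descriptive column can vary within an id, it would split the groups further, so check that first.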

Cleaning up your data (next time, please post it as output that can be pasted directly into R)

df <- read.table(text =
                 "93 4/3/16 12:00:00 AM  63.8 46.7 RSOSW
                 94 4/3/16 12:15:00 AM  60.3 47.8 RSOSW
                 95 4/3/16 12:30:00 AM  64.4 46.2 RSOSW
                 96 4/3/16 12:45:00 AM  60.4 46.8 RSOSW
                 97 4/3/16  1:00:00 AM  61.3 46.6 RSOSW
                 98 4/3/16  1:15:00 AM  68.5 44.3 RSOSW
                 99 4/3/16  1:30:00 AM  70.5 43.4 RSOSW
                 100 4/3/16  1:45:00 AM  75.1 41.8 RSOSW
                 101 4/3/16  2:00:00 AM  74.9 41.3 RSOSW
                 102 4/3/16  2:15:00 AM  73.6 41.1 RSOSW
                 103 4/3/16  2:30:00 AM  72.8 41.2 RSOSW
                 104 4/3/16  2:45:00 AM  71.1 41.2 RSOSW
                 93 4/3/16 12:00:00 AM  64.9 47.8 RSOSE
                 94 4/3/16 12:15:00 AM  61.2 48.9 RSOSE
                 95 4/3/16 12:30:00 AM  63.3 45.3 RSOSE
                 96 4/3/16 12:45:00 AM  62.6 42.3 RSOSE
                 97 4/3/16  1:00:00 AM  60.9 49.9 RSOSE
                 98 4/3/16  1:15:00 AM  67.3 45.3 RSOSE
                 99 4/3/16  1:30:00 AM  72.1 42.1 RSOSE
                 100 4/3/16  1:45:00 AM  79.0 40.5 RSOSE
                 101 4/3/16  2:00:00 AM  73.4 42.3 RSOSE
                 102 4/3/16  2:15:00 AM  73.6 40.1 RSOSE
                 103 4/3/16  2:30:00 AM  71.9 46.5 RSOSE
                 104 4/3/16  2:45:00 AM  70.6 45.4 RSOSE")

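# read.table splits date, time and AM/PM into columns 2 to 4; join them into one string
df[,2] <- paste(df[,2], df[,3], df[,4])
# Drop the time and AM/PM columns, which are now redundant
df <- df[,c(-3,-4)]

names(df) <- c("Index", "datetime", "Humid", "Temp", "id")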
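
Once date, time and AM/PM are pasted into one string, it is worth checking that the 12-hour clock is parsed the way you expect before grouping by hour. A small sketch: the first two strings come from the data above, the PM value is a made-up example, and the name `parsed` is only for illustration. AM/PM matching depends on the locale, so this assumes an English or C locale.

```r
library(lubridate)

# Two values from the data above plus a hypothetical PM value
parsed <- mdy_hms(c("4/3/16 12:00:00 AM", "4/3/16 1:45:00 AM", "4/3/16 2:30:00 PM"))

# Any string that fails to parse becomes NA, so stop early if that happens
stopifnot(!anyNA(parsed))

hour(parsed)
# 0 1 14
```

12:00:00 AM maps to hour 0, which matches the hour column in the results above.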