Aggregating by hour in R changes the date and time

Asked: 2017-03-20 20:05:49

Tags: r datetime posix aggregate average

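I often use the aggregate function to compute hourly and daily means and sums of POSIXlt data. I tried to use the same function on a new dataset to get hourly averages, but when I apply it, the timestamps change.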

The data is a data.frame (called "moT") that looks like this:

                  TS      T
1  2016-06-26 10:10:34 19.662 
2  2016-06-26 10:40:34 21.091
3  2016-06-26 11:10:34 23.388
4  2016-06-26 11:40:34 24.448
5  2016-06-26 12:10:34 25.513
6  2016-06-26 12:40:34 26.390
7  2016-06-26 01:10:34 27.468
8  2016-06-26 01:40:34 27.567
9  2016-06-26 02:10:34 26.977
10 2016-06-26 02:40:34 25.222
11 2016-06-26 03:10:34 23.100
12 2016-06-26 03:40:34 24.158
13 2016-06-26 04:10:34 21.951
14 2016-06-26 04:40:34 21.473
15 2016-06-26 05:10:34 19.948
16 2016-06-26 05:40:34 19.472
17 2016-06-26 06:10:34 18.806
18 2016-06-26 06:40:34 16.808
19 2016-06-26 07:10:34 15.282
20 2016-06-26 07:40:34 14.517

Or, in the suggested format (dput output):

structure(list(TS = structure(list(sec = c(34, 34, 34, 34, 34, 
  34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34), 
min = c(10L, 40L, 10L, 40L, 10L, 40L, 10L, 40L, 10L, 40L, 
10L, 40L, 10L, 40L, 10L, 40L, 10L, 40L, 10L, 40L), hour = c(10L, 
10L, 11L, 11L, 12L, 12L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 
5L, 5L, 6L, 6L, 7L, 7L), mday = c(26L, 26L, 26L, 26L, 26L, 
26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 
26L, 26L, 26L), mon = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), year = c(116L, 
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L), wday = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L), yday = c(177L, 177L, 177L, 177L, 177L, 177L, 
177L, 177L, 177L, 177L, 177L, 177L, 177L, 177L, 177L, 177L, 
177L, 177L, 177L, 177L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), 
zone = c("GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5", 
"GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5", 
"GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5"
), gmtoff = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_)), .Names = c("sec", "min", "hour", "mday", "mon", 
"year", "wday", "yday", "isdst", "zone", "gmtoff"), class = c("POSIXlt", 
"POSIXt"), tzone = "Etc/GMT+5"), T = c(19.662, 21.091, 23.388, 
24.448, 25.513, 26.39, 27.468, 27.567, 26.977, 25.222, 23.1, 
24.158, 21.951, 21.473, 19.948, 19.472, 18.806, 16.808, 15.282, 
14.517)), .Names = c("TS", "T"), row.names = c(NA, 20L), class = "data.frame")

I applied this code to "moT":

dat <- aggregate(moT["T"], format(moT["TS"], "%Y-%m-%d %H"), mean)
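
For comparison, here is a minimal sketch of the same hourly aggregation with the grouping key built from the column vector (moT$TS) rather than from the one-column data frame moT["TS"]. It is not a confirmed diagnosis of the problem described here. The six sample rows, the conversion to POSIXct, the "UTC" time zone and the name hour_key are assumptions made only for illustration.

```r
# Small sample rebuilt from the rows shown above. Assumption: timestamps are
# parsed from text into POSIXct, using UTC purely for illustration.
moT <- data.frame(
  TS = as.POSIXct(c("2016-06-26 10:10:34", "2016-06-26 10:40:34",
                    "2016-06-26 11:10:34", "2016-06-26 11:40:34",
                    "2016-06-26 01:10:34", "2016-06-26 01:40:34"),
                  format = "%Y-%m-%d %H:%M:%S", tz = "UTC"),
  T = c(19.662, 21.091, 23.388, 24.448, 27.468, 27.567)
)

# Build a character key "YYYY-MM-DD HH" from the timestamp vector itself.
hour_key <- format(moT$TS, "%Y-%m-%d %H")

# Average T within each hour; the key column is named TS in the result.
dat <- aggregate(moT["T"], by = list(TS = hour_key), FUN = mean)
names(dat)[2] <- "meanT"
dat
```

Note that aggregate returns the groups sorted by the key, so with this sample the first row is "2016-06-26 01", not "2016-06-26 10". The first rows of the result are therefore the earliest hours present in the data, not the first rows of the input.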

I expected this output (first five rows):

              TS   meanT
1 "2016-06-26 10" 20.3765       
2 "2016-06-26 11" 23.918
3 "2016-06-26 12" 25.9515
4 "2016-06-26 13" 27.5175
5 "2016-06-26 14" 26.0995

(This is what I get when I use the same function on other datasets.)

But instead I get this:

               TS   meanT
1 "2016-01-07 00" 14.5650
2 "2016-01-07 01" 14.0380
3 "2016-01-07 02" 13.6540
4 "2016-01-07 03" 13.6540
5 "2016-01-07 04" 13.7500

Why do the date and time change?

I have tried using POSIXct instead of POSIXlt, reformatting the datetime values in my csv file, and removing the time zone from the POSIXlt object.
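
One more thing that may be worth checking: the sample rows go from 12:40 to 01:10, and the dput output stores hour values of 1 to 7 after 12, while the expected output contains hours 13 and 14. Formatting with %H can only return the hour that is stored, so "13" cannot come from a stored hour of 1. The sketch below shows how a 12-hour clock could be parsed, assuming (hypothetically) that the raw strings carry an AM/PM marker, which the sample above does not show. The strings in raw are made up for illustration, and %p depends on the locale.

```r
# Hypothetical raw strings with an AM/PM marker (not present in the sample).
raw <- c("2016-06-26 12:40:34 PM", "2016-06-26 01:10:34 PM")

# %I reads the hour on a 12-hour clock and %p reads the AM/PM marker.
ts <- as.POSIXct(raw, format = "%Y-%m-%d %I:%M:%S %p", tz = "UTC")

# Formatting with %H now gives the 24-hour value.
hours <- format(ts, "%Y-%m-%d %H")
hours
```

If the file has no AM/PM marker at all, the afternoon rows cannot be distinguished from the morning rows by parsing alone.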

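I have seen the post "How to calculate average of a variable by hour in R", which would give me the result I want but requires splitting the date and time into two columns. I am happy to do that, but I would like to know why this happens so that I can avoid it in the future and know which method to use for which data.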

Thank you very much.

0 answers:

No answers yet.