我经常使用聚合函数来查找每小时和每天POSIXlt数据的均值和总和。我试图在新数据集上使用相同的函数来获得每小时平均值,但是当我应用它时,它会更改时间戳。
数据是data.frame(称为" moT"),如下所示:
TS T
1 2016-06-26 10:10:34 19.662
2 2016-06-26 10:40:34 21.091
3 2016-06-26 11:10:34 23.388
4 2016-06-26 11:40:34 24.448
5 2016-06-26 12:10:34 25.513
6 2016-06-26 12:40:34 26.390
7 2016-06-26 01:10:34 27.468
8 2016-06-26 01:40:34 27.567
9 2016-06-26 02:10:34 26.977
10 2016-06-26 02:40:34 25.222
11 2016-06-26 03:10:34 23.100
12 2016-06-26 03:40:34 24.158
13 2016-06-26 04:10:34 21.951
14 2016-06-26 04:40:34 21.473
15 2016-06-26 05:10:34 19.948
16 2016-06-26 05:40:34 19.472
17 2016-06-26 06:10:34 18.806
18 2016-06-26 06:40:34 16.808
19 2016-06-26 07:10:34 15.282
20 2016-06-26 07:40:34 14.517
或根据建议的格式:
structure(list(TS = structure(list(sec = c(34, 34, 34, 34, 34,
34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34),
min = c(10L, 40L, 10L, 40L, 10L, 40L, 10L, 40L, 10L, 40L,
10L, 40L, 10L, 40L, 10L, 40L, 10L, 40L, 10L, 40L), hour = c(10L,
10L, 11L, 11L, 12L, 12L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L,
5L, 5L, 6L, 6L, 7L, 7L), mday = c(26L, 26L, 26L, 26L, 26L,
26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L,
26L, 26L, 26L), mon = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), year = c(116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L), wday = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), yday = c(177L, 177L, 177L, 177L, 177L, 177L,
177L, 177L, 177L, 177L, 177L, 177L, 177L, 177L, 177L, 177L,
177L, 177L, 177L, 177L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
zone = c("GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5",
"GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5",
"GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5"
), gmtoff = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_)), .Names = c("sec", "min", "hour", "mday", "mon",
"year", "wday", "yday", "isdst", "zone", "gmtoff"), class = c("POSIXlt",
"POSIXt"), tzone = "Etc/GMT+5"), T = c(19.662, 21.091, 23.388,
24.448, 25.513, 26.39, 27.468, 27.567, 26.977, 25.222, 23.1,
24.158, 21.951, 21.473, 19.948, 19.472, 18.806, 16.808, 15.282,
14.517)), .Names = c("TS", "T"), row.names = c(NA, 20L), class = "data.frame")
我将此代码应用于" moT":
dat <- aggregate(moT["T"], format(moT["TS"], "%Y-%m-%d %H"), mean)
我期待这个输出(前五行):
TS meanT
1 "2016-06-26 10" 20.3765
2 "2016-06-26 11" 23.918
3 "2016-06-26 12" 25.9515
4 "2016-06-26 13" 27.5175
5 "2016-06-26 14" 26.0995
但是它是这样的:
TS meanT
1 "2016-01-07 00" 14.5650
2 "2016-01-07 01" 14.0380
3 "2016-01-07 02" 13.6540
4 "2016-01-07 03" 13.6540
5 "2016-01-07 04" 13.7500
为什么日期和时间会改变???
我尝试使用POSIXct而不是POSIXlt,尝试重新格式化我的csv文件中的datetime对象,尝试从POSIXlt对象中删除时区。
我看过这篇文章 How to calculate average of a variable by hour in R 这会给我我想要的结果,但需要将日期和时间分成两列。我很乐意这样做,但我想知道为什么会这样,所以我将来可以避免它,并知道使用哪种方法来处理哪些数据。
非常感谢。