I have a data.frame with 1080 rows that contains the variables date, time and glob.rad.
date time glob.rad
1 2014/07/19 00:00:00 -1.6
2 2014/07/19 00:02:00 -1.6
3 2014/07/19 00:03:00 -1.6
4 2014/07/19 00:04:00 -1.6
5 2014/07/19 00:06:00 -1.6
6 2014/07/19 00:07:00 -1.6
7 2014/07/19 00:08:00 -1.6
8 2014/07/19 00:10:00 -1.6
9 2014/07/19 00:11:00 -1.6
10 2014/07/19 00:12:00 -1.6
11 2014/07/19 00:14:00 -1.6
12 2014/07/19 00:15:00 -1.6
13 2014/07/19 00:16:00 -1.6
14 2014/07/19 00:18:00 -1.5
15 2014/07/19 00:19:00 -1.5
16 2014/07/19 00:20:00 -1.4
17 2014/07/19 00:22:00 -1.4
18 2014/07/19 00:23:00 -1.3
19 2014/07/19 00:24:00 -1.3
20 2014/07/19 00:26:00 -1.3
21 2014/07/19 00:27:00 -1.3
22 2014/07/19 00:28:00 -1.3
23 2014/07/19 00:30:00 -1.3
24 2014/07/19 00:31:00 -1.4
25 2014/07/19 00:32:00 -1.4
26 2014/07/19 00:34:00 -1.5
27 2014/07/19 00:35:00 -1.5
28 2014/07/19 00:36:00 -1.6
29 2014/07/19 00:38:00 -1.6
30 2014/07/19 00:39:00 -1.6
31 2014/07/19 00:40:00 -1.6
32 2014/07/19 00:42:00 -1.6
33 2014/07/19 00:43:00 -1.6
34 2014/07/19 00:44:00 -1.6
35 2014/07/19 00:46:00 -1.6
36 2014/07/19 00:47:00 -1.6
37 2014/07/19 00:48:00 -1.6
38 2014/07/19 00:50:00 -1.6
39 2014/07/19 00:51:00 -1.6
40 2014/07/19 00:52:00 -1.6
41 2014/07/19 00:54:00 -1.6
42 2014/07/19 00:55:00 -1.6
43 2014/07/19 00:56:00 -1.6
44 2014/07/19 00:58:00 -1.6
45 2014/07/19 00:59:00 -1.6
46 2014/07/19 01:00:00 -1.6
47 2014/07/19 01:02:00 -1.6
48 2014/07/19 01:03:00 -1.6
49 2014/07/19 01:04:00 -1.6
50 2014/07/19 01:06:00 -1.6
...
All variables are factors. The goal is to group the variable "time" into hourly intervals and compute the mean of "glob.rad" within each hour.
date time glob.rad
1 2014/07/19 00:00:00 -1.6
2 2014/07/19 01:00:00 -1.6
3 2014/07/19 02:00:00 -1.6
...
I know how to handle POSIXct date-times, but not how to handle times stored as a factor.
So far I have tried cut(), subset() and as.numeric(), but none of them worked as intended.
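Answer 0 (score: 2)
You don't need to work with the time as a factor. You could, but pasting the date and time columns together and grouping on that is less stressful. The data.table package makes this very simple, because it has functions for extracting parts of POSIX/Date objects. We can use those parts for our grouping.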
library(data.table)
setDT(df)[, .(mean = mean(glob.rad)), by = hour(paste(date, time))]
# hour mean
# 1: 0 -1.533333
# 2: 1 -1.600000
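Apart from being converted to a data.table, the original data is left unchanged. If you want both the date and the hour in the result, you can do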
df[, .(mean = mean(glob.rad)), by = .(date, hour(paste(date, time)))]
# date hour mean
# 1: 2014/07/19 0 -1.533333
# 2: 2014/07/19 1 -1.600000
This last block does use a factor in the date column, since I saw no need to convert it to a Date-class column.
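One caveat, given that the question says every column is a factor: mean() on a factor returns NA with a warning, and calling as.numeric() directly on a factor yields the internal level codes rather than the measured values, which may be why the as.numeric() attempt in the question did not work. A minimal sketch of the usual conversion, using a made-up four-value vector rather than the real data:

```r
# Toy factor column (hypothetical values), as it might look after
# reading the file with stringsAsFactors = TRUE
glob.rad <- factor(c("-1.6", "-1.5", "-1.6", "-1.3"))

# Pitfall: as.numeric() on a factor returns the integer level codes,
# not the numbers that the labels spell out
codes <- as.numeric(glob.rad)

# Going through character first recovers the actual values
values <- as.numeric(as.character(glob.rad))
mean(values)  # -1.5
```

If glob.rad really is a factor in df, applying the same conversion to that column before the grouping calls above should let mean() work as expected.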
Answer 1 (score: 1)
I like the semantics of dplyr with the pipe (%>%). It reads like a sentence.
tab <- structure(list(date = c("2014/07/19", "2014/07/19", "2014/07/19",
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19",
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19",
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19",
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19",
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19",
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19",
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19",
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19",
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19",
"2014/07/19", "2014/07/19"), time = c("00:00:00", "00:02:00",
"00:03:00", "00:04:00", "00:06:00", "00:07:00", "00:08:00", "00:10:00",
"00:11:00", "00:12:00", "00:14:00", "00:15:00", "00:16:00", "00:18:00",
"00:19:00", "00:20:00", "00:22:00", "00:23:00", "00:24:00", "00:26:00",
"00:27:00", "00:28:00", "00:30:00", "00:31:00", "00:32:00", "00:34:00",
"00:35:00", "00:36:00", "00:38:00", "00:39:00", "00:40:00", "00:42:00",
"00:43:00", "00:44:00", "00:46:00", "00:47:00", "00:48:00", "00:50:00",
"00:51:00", "00:52:00", "00:54:00", "00:55:00", "00:56:00", "00:58:00",
"00:59:00", "01:00:00", "01:02:00", "01:03:00", "01:04:00", "01:06:00"
), glob.rad = c(-1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6,
-1.6, -1.6, -1.6, -1.6, -1.6, -1.5, -1.5, -1.4, -1.4, -1.3, -1.3,
-1.3, -1.3, -1.3, -1.3, -1.4, -1.4, -1.5, -1.5, -1.6, -1.6, -1.6,
-1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6,
-1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6)), .Names = c("date",
"time", "glob.rad"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
"25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35",
"36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46",
"47", "48", "49", "50"))
#> head(tab)
# date time glob.rad
#1 2014/07/19 00:00:00 -1.6
#2 2014/07/19 00:02:00 -1.6
#3 2014/07/19 00:03:00 -1.6
#4 2014/07/19 00:04:00 -1.6
#5 2014/07/19 00:06:00 -1.6
#6 2014/07/19 00:07:00 -1.6
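library(lubridate)
library(dplyr)
# Combine date and time into a single date-time column, then extract the hour
tab$date <- ymd_hms(paste(tab$date, tab$time))
tab$hour <- hour(tab$date)
#head(tab)
# Mean of glob.rad per hour
tab %>%
  group_by(hour) %>%
  summarise(avg = mean(glob.rad, na.rm = TRUE))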
#Source: local data frame [2 x 2]
#
# hour avg
#1 0 -1.533333
#2 1 -1.600000
If you want to summarise glob.rad by day and hour, you can, for simplicity, create a new variable that extracts the day from the date column.
tab$day <- day(tab$date)
and add it to your grouping
tab %>%
  group_by(day, hour) %>%
  summarise(avg = mean(glob.rad, na.rm = TRUE))
#Source: local data frame [2 x 3]
#Groups: day
#
#  day hour       avg
#1  19    0 -1.533333
#2  19    1 -1.600000
sessionInfo()
#R version 3.2.2 (2015-08-14)
#...
#other attached packages:
#[1] lubridate_1.3.3 dplyr_0.4.2
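For comparison, the same hourly grouping can be sketched in base R without extra packages, producing timestamps in the "date time" layout the question asked for. The six-row data frame and the column names hour_start and value below are made up for illustration, and every column starts out as a factor, as described in the question:

```r
# Small hypothetical sample; all columns are factors, as in the question
df <- data.frame(
  date = factor(rep("2014/07/19", 6)),
  time = factor(c("00:00:00", "00:30:00", "00:59:00",
                  "01:00:00", "01:02:00", "01:06:00")),
  glob.rad = factor(c("-1.6", "-1.3", "-1.6", "-1.6", "-1.6", "-1.6"))
)

# paste() turns the factors into text; parse that text into POSIXct
stamp <- as.POSIXct(paste(df$date, df$time),
                    format = "%Y/%m/%d %H:%M:%S", tz = "UTC")

# Label each row with the start of its hour
df$hour_start <- format(stamp, "%Y/%m/%d %H:00:00")

# Convert the factor to numbers via character, not via level codes
df$value <- as.numeric(as.character(df$glob.rad))

# Mean per hour; two rows here: 00:00:00 -> -1.5 and 01:00:00 -> -1.6
hourly <- aggregate(value ~ hour_start, data = df, FUN = mean)
print(hourly)
```

The time zone is fixed to UTC only so that the sketch behaves the same on every machine; for real measurements the station's own time zone would be the natural choice.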