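My data frame contains two different IDs whose event_time values are nearly the same. I need to aggregate this data frame into 1-hour intervals and take the mean of the remaining columns.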
id event_time 1 2 3 4 33 34 38 39 41 42
1 1001 2017-05-22 16:56:07 NA NA NA NA NA NA NA 1215.35 NA NA
2 1001 2017-05-22 16:57:07 NA NA NA NA NA NA 53.5 1243.36 0.24 0.20
3 1001 2017-05-22 16:58:07 NA NA NA NA NA NA 53.8 1234.08 0.71 0.88
4 1001 2017-05-22 16:59:07 NA NA NA NA NA NA 53.2 1236.73 0.55 0.42
5 1001 2017-05-22 17:00:08 NA NA NA NA NA NA 53.8 1257.87 0.43 0.36
6 1001 2017-05-22 17:01:08 NA NA NA NA NA NA 52.8 1222.55 0.78 0.42
....
id event_time 1 2 3 4 33 34 38 39 41 42
95 1002 2017-05-22 16:56:50 NA NA NA NA NA NA NA 1220.35 NA NA
96 1002 2017-05-22 16:57:07 NA NA NA NA NA NA 53.5 1233.36 0.24 0.20
97 1002 2017-05-22 16:58:17 NA NA NA NA 44 NA 53.8 1256.08 0.71 0.88
98 1002 2017-05-22 16:59:33 NA 11 NA NA NA NA 53.2 1277.73 0.55 0.42
99 1002 2017-05-22 17:00:21 NA 11 NA NA 56 NA 53.8 1288.87 0.43 0.36
100 1002 2017-05-22 17:01:10 NA 19 NA NA NA NA 52.8 1201.55 0.78 0.42
I used group_by from the dplyr package on the ID and then called aggregate, but it throws an error:
data_1hour <- data %>% group_by(id) %>% aggregate(list( Tag_1 = data$`1`, Tag_2 = data$`2`,
Tag_3 = data$`3`, Tag_4 = data$`4`,
Tag_33 = data$`33`,Tag_34 = data$`34`,
Tag_38 = data$`38`,
Tag_39 = data$`39`,Tag_40 = data$`41`,
Tag_42 = data$`42`),
list(timestamps = cut(data$event_time, "1 hour")),mean, na.rm = "TRUE")
Error in match.fun(FUN) : 'list(timestamps = cut(data$event_time, "1 hour"))' is not a function, character or symbol
I have a lot of NA values and want to ignore them, which is why I passed na.rm = "TRUE". How should I handle this?
Answer 0 (score: 1)
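You can aggregate by hour by first extracting the date and hour into a new variable and then grouping on that variable together with the id. It could look like this: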
library(dplyr)
## Some sample data:
data <- data.frame(
id = c(1001L, 1001L, 1001L, 1001L, 1002L, 1002L),
event_time = c("2017-05-22 16:56:07", "2017-05-22 16:57:07",
"2017-05-22 16:58:07", "2017-05-22 16:59:07", "2017-05-22 17:00:08",
"2017-05-22 17:01:08"),
`1` = c(NA, NA, NA, NA, NA, NA),
`2` = c(NA, NA, NA, NA, NA, NA),
`3` = c(NA, NA, NA, NA, NA, NA),
`4` = c(NA, NA, NA, NA, NA, NA),
`33` = c(NA, NA, NA, NA, NA, NA),
`34` = c(NA, NA, NA, NA, NA, NA),
`38` = c(NA, 53.5, 53.8, 53.2, 53.8, 52.8),
`39` = c(1215.35, 1243.36, 1234.08, 1236.73, 1257.87, 1222.55),
`41` = c(NA, 0.24, 0.71, 0.55, 0.43, 0.78),
`42` = c(NA, 0.2, 0.88, 0.42, 0.36, 0.42)) %>%
setNames(c("id", "event_time", "1", "2", "3", "4", "33", "34", "38", "39",
"41", "42"))
## Aggregate by hour and compute mean values:
hourlyMeans <- data %>% dplyr::mutate(dayHour = substr(event_time, 1, 13)) %>%
dplyr::group_by(id, dayHour) %>%
dplyr::summarise(Tag_3 = mean(`3`, na.rm = TRUE),
Tag_33 = mean(`33`, na.rm = TRUE),
Tag_38 = mean(`38`, na.rm = TRUE),
Tag_39 = mean(`39`, na.rm = TRUE),
Tag_42 = mean(`42`, na.rm = TRUE))
The result looks like this:
# # A tibble: 2 x 7
# # Groups: id [?]
# id dayHour Tag_3 Tag_33 Tag_38 Tag_39 Tag_42
# <int> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1001 2017-05-22 16 NaN NaN 53.5 1232.38 0.50
# 2 1002 2017-05-22 17 NaN NaN 53.3 1240.21 0.39
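As for the error in the question: the pipe passes the grouped data frame to aggregate() as its first argument, so the list of tag columns is taken as `by` and the list containing cut(...) ends up in the `FUN` position, which is why match.fun(FUN) complains. If you would rather not write one mean() call per tag, the following is a minimal sketch that averages every remaining column at once. It assumes dplyr 1.0 or later (for across()), that event_time is either a character string in the format shown or already a POSIXct, and that UTC is an acceptable time zone for parsing. The small data frame and the names `hour` and `hourly_means` are made up for illustration; the values are copied from a few rows of the question.

```r
library(dplyr)

# Illustrative subset of the question's data (three tag columns only)
data <- data.frame(
  id = c(1001L, 1001L, 1001L, 1002L, 1002L, 1002L),
  event_time = c("2017-05-22 16:58:07", "2017-05-22 16:59:07",
                 "2017-05-22 17:00:08", "2017-05-22 16:59:33",
                 "2017-05-22 17:00:21", "2017-05-22 17:01:10"),
  `2`  = c(NA, NA, NA, 11, 11, 19),
  `38` = c(53.8, 53.2, 53.8, 53.2, 53.8, 52.8),
  `39` = c(1234.08, 1236.73, 1257.87, 1277.73, 1288.87, 1201.55),
  check.names = FALSE  # keep the numeric column names as they are
)

hourly_means <- data %>%
  # Truncate each timestamp to the start of its hour (UTC is an assumption)
  mutate(hour = format(as.POSIXct(event_time, tz = "UTC"), "%Y-%m-%d %H:00")) %>%
  group_by(id, hour) %>%
  # Average every non-grouping column except the raw timestamp, ignoring NAs
  summarise(across(-event_time, ~ mean(.x, na.rm = TRUE), .names = "Tag_{.col}"),
            .groups = "drop")

print(hourly_means)
```

Note that na.rm takes the logical TRUE rather than the string "TRUE". A group in which a tag is entirely NA still yields NaN for that tag, as in the output above, so filter or replace those values afterwards if they get in the way.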