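I have a data frame like the one below. I want to convert the timestamps from 1-minute intervals to 15-minute intervals aligned to wall-clock breaks (11:15, 11:30, etc.), with all other column values aggregated to their mean.
I have about 30 columns, a mix of numeric and categorical variables.
Please let me know how to do this.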
Dataset (input):
TS A B C D
1/16/2018 2:45 63.5959053 51.3232269 Active Inactive
1/16/2018 2:46 65.9080353 51.40625 Active Inactive
1/16/2018 2:47 76.05151 51.40625 Active Inactive
1/16/2018 2:48 67.03827 51.3642731 Active Inactive
1/16/2018 2:49 67.17433 51.26026 Active Inactive
1/16/2018 2:50 68.20074 51.21875 Active Inactive
1/16/2018 2:51 63.5963936 51.2397346 Active Inactive
1/16/2018 2:52 61.12207 51.28125 Active Inactive
1/16/2018 2:53 65.24389 51.28125 Active Inactive
1/16/2018 2:54 61.8528252 51.28125 Active Inactive
1/16/2018 2:55 58.59375 51.28125 Active Inactive
1/16/2018 2:56 61.1220169 51.32321 Active Inactive
1/16/2018 2:57 63.5968857 51.40625 Active Inactive
1/16/2018 2:58 61.12183 51.40625 Active Inactive
1/16/2018 2:59 63.59697 51.3642921 Active Inactive
1/16/2018 3:00 65.9047 51.28125 Active Inactive
Expected output:
TS A B C D
1/16/2018 2:45 64.52102813 51.32291645 Active Inactive
1/16/2018 3:00 68.9047 59.28125 Active Inactive
Answer 0 (score: 0)
Like this. First, I rebuild your data frame:
df <- data.frame(TS = c("1/16/2018 2:45", "1/16/2018 2:46", "1/16/2018 2:47",
"1/16/2018 2:48", "1/16/2018 2:49", "1/16/2018 2:50", "1/16/2018 2:51",
"1/16/2018 2:52", "1/16/2018 2:53", "1/16/2018 2:54", "1/16/2018 2:55",
"1/16/2018 2:56", "1/16/2018 2:57", "1/16/2018 2:58", "1/16/2018 2:59",
"1/16/2018 3:00"),
A = c(63.5959053, 65.9080353, 76.05151, 67.03827, 67.17433, 68.20074,
63.5963936, 61.12207, 65.24389, 61.8528252, 58.59375, 61.1220169,
63.5968857, 61.12183, 63.59697, 65.9047),
B = c(51.3232269, 51.40625, 51.40625, 51.3642731, 51.26026, 51.21875, 51.2397346,
51.28125, 51.28125, 51.28125, 51.28125, 51.32321, 51.40625, 51.40625, 51.3642921,
51.28125),
C = rep("Active", 16), D = rep("Inactive", 16)) # C and D are used by group_by() below
Now I use the tidyverse (which includes dplyr), lubridate, and padr packages:
# install.packages(c("padr", "tidyverse"), dependencies = TRUE)
library(tidyverse); library(lubridate); library(padr) # lubridate provides mdy_hm()
as_tibble(df) %>% mutate(TS = mdy_hm(TS)) %>%
thicken('15 min') %>%
group_by(TS_15_min, C, D) %>%
summarise_at(which(sapply(., is.numeric)), mean)
#> # A tibble: 2 x 5
#> # Groups: TS_15_min, C [?]
#> TS_15_min C D A B
#> <dttm> <fctr> <fctr> <dbl> <dbl>
#> 1 2018-01-16 02:45:00 Active Inactive 64.52103 51.32292
#> 2 2018-01-16 03:00:00 Active Inactive 65.90470 51.28125
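To make the bucketing step concrete, here is a minimal base R sketch of the same idea without padr: floor each timestamp to its 15-minute wall-clock boundary, then average the numeric columns per bucket. It is only an illustration under stated assumptions: the names `df_demo`, `ts`, `num_cols` and `res` are made up for this example, only four of the input rows are used, and the timestamps are assumed to be in UTC.

```r
# Four rows taken from the question's input (assumption: times are UTC).
df_demo <- data.frame(
  TS = c("1/16/2018 2:45", "1/16/2018 2:52", "1/16/2018 2:59", "1/16/2018 3:00"),
  A  = c(63.5959053, 61.12207, 63.59697, 65.9047),
  B  = c(51.3232269, 51.28125, 51.3642921, 51.28125),
  C  = "Active",
  D  = "Inactive"
)

# Parse the month/day/year hour:minute strings.
ts <- as.POSIXct(df_demo$TS, format = "%m/%d/%Y %H:%M", tz = "UTC")

# 900 seconds = 15 minutes: integer-divide the epoch seconds, then scale back,
# which floors every timestamp to :00, :15, :30 or :45.
df_demo$TS_15_min <- as.POSIXct((as.numeric(ts) %/% 900) * 900,
                                origin = "1970-01-01", tz = "UTC")

# Average every numeric column within each (bucket, C, D) group.
num_cols <- names(df_demo)[sapply(df_demo, is.numeric)]
res <- aggregate(df_demo[num_cols],
                 by = df_demo[c("TS_15_min", "C", "D")],
                 FUN = mean)
res <- res[order(res$TS_15_min), ]
print(res)
```

The 2:45, 2:52 and 2:59 rows fall into the 02:45 bucket and the 3:00 row starts a new one, which mirrors the two-row result above.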
If the column order matters, you can append %>% select(sort(current_vars()))
or possibly %>% select(noquote(order(colnames(df))))
or go all the way with:
as_tibble(df) %>% mutate(TS = mdy_hm(TS)) %>%
thicken('15 min', colname = '15_min') %>%
select(-TS, TS = '15_min') %>%
group_by(TS, C, D) %>%
summarise_at(which(sapply(., is.numeric)), mean) %>% select(c('TS', LETTERS[1:4]))
#> # A tibble: 2 x 5
#> # Groups: TS, C [2]
#> TS A B C D
#> <dttm> <dbl> <dbl> <fctr> <fctr>
#> 1 2018-01-16 02:45:00 64.52103 51.32292 Active Inactive
#> 2 2018-01-16 03:00:00 65.90470 51.28125 Active Inactive
However, I think the output should make clear that the column is no longer TS but 15-minute intervals of TS, i.e.:
as_tibble(df) %>% mutate(TS = mdy_hm(TS)) %>%
thicken('15 min') %>%
group_by(TS_15_min, C, D) %>%
summarise_at(which(sapply(., is.numeric)), mean) %>%
select('15 min intervals of TS' = TS_15_min, sort(current_vars()))
#> # A tibble: 2 x 5
#> # Groups: 15 min intervals of TS, C [2]
#> `15 min intervals of TS` A B C D
#> <dttm> <dbl> <dbl> <fctr> <fctr>
#> 1 2018-01-16 02:45:00 64.52103 51.32292 Active Inactive
#> 2 2018-01-16 03:00:00 65.90470 51.28125 Active Inactive
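Grouping by C and D works here because both are constant within each interval. If a categorical column can change inside a 15-minute bucket, grouping by it would split that bucket into several rows. One possible alternative, sketched below in base R, is to keep one row per bucket and take the most frequent category. This is an assumption about what is wanted, not something the question specifies; `mode_value`, `demo`, `bucket` and `res_mode` are hypothetical names and the data is invented for illustration.

```r
# Hypothetical helper: most frequent value of a vector (first one wins ties).
mode_value <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Toy data: C flips to "Inactive" once inside the 02:45 bucket.
demo <- data.frame(
  bucket = c("02:45", "02:45", "02:45", "03:00"),
  A = c(1, 2, 6, 10),
  C = c("Active", "Active", "Inactive", "Active"),
  stringsAsFactors = FALSE
)

is_num   <- sapply(demo, is.numeric)
cat_cols <- setdiff(names(demo)[!is_num], "bucket")

# Mean for numeric columns, mode for categorical columns, one row per bucket.
num_part <- aggregate(demo[is_num],   by = demo["bucket"], FUN = mean)
cat_part <- aggregate(demo[cat_cols], by = demo["bucket"], FUN = mode_value)

res_mode <- merge(num_part, cat_part, by = "bucket")
print(res_mode)
```

With around 30 mixed columns, the same split into numeric and non-numeric columns avoids listing every categorical column by hand.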