R code to convert timestamp intervals from 1 minute to 15 minutes, with the other column values aggregated to their mean

Time: 2018-01-19 08:01:24

Tags: r dataframe timestamp categorical-data

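I have a data frame like the one below. I want to convert the timestamps from 1-minute intervals to 15-minute intervals on wall-clock breaks (11:15, 11:30, etc.), with all other column values aggregated to their mean.

I have about 30 columns of numeric and categorical variables. Please let me know how to go about it.

Dataset: input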

 TS                A            B           C       D
 1/16/2018 2:45   63.5959053    51.3232269  Active  Inactive
 1/16/2018 2:46   65.9080353    51.40625    Active  Inactive
 1/16/2018 2:47   76.05151      51.40625    Active  Inactive
 1/16/2018 2:48   67.03827      51.3642731  Active  Inactive
 1/16/2018 2:49   67.17433      51.26026    Active  Inactive
 1/16/2018 2:50   68.20074      51.21875    Active  Inactive
 1/16/2018 2:51   63.5963936    51.2397346  Active  Inactive
 1/16/2018 2:52   61.12207      51.28125    Active  Inactive
 1/16/2018 2:53   65.24389      51.28125    Active  Inactive
 1/16/2018 2:54   61.8528252    51.28125    Active  Inactive
 1/16/2018 2:55   58.59375      51.28125    Active  Inactive
 1/16/2018 2:56   61.1220169    51.32321    Active  Inactive
 1/16/2018 2:57   63.5968857    51.40625    Active  Inactive
 1/16/2018 2:58   61.12183      51.40625    Active  Inactive
 1/16/2018 2:59   63.59697      51.3642921  Active  Inactive
 1/16/2018 3:00   65.9047       51.28125    Active  Inactive

Expected output:

    TS              A           B           C       D
    1/16/2018 2:45  64.52102813 51.32291645 Active  Inactive
    1/16/2018 3:00  65.9047     51.28125    Active  Inactive

1 answer:

Answer 0 (score: 0)

Like this. First, I rebuild your data frame,

df <- data.frame(TS = c("1/16/2018 2:45", "1/16/2018 2:46", "1/16/2018 2:47",
      "1/16/2018 2:48", "1/16/2018 2:49", "1/16/2018 2:50", "1/16/2018 2:51",
      "1/16/2018 2:52", "1/16/2018 2:53", "1/16/2018 2:54", "1/16/2018 2:55",
      "1/16/2018 2:56", "1/16/2018 2:57", "1/16/2018 2:58", "1/16/2018 2:59",
      "1/16/2018 3:00"),
    A = c(63.5959053, 65.9080353, 76.05151, 67.03827, 67.17433, 68.20074,
      63.5963936, 61.12207, 65.24389, 61.8528252, 58.59375, 61.1220169,
      63.5968857, 61.12183, 63.59697, 65.9047),
    B = c(51.3232269, 51.40625, 51.40625, 51.3642731, 51.26026, 51.21875, 51.2397346,
      51.28125, 51.28125, 51.28125, 51.28125, 51.32321, 51.40625, 51.40625, 51.3642921,
      51.28125),
    # C and D are constant in the sample; group_by(..., C, D) below needs them
    C = factor(rep("Active", 16)),
    D = factor(rep("Inactive", 16)))

Now I use lubridate and dplyr from the tidyverse, as well as the padr package:
# install.packages(c("padr", "tidyverse"), dependencies = TRUE)
library(tidyverse); library(lubridate); library(padr) # lubridate: mdy_hm(), padr: thicken()
as_tibble(df)  %>% mutate(TS = mdy_hm(TS)) %>%
        thicken('15 min') %>%
        group_by(TS_15_min, C, D) %>%
        summarise_at(which(sapply(., is.numeric)), mean)
#> # A tibble: 2 x 5
#> # Groups:   TS_15_min, C [?]
#>             TS_15_min      C        D        A        B
#>                <dttm> <fctr>   <fctr>    <dbl>    <dbl>
#> 1 2018-01-16 02:45:00 Active Inactive 64.52103 51.32292
#> 2 2018-01-16 03:00:00 Active Inactive 65.90470 51.28125
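
If you would rather not depend on padr, the same wall-clock binning can be sketched in base R. This is only a minimal sketch under a few assumptions of mine: the timestamps are treated as UTC, only the last three rows of your sample are used to keep it short, and the names ts, bin and res are made up for the example. It floors each timestamp to a multiple of 900 seconds (15 minutes) and averages A and B within each bin.

```r
# Last three rows of the sample data from the question
df <- data.frame(
  TS = c("1/16/2018 2:58", "1/16/2018 2:59", "1/16/2018 3:00"),
  A  = c(61.12183, 63.59697, 65.9047),
  B  = c(51.40625, 51.3642921, 51.28125),
  stringsAsFactors = FALSE
)

# Parse the month/day/year strings; UTC is an assumption made for this sketch
ts <- as.POSIXct(df$TS, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Floor to the start of the 15-minute interval (900 seconds)
bin <- as.POSIXct(floor(as.numeric(ts) / 900) * 900,
                  origin = "1970-01-01", tz = "UTC")

# Mean of the numeric columns per interval
res <- aggregate(df[c("A", "B")], by = list(TS = bin), FUN = mean)
print(res)
```

Note that cut(ts, "15 min") looks similar but starts its breaks at the earliest timestamp in the data, so it only lands on :00, :15, :30 and :45 when the data happens to start on one of them.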

If the column order is essential, you can use %>% select(sort(current_vars())) or possibly %>% select(noquote(order(colnames(df)))), or go all the way with,

as_tibble(df)  %>% mutate(TS = mdy_hm(TS)) %>%
        thicken('15 min', colname = '15_min') %>%
        select(-TS, TS = '15_min') %>%
        group_by(TS, C, D) %>%
        summarise_at(which(sapply(., is.numeric)), mean) %>% select(c('TS', LETTERS[1:4]))
#> # A tibble: 2 x 5
#> # Groups:   TS, C [2]
#>                    TS        A        B      C        D
#>                <dttm>    <dbl>    <dbl> <fctr>   <fctr>
#> 1 2018-01-16 02:45:00 64.52103 51.32292 Active Inactive
#> 2 2018-01-16 03:00:00 65.90470 51.28125 Active Inactive

However, I think it is misleading not to show that the column is no longer TS but intervals of TS, i.e.,

as_tibble(df)  %>% mutate(TS = mdy_hm(TS)) %>%
        thicken('15 min') %>%
        group_by(TS_15_min, C, D) %>%
        summarise_at(which(sapply(., is.numeric)), mean) %>% 
        select('15 min intervals of TS' = TS_15_min, sort(current_vars()))
#> # A tibble: 2 x 5
#> # Groups:   15 min intervals of TS, C [2]
#>   `15 min intervals of TS`        A        B      C        D
#>                     <dttm>    <dbl>    <dbl> <fctr>   <fctr>
#> 1      2018-01-16 02:45:00 64.52103 51.32292 Active Inactive
#> 2      2018-01-16 03:00:00 65.90470 51.28125 Active Inactive
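
You mention about 30 numeric and categorical columns. Grouping by every categorical column, as above, produces an extra row whenever one of them changes inside an interval. If you want exactly one row per interval instead, one option is to average the numeric columns and keep the most frequent value of each categorical column. That rule is my assumption, not something you asked for, and most_frequent and summarise_bin are hypothetical helper names. A minimal base R sketch, again on the last three sample rows and with UTC assumed:

```r
# Hypothetical helper: most frequent value of a categorical vector
# (ties go to the first level; assumes at least one non-NA value)
most_frequent <- function(x) names(which.max(table(x)))

# Hypothetical helper: collapse one interval to a single row,
# mean for numeric columns, most frequent value otherwise
summarise_bin <- function(d) {
  out <- lapply(d, function(col) {
    if (is.numeric(col)) mean(col) else most_frequent(col)
  })
  as.data.frame(out, stringsAsFactors = FALSE)
}

df <- data.frame(
  TS = c("1/16/2018 2:58", "1/16/2018 2:59", "1/16/2018 3:00"),
  A  = c(61.12183, 63.59697, 65.9047),
  B  = c(51.40625, 51.3642921, 51.28125),
  C  = "Active",
  D  = "Inactive",
  stringsAsFactors = FALSE
)

# Floor each timestamp to its 15-minute interval and use it as a text key
ts  <- as.POSIXct(df$TS, format = "%m/%d/%Y %H:%M", tz = "UTC")
bin <- as.POSIXct(floor(as.numeric(ts) / 900) * 900,
                  origin = "1970-01-01", tz = "UTC")
key <- format(bin, "%Y-%m-%d %H:%M")

# Summarise every column except TS, one interval at a time
value_cols <- setdiff(names(df), "TS")
pieces <- lapply(split(df[value_cols], key), summarise_bin)
res <- data.frame(TS = names(pieces), do.call(rbind, pieces),
                  row.names = NULL, stringsAsFactors = FALSE)
print(res)
```

Because the column type decides the summary, this should carry over to a wider data frame without listing the columns by hand.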