Collapse multiple rows by time difference

Time: 2019-01-28 21:50:07

Tags: r dplyr tidyr collapse

Suppose these are timestamped observations in a dataset.

 Id     Status    DateCreated          Group
 10     Read      2017-11-04 18:24:55  Red
 10     Write     2017-11-04 18:24:56  Red
 10     Review    2017-11-04 18:25:16  Red
 10     Read      2017-11-04 18:26:17  Red
 10     Write     2017-11-04 18:26:47  Red

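How can I collapse rows that are within 1 minute of each other? For example, rows 1, 2, and 3 collapse into one row, and rows 4 and 5 collapse into a second row.

The desired output looks like this: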

 Id     Status              DateCreated            Date Ended             Group
 10     Read,Write,Review   2017-11-04 18:24:55    2017-11-04 18:25:16    Red, Red, Red
 10     Read,Write          2017-11-04 18:26:17    2017-11-04 18:26:47    Red, Red

Here is the code to reproduce the test dataset used in this example.

df <- structure(list(Id = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "10", class = "factor"), 
    Status = structure(c(1L, 3L, 2L, 1L, 3L), .Label = c("Read", 
    "Review", "Write"), class = "factor"), DateCreated = structure(1:5, .Label = c("2017-11-04 18:24:55", 
    "2017-11-04 18:24:56", "2017-11-04 18:25:16", "2017-11-04 18:26:17", 
    "2017-11-04 18:26:47"), class = "factor"), Group = structure(c(1L, 
    1L, 1L, 1L, 1L), .Label = "Red", class = "factor")), class = "data.frame", row.names = c(NA, 
-5L))

Any help would be much appreciated. Thanks in advance.

2 answers:

Answer 0 (score: 0)

I would do something like this:

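library(dplyr)
library(lubridate)

df %>%
  mutate(DateCreated = ymd_hms(as.character(DateCreated))) %>%
  arrange(DateCreated) %>%
  group_by(minute(DateCreated)) %>%  # groups by calendar minute, not by gap between rows
  summarise(Status = paste(Status, collapse = ", "),
            Date_ended = last(DateCreated),  # must come before DateCreated is overwritten
            DateCreated = first(DateCreated),
            Group = paste(Group, collapse = ", "))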
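
Because `minute()` buckets rows by calendar minute, rows at 18:24:56 and 18:25:16 land in different groups even though they are only 20 seconds apart. A minimal sketch of a gap-based alternative follows, assuming "within 1 minute" means the gap to the previous row is under 60 seconds. The names `obs`, `gap_secs`, `run_id`, and `collapsed` are made up for illustration, and the data is retyped as plain character columns to keep the block self-contained.

```r
library(dplyr)

# Sample data from the question, as character columns
obs <- data.frame(
  Id = "10",
  Status = c("Read", "Write", "Review", "Read", "Write"),
  DateCreated = c("2017-11-04 18:24:55", "2017-11-04 18:24:56",
                  "2017-11-04 18:25:16", "2017-11-04 18:26:17",
                  "2017-11-04 18:26:47"),
  Group = "Red",
  stringsAsFactors = FALSE
)

collapsed <- obs %>%
  mutate(DateCreated = as.POSIXct(DateCreated, tz = "UTC")) %>%
  arrange(Id, DateCreated) %>%
  group_by(Id) %>%
  mutate(
    # seconds since the previous row within the same Id (0 for the first row)
    gap_secs = c(0, diff(as.numeric(DateCreated))),
    # each gap of 60 seconds or more starts a new run (assumed boundary)
    run_id = cumsum(gap_secs >= 60)
  ) %>%
  group_by(Id, run_id) %>%
  summarise(
    Status = paste(Status, collapse = ","),
    # compute the end time before DateCreated is overwritten
    DateEnded = max(DateCreated),
    DateCreated = min(DateCreated),
    Group = paste(Group, collapse = ", "),
    .groups = "drop"
  ) %>%
  select(Id, Status, DateCreated, DateEnded, Group)

collapsed
```

On the sample data this gives two rows, one for rows 1 to 3 and one for rows 4 and 5. Grouping by `Id` first keeps runs from spilling across different ids, which the sample does not exercise since it has a single `Id`. The `.groups` argument needs dplyr 1.0.0 or later.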

Answer 1 (score: 0)


library(lubridate)
library(dplyr)
library(purrr)

df <-
  structure(
    list(
      Id = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "10", class = "factor"),
      Status = structure(
        c(1L, 3L, 2L, 1L, 3L),
        .Label = c("Read",
                   "Review", "Write"),
        class = "factor"
      ),
      DateCreated = structure(
        1:5,
        .Label = c(
          "2017-11-04 18:24:55",
          "2017-11-04 18:24:56",
          "2017-11-04 18:25:16",
          "2017-11-04 18:26:17",
          "2017-11-04 18:26:47"
        ),
        class = "factor"
      ),
      Group = structure(c(1L,
                          1L, 1L, 1L, 1L), .Label = "Red", class = "factor")
    ),
    class = "data.frame",
    row.names = c(NA,-5L)
  )


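df2 <-
  df %>%
  # convert the factor to character first so the labels are parsed
  mutate(DateCreated = as_datetime(as.character(DateCreated))) %>%
  arrange(DateCreated) %>%
  # gap to the previous row in seconds; the first row has no predecessor, so use 0
  mutate(diff = c(0, diff(as.numeric(DateCreated))))

# walk through the gaps, carrying the current group id forward and
# incrementing it whenever a gap reaches 60 seconds
df3 <- df2 %>%
  mutate(date_groups = accumulate(
    diff[-1],
    function(grp, gap) if (gap < 60) grp else grp + 1,
    .init = 0
  )) %>%
  group_by(date_groups) %>%
  summarise(
    Status = paste(Status, collapse = ", "),
    # take the end time before DateCreated is overwritten below
    Date_ended = last(DateCreated),
    DateCreated = first(DateCreated),
    Group = paste(Group, collapse = ", ")
  ) %>%
  select(date_groups, Status, DateCreated, Date_ended, Group)

df3
#> # A tibble: 2 x 5
#>   date_groups Status       DateCreated         Date_ended          Group   
#>         <dbl> <chr>        <dttm>              <dttm>              <chr>   
#> 1           0 Read, Write… 2017-11-04 18:24:55 2017-11-04 18:25:16 Red, Re…
#> 2           1 Read, Write  2017-11-04 18:26:17 2017-11-04 18:26:47 Red, Red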

Created on 2019-01-28 by the reprex package (v0.2.1)
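
Grouping on the gap to the previous row lets a long chain of closely spaced rows merge into one group, even when its first and last rows are far more than a minute apart. If "within 1 minute" is instead read as "within 60 seconds of the first row of the group", a different rule is needed. The sketch below shows that reading with a hypothetical base R helper, `window_id`, which is not part of any package and assumes its input is already sorted in ascending order.

```r
# Hypothetical helper: assign window ids so that every row in a window lies
# less than `width_secs` seconds after the window's first row.
# Assumes `times` is sorted in ascending order.
window_id <- function(times, width_secs = 60) {
  secs <- as.numeric(times)          # POSIXct -> seconds since the epoch
  ids <- integer(length(secs))
  if (length(secs) == 0) return(ids)
  anchor <- secs[1]                  # first timestamp of the current window
  current <- 0L
  for (i in seq_along(secs)) {
    if (secs[i] - anchor >= width_secs) {
      # too far from the anchor: this row opens a new window
      anchor <- secs[i]
      current <- current + 1L
    }
    ids[i] <- current
  }
  ids
}

# Timestamps from the question
times <- as.POSIXct(
  c("2017-11-04 18:24:55", "2017-11-04 18:24:56", "2017-11-04 18:25:16",
    "2017-11-04 18:26:17", "2017-11-04 18:26:47"),
  tz = "UTC"
)
sample_ids <- window_id(times)

# A chain of rows 50 seconds apart: gaps are all under 60 seconds,
# but the third row is 100 seconds after the first
chain <- as.POSIXct(c(0, 50, 100), origin = "1970-01-01", tz = "UTC")
chain_ids <- window_id(chain)
```

For the question's timestamps both readings agree (`sample_ids` is 0, 0, 0, 1, 1). For `chain` they differ: a gap-based rule keeps all three rows together, while `window_id` puts the third row in a new window. The resulting ids can be used as a grouping column in the same `group_by()` and `summarise()` steps.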