Aggregate a data frame into a per-day sequence of events

Time: 2016-09-19 19:38:53

Tags: r

I have a data frame (df) like this:

 TIMESTAMP               STATUS
 2016-01-01 00:00:00      OFF
 2016-01-01 01:00:00      ON
 2016-01-01 02:00:00      ON
 2016-01-01 03:00:00      OFF
 2016-01-02 00:00:00      ON
 2016-01-02 01:00:00      OFF
 ...

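I need to aggregate(?) the sequence of statuses for each day. For example, the first day in df gives the sequence OFF-ON-ON-OFF, and the second day gives ON-OFF.

So I need a data frame summarized by date: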

DAY           SEQUENCE 
2016-01-01    OFF-ON-ON-OFF
2016-01-02    ON-OFF
...

3 answers:

Answer 0: (score: 1)

library(dplyr)

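df %>%
  arrange(TIMESTAMP) %>%                  # keep events in chronological order
  mutate(date = as.Date(TIMESTAMP)) %>%   # drop the time of day
  group_by(date) %>%
  # the sample data below names the column `status`; the question's column is `STATUS`
  summarise(sequence = paste(status, collapse = "-"))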

Data

df <- data.frame(
  TIMESTAMP = c("2016-01-01 00:00:00", "2016-01-01 01:00:00", "2016-01-01 02:00:00", "2016-01-01 03:00:00", "2016-01-02 00:00:00", "2016-01-02 01:00:00"),
  status = c("OFF", "ON", "ON", "OFF", "ON", "OFF")
)

Answer 1: (score: 1)

As is tradition, I'll add a data.table solution here:

library(data.table)
library(lubridate)

s <- "TIMESTAMP, STATUS
2016-01-01 00:00:00, OFF
2016-01-01 01:00:00, ON
2016-01-01 02:00:00, ON
2016-01-01 03:00:00, OFF
2016-01-02 00:00:00, ON
2016-01-02 01:00:00, OFF"

dt <- fread(s)
dt[, day_time := ymd_hms(TIMESTAMP)]
# make sure the events are in chronological order before collapsing them
setorder(dt, day_time)
dt[, DAY := date(day_time)]
dt[, paste0(STATUS, collapse = "-"), by = DAY]
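
As written, the result column gets the default name V1. The following is a minimal sketch of the same idea that names the output columns DAY and SEQUENCE as in the question and avoids the lubridate dependency by using data.table's own as.IDate. The "UTC" time zone and the variable name res are assumptions made for illustration, not part of the original answer.

```r
library(data.table)

# Sample data from the question
dt <- data.table(
  TIMESTAMP = c("2016-01-01 00:00:00", "2016-01-01 01:00:00",
                "2016-01-01 02:00:00", "2016-01-01 03:00:00",
                "2016-01-02 00:00:00", "2016-01-02 01:00:00"),
  STATUS = c("OFF", "ON", "ON", "OFF", "ON", "OFF")
)

# Parse the timestamps (time zone is an assumption) and sort chronologically
dt[, TIMESTAMP := as.POSIXct(TIMESTAMP, tz = "UTC")]
setorder(dt, TIMESTAMP)

# Group by calendar day and collapse each day's statuses into one string
res <- dt[, .(SEQUENCE = paste(STATUS, collapse = "-")),
          by = .(DAY = as.IDate(TIMESTAMP))]
print(res)
```

Grouping on an expression inside by = .(...) avoids adding a helper column to dt.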

Answer 2: (score: 0)

Judging by your desired result, I assume you also want to drop the time of day. If so, you can use aggregate, as.Date, and paste from base R.

df <- data.frame(TIMESTAMP = 
    c('2016-01-01 00:00:00','2016-01-01 01:00:00',
      '2016-01-01 02:00:00','2016-01-01 03:00:00',
      '2016-01-02 00:00:00','2016-01-02 01:00:00'), 
  STATUS = c('OFF','ON','ON','OFF','ON','OFF'))

aggregate(df$STATUS, list(as.Date(df$TIMESTAMP)), paste, collapse="-")
## Group.1             x
## 2016-01-01 OFF-ON-ON-OFF
## 2016-01-02        ON-OFF
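
The call above relies on the rows already being in chronological order, and it returns the default column names Group.1 and x. Here is a hedged base R sketch that sorts first and produces the DAY and SEQUENCE columns asked for in the question. The rows are shuffled on purpose to show that the sort matters; the "UTC" time zone and the name res are assumptions for illustration.

```r
# Sample data from the question, deliberately shuffled
df <- data.frame(
  TIMESTAMP = c("2016-01-01 03:00:00", "2016-01-01 00:00:00",
                "2016-01-02 01:00:00", "2016-01-01 01:00:00",
                "2016-01-02 00:00:00", "2016-01-01 02:00:00"),
  STATUS = c("OFF", "OFF", "OFF", "ON", "ON", "ON"),
  stringsAsFactors = FALSE
)

# Parse the timestamps (time zone is an assumption) and sort chronologically
df$TIMESTAMP <- as.POSIXct(df$TIMESTAMP, tz = "UTC")
df <- df[order(df$TIMESTAMP), ]

# Add the calendar day, then collapse each day's statuses into one string
df$DAY <- as.Date(df$TIMESTAMP)
res <- aggregate(STATUS ~ DAY, data = df, FUN = paste, collapse = "-")
names(res)[names(res) == "STATUS"] <- "SEQUENCE"
print(res)
```

aggregate keeps the within-group row order, which is why sorting before the call is enough.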