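Sum of the preceding rows once a condition is met in R

Time: 2019-11-14 00:02:15

Tags: r dplyr

I am looking for a way to sum up the rows that lead up to the point where a specific condition is met in a particular column.

Simplified sample data: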

library(dplyr)  # provides the %>% pipe used below

  rbind( c('Group A', "eventcode.1", "10:00"),
                 c('Group A', "eventcode.3", "09:59"),
                 c('Group B', "eventcode.4", "09:57"),
                 c('Group A', "eventcode.6", "09:56"),
                 c('Group B', "eventcode.4", "09:52"),
                 c('Group A', "eventcode.4", "09:51"),
                 c('Group A', "eventcode.9", "09:48"),
                 c('Group A', "eventcode.1", "09:46"),
                 c('Group A', "eventcode.3", "09:45"),
                 c('Group B', "eventcode.4", "09:41"),
                 c('Group B', "eventcode.8", "09:40"),
                 c('Group B', "eventcode.4", "09:37"),
                 c('Group B', "eventcode.1", "09:33"),
                 c('Group B', "eventcode.2", "09:31"),
                 c('Group B', "eventcode.3", "09:30"),
                 c('Group A', "eventcode.5", "09:28"),
                 c('Group A', "eventcode.6", "09:28"),
                 c('Group B', "eventcode.7", "09:27"),
                 c('Group B', "eventcode.2", "09:26"),
                 c('Group A', "eventcode.9", "09:26"),
                 c('Group B', "eventcode.11", "09:24"),
                 c('Group A', "eventcode.7", "09:20"),
                 c('Group A', "eventcode.1", "09:17"),
                 c('Group A', "eventcode.2", "09:15"),
                 c('Group B', "eventcode.4", "09:12"),
                 c('Group B', "eventcode.4", "09:08")) %>%
  as.data.frame() -> temp.data 

colnames(temp.data) = c('Group', 'Event', "Time")
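
A small side note on the setup above, as a minimal sketch: `as.data.frame()` on a matrix without column names assigns the default names `V1`, `V2`, `V3`, so any code that refers to `Group`, `Event` or `Time` has to run after the `colnames()` assignment. The two-row matrix `m` below is only an illustrative subset of the sample data. Also note that R versions before 4.0 convert strings to factors by default, so `stringsAsFactors = FALSE` is set explicitly here to keep the result the same across versions.

```r
# A character matrix without column names, as produced by rbind() on plain vectors
m <- rbind(c("Group A", "eventcode.1", "10:00"),
           c("Group B", "eventcode.4", "09:57"))

df <- as.data.frame(m, stringsAsFactors = FALSE)
default_names <- names(df)   # names assigned automatically by as.data.frame()

colnames(df) <- c("Group", "Event", "Time")
renamed <- names(df)         # names after the explicit assignment

print(default_names)
print(renamed)
```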

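This is a reduced data set (the original has 40+ columns of event-related data). The important points are that each data set contains only two groups, and that each group logs events whose actions have been assigned specific codes. Whenever a particular event code is triggered, I want to identify that row and create a new variable that counts the rows leading up to that event for the same group (A or B), going back only as long as the run is not interrupted by the other group. The event code that triggers this is "eventcode.1". Then, within those preceding rows, I want to count the occurrences of one specific event code (eventcode.4) for the group that ends up logging the trigger event, and sum the total time of the events leading up to eventcode.1.

Expected result for the sample data: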

row 1 - Group A - would have a value of 0, eventcode.4 count of 0, and time count of 0 seconds
row 8 - Group A - would have a value of 2, eventcode.4 count of 1, and time count of 5 seconds
row 13 - Group B - would have a value of 3, eventcode.4 count of 2, and time count of 8 seconds
row 23 - Group A - would have a value of 1, eventcode.4 count of 0, and time count of 3 seconds
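
As a sanity check on the second expected row, the numbers can be reproduced by hand in base R. The sketch below hard-codes rows 6 to 8 of the sample data (the unbroken Group A run that ends with the eventcode.1 in row 8) and assumes the Time column is read as minutes:seconds, which is what the expected "5 seconds" implies. `to_seconds` is a hypothetical helper introduced only for this illustration.

```r
# Rows 6-8 of the sample data: an unbroken Group A run ending in eventcode.1
run <- data.frame(
  Group = c("Group A", "Group A", "Group A"),
  Event = c("eventcode.4", "eventcode.9", "eventcode.1"),
  Time  = c("09:51", "09:48", "09:46"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert "mm:ss" strings to a number of seconds
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]), numeric(1))
}

value    <- nrow(run) - 1                    # rows leading up to the trigger row
cnt      <- sum(run$Event == "eventcode.4")  # eventcode.4 occurrences in the run
time_sec <- to_seconds(run$Time[1]) - to_seconds(run$Time[nrow(run)])

print(c(value = value, cnt = cnt, time_sec = time_sec))
```

Row 5 belongs to Group B, so the run cannot extend further back than row 6, which is why only three rows are involved.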

1 answer:

Answer 0: (score: 1)

One approach (using dplyr, plus lubridate for the time conversion):

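library(dplyr)
library(lubridate)  # ms() and seconds() for the time conversion

temp.data %>%
   mutate(rn = row_number()) %>%
   # a new run starts right after an eventcode.1 row or whenever the group changes
   mutate(brk1 = lag(Event, 1) == 'eventcode.1',
          brk2 = lag(Group, 1) != Group
         ) %>%
   mutate(grp = cumsum(
                   (1L * coalesce(brk1, FALSE)) +
                   (1L * coalesce(brk2, FALSE)))
          ) %>%
   group_by(grp) %>%
   # keep only the runs that end with the trigger event
   filter(last(Event) == 'eventcode.1') %>%
   summarize(
      row = last(rn),                                     # row number of the trigger
      group = first(Group),
      value = n() - 1,                                    # rows leading up to the trigger
      cnt = sum(if_else(Event == 'eventcode.4', 1, 0)),   # eventcode.4 occurrences in the run
      tmct = seconds(ms(first(Time))) - seconds(ms(last(Time)))  # Time read as minutes:seconds
   ) %>%
   select(-grp)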
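
To make the run-numbering step easier to follow, here is a minimal sketch that applies only the `brk1` / `brk2` / `grp` part to the first eight rows of the sample data. The data frame `demo` is an illustrative subset, and the two `coalesce()` calls are folded into the flag definitions for brevity; the idea is the same: every `TRUE` flag bumps the running total, so each unbroken run gets its own `grp` value.

```r
library(dplyr)

# First eight rows of the sample data (Group and Event only)
demo <- data.frame(
  Group = c("Group A", "Group A", "Group B", "Group A",
            "Group B", "Group A", "Group A", "Group A"),
  Event = paste0("eventcode.", c(1, 3, 4, 6, 4, 4, 9, 1)),
  stringsAsFactors = FALSE
)

demo <- demo %>%
  mutate(
    # TRUE when the previous row was the trigger event (NA on row 1 becomes FALSE)
    brk1 = coalesce(lag(Event) == "eventcode.1", FALSE),
    # TRUE when the group differs from the previous row
    brk2 = coalesce(lag(Group) != Group, FALSE),
    # running count of breaks: rows sharing a value form one unbroken run
    grp  = cumsum(brk1 + brk2)
  )

print(demo)
```

On this subset, row 1 forms a run of its own and rows 6 to 8 share one `grp` value; both runs end in eventcode.1, so they are the ones the `filter()` step would keep.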

Which produces:

# A tibble: 4 x 5
    row group   value   cnt tmct    
  <int> <fct>   <dbl> <dbl> <Period>
1     1 Group A     0     0 0S      
2     8 Group A     2     1 5S      
3    13 Group B     3     2 8S      
4    23 Group A     1     0 3S