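I am looking for a way to count the rows leading up to the point where a specific condition is met in a particular column.
Simplified example data: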
library(dplyr)  # provides the %>% pipe used below

rbind( c('Group A', "eventcode.1", "10:00"),
c('Group A', "eventcode.3", "09:59"),
c('Group B', "eventcode.4", "09:57"),
c('Group A', "eventcode.6", "09:56"),
c('Group B', "eventcode.4", "09:52"),
c('Group A', "eventcode.4", "09:51"),
c('Group A', "eventcode.9", "09:48"),
c('Group A', "eventcode.1", "09:46"),
c('Group A', "eventcode.3", "09:45"),
c('Group B', "eventcode.4", "09:41"),
c('Group B', "eventcode.8", "09:40"),
c('Group B', "eventcode.4", "09:37"),
c('Group B', "eventcode.1", "09:33"),
c('Group B', "eventcode.2", "09:31"),
c('Group B', "eventcode.3", "09:30"),
c('Group A', "eventcode.5", "09:28"),
c('Group A', "eventcode.6", "09:28"),
c('Group B', "eventcode.7", "09:27"),
c('Group B', "eventcode.2", "09:26"),
c('Group A', "eventcode.9", "09:26"),
c('Group B', "eventcode.11", "09:24"),
c('Group A', "eventcode.7", "09:20"),
c('Group A', "eventcode.1", "09:17"),
c('Group A', "eventcode.2", "09:15"),
c('Group B', "eventcode.4", "09:12"),
c('Group B', "eventcode.4", "09:08")) %>%
as.data.frame() -> temp.data  # character matrix -> data frame with columns V1..V3
colnames(temp.data) <- c('Group', 'Event', 'Time')
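This is a reduced data set (the original has 40+ columns of event-related data). The important points are that each data set contains only two groups, and that each group logs events whose actions are assigned a specific code. Whenever a specific event code is triggered, I want to identify that row and create a new variable holding the number of rows leading up to that event, counted within the same group (A or B) and only as long as the run is not interrupted by the other group. The event code that triggers this is "eventcode.1". Then, within those preceding rows, I would like to count the occurrences of a specific event code (eventcode.4) for the group that logged the event, and to compute the total time elapsed over the events leading up to eventcode.1.
That is: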
row 1 - Group A - would have a value of 0, eventcode.4 count of 0, and time count of 0 seconds
row 8 - Group A - would have a value of 2, eventcode.4 count of 1, and time count of 5 seconds
row 13 - Group B - would have a value of 3, eventcode.4 count of 2, and time count of 8 seconds
row 23 - Group A - would have a value of 1, eventcode.4 count of 0, and time count of 3 seconds
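To make the time counts above concrete, here is a minimal base R sketch of the arithmetic for row 8. It assumes the Time strings are minutes:seconds (which the 5-second figure implies); `to_seconds` is a hypothetical helper introduced only for this illustration.

```r
# Assumption: Time values are "mm:ss" strings.
# to_seconds() is a hypothetical helper, not part of any package.
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts,
         function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
         numeric(1))
}

# Row 8 (Group A, eventcode.1, "09:46") is preceded by an unbroken
# Group A run that starts at row 6 ("09:51").
elapsed <- to_seconds("09:51") - to_seconds("09:46")
elapsed  # 5
```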
Answer 0 (score: 1)
One approach (using dplyr, with lubridate for the time conversion):
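library(dplyr)
library(lubridate)

temp.data %>%
  mutate(rn = row_number()) %>%
  # A new run starts right after an eventcode.1 row, or when the group changes
  mutate(brk1 = lag(Event, 1) == 'eventcode.1',
         brk2 = lag(Group, 1) != Group) %>%
  # Running count of breaks = run id (coalesce turns the leading NA into FALSE)
  mutate(grp = cumsum(
    (1L * coalesce(brk1, FALSE)) +
    (1L * coalesce(brk2, FALSE)))
  ) %>%
  group_by(grp) %>%
  # Keep only the runs that end in the trigger event
  filter(last(Event) == 'eventcode.1') %>%
  summarize(
    row   = last(rn),                                      # row of the trigger event
    group = first(Group),
    value = n() - 1,                                       # rows before the trigger
    cnt   = sum(if_else(Event == 'eventcode.4', 1, 0)),    # eventcode.4 occurrences
    tmct  = seconds(ms(first(Time))) - seconds(ms(last(Time)))  # ms() reads mm:ss
  ) %>%
  select(-grp)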
Which produces:
# A tibble: 4 x 5
    row group   value   cnt tmct
  <int> <fct>   <dbl> <dbl> <Period>
1     1 Group A     0     0 0S
2     8 Group A     2     1 5S
3    13 Group B     3     2 8S
4    23 Group A     1     0 3S
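The core of the pipeline is the run id: every row that follows a trigger event, or that switches group, starts a new run, and the cumulative sum of those break flags labels the runs. The following is a minimal base R sketch of just that idea, not the answer's code; the toy vectors and the shortened event names ("e1" as the trigger, "e4" as the counted event) are made up for illustration.

```r
# Hypothetical toy log, newest row first, as in the question
group <- c("A", "A", "B", "A", "A", "A")
event <- c("e1", "e3", "e4", "e4", "e9", "e1")
n <- length(group)

# TRUE where the previous row was the trigger event
after_trigger <- c(FALSE, event[-n] == "e1")
# TRUE where the group differs from the previous row
group_change <- c(FALSE, group[-1] != group[-n])

# Each break increments the run id
run_id <- cumsum(after_trigger + group_change)

# Row indices per run; keep only the runs whose last row is the trigger
runs <- split(seq_len(n), run_id)
keep <- Filter(function(idx) event[idx[length(idx)]] == "e1", runs)

result <- do.call(rbind, lapply(keep, function(idx) {
  data.frame(
    row   = idx[length(idx)],       # position of the trigger row
    group = group[idx[1]],
    value = length(idx) - 1,        # rows leading up to the trigger
    cnt   = sum(event[idx] == "e4") # occurrences of the counted event
  )
}))
result
```

In this toy log, rows 4 to 6 form one unbroken Group A run ending in the trigger, so it yields a value of 2 and a count of 1, while row 1 is a run of its own with a value of 0.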