How to group by consecutive rows in R

Date: 2017-06-16 13:17:37

Tags: r dplyr row grouping

Here are some example rows from my data:

A   B    participant   trial   CURRENT_ID      C
0   1    ppt01         45      3               0   #row1
1   0    ppt01         45      4               0   #row2
0   1    ppt01         45      10              0   #row3
0   0    ppt01         45      11              0   #row4
1   0    ppt01         45      12              0   #row5
0   1    ppt01         87      2               0   #row6
1   0    ppt01         87      3               0   #row7
1   1    ppt01         87      4               1   #row8
1   1    ppt01         87      5               1   #row9
0   1    ppt02         55      5               0   #row10
1   0    ppt02         55      6               0   #row11
0   1    ppt02         55      9               0   #row12
1   0    ppt02         55      10              0   #row13
0   1    ppt02         55      11              1   #row14
1   0    ppt02         55      12              0   #row15

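I need to group the data by participant, trial, and consecutive CURRENT_ID rows. Within each participant and trial, every pair of rows whose CURRENT_ID values differ by 1 should form its own group, so a row may have to be used twice. Here is an example of the consecutive rows I need. As you can see, some rows appear twice (e.g. participant ppt01, trial 45, CURRENT_ID 11): once with the preceding row and once with the following row: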

A   B    participant   trial   CURRENT_ID      C
0   1    ppt01         45      3               0   #row1
1   0    ppt01         45      4               0   #row2

0   1    ppt01         45      10              0   #row3
0   0    ppt01         45      11              0   #row4

0   0    ppt01         45      11              0   #row4
1   0    ppt01         45      12              0   #row5

0   1    ppt01         87      2               0   #row6
1   0    ppt01         87      3               0   #row7

1   0    ppt01         87      3               0   #row7
1   1    ppt01         87      4               1   #row8

1   1    ppt01         87      4               1   #row8
1   1    ppt01         87      5               1   #row9

0   1    ppt02         55      5               0   #row10
1   0    ppt02         55      6               0   #row11

0   1    ppt02         55      9               0   #row12
1   0    ppt02         55      10              0   #row13

1   0    ppt02         55      10              0   #row13
0   1    ppt02         55      11              1   #row14

0   1    ppt02         55      11              1   #row14
1   0    ppt02         55      12              0   #row15

How can I include consecutive rows of CURRENT_ID in a library(dplyr) group_by(participant, trial)?

1 Answer:

Answer 0: (score: 0)

I don't know how to do this with dplyr, but here is a way in base R:

# data
dat <- structure(list(A = c(0L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 
1L, 0L, 1L, 0L, 1L), B = c(1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 
1L, 0L, 1L, 0L, 1L, 0L), participant = c("ppt01", "ppt01", "ppt01", 
"ppt01", "ppt01", "ppt01", "ppt01", "ppt01", "ppt01", "ppt02", 
"ppt02", "ppt02", "ppt02", "ppt02", "ppt02"), trial = c(45L, 
45L, 45L, 45L, 45L, 87L, 87L, 87L, 87L, 55L, 55L, 55L, 55L, 55L, 
55L), CURRENT_ID = c(3L, 4L, 10L, 11L, 12L, 2L, 3L, 4L, 5L, 5L, 
6L, 9L, 10L, 11L, 12L), C = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 
1L, 0L, 0L, 0L, 0L, 1L, 0L)), .Names = c("A", "B", "participant", 
"trial", "CURRENT_ID", "C"), row.names = c(NA, -15L), class = "data.frame")

# Where can a consecutive pair start? Find rows whose successor has CURRENT_ID + 1,
# then keep only those whose successor has the same trial and participant
idx <- which(diff(dat[,"CURRENT_ID"])==1)
idx <- Filter(function(i) dat[i,"trial"]==dat[i+1,"trial"], idx)
idx <- Filter(function(i) dat[i,"participant"]==dat[i+1,"participant"], idx)

# Create one two-row data frame per consecutive pair (rows i and i + 1)
lapply(idx, function(i) dat[c(i,i+1),])
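
Since the question asks about dplyr, the same idea could probably be sketched with dplyr as well. This is only a rough sketch, not a tested answer: it assumes the rows of each participant/trial combination are contiguous and already ordered by CURRENT_ID (as in the example data), and the names `starts`, `paired`, `row` and `pair_id` are made up for illustration. The data are rebuilt here so the snippet runs on its own.

```r
library(dplyr)

# Same example data as above, rebuilt compactly
dat <- data.frame(
  A = c(0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1),
  B = c(1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0),
  participant = rep(c("ppt01", "ppt02"), times = c(9, 6)),
  trial = rep(c(45, 87, 55), times = c(5, 4, 6)),
  CURRENT_ID = c(3, 4, 10, 11, 12, 2, 3, 4, 5, 5, 6, 9, 10, 11, 12),
  C = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0)
)

# Row numbers at which a consecutive pair starts. lead() is evaluated
# within each participant/trial group, so the last row of a group gives NA
# and is dropped by filter().
starts <- dat %>%
  mutate(row = row_number()) %>%
  group_by(participant, trial) %>%
  filter(lead(CURRENT_ID) - CURRENT_ID == 1) %>%
  pull(row)

# Duplicate the rows so that every pair gets its own id (hypothetical
# column pair_id), then group by it. Rows shared by two pairs appear twice.
paired <- dat %>%
  slice(as.vector(rbind(starts, starts + 1))) %>%
  mutate(pair_id = rep(seq_along(starts), each = 2)) %>%
  group_by(participant, trial, pair_id)
```

From here, `summarise()` or `mutate()` on `paired` would operate on each two-row pair separately. The rows have to be duplicated because a single `group_by()` assigns each row to exactly one group.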