Select rows of IDs that match a pattern without losing the other rows

Time: 2019-03-30 15:21:10

Tags: r grouping unique rows

I have the following data frame:

# Example:
_________________________
     | id  | day  | states
-------------------------

 [1,]    1    0    0
 [2,]    1    1    0
 [3,]    1    2    0
 [4,]    1    3    1
 [5,]    1    4    1
 [6,]    1    5    1
 [7,]    1    6    0
 [8,]    1    7    0
 [9,]    1    8    3
[10,]    2    0    0
[11,]    2    1    0
[12,]    2    2    0
[13,]    2    3    1
[14,]    2    4    1
[15,]    2    5    4
[16,]    3    0    0
[17,]    3    1    0
[18,]    3    2    1
[19,]    3    3    0
[20,]    3    4    4
[21,]    4    0    0
[22,]    4    1    1
[23,]    4    2    0
[24,]    4    3    0
[25,]    4    4    0
[26,]    4    5    1
[27,]    4    6    0
[28,]    4    7    3
[29,]    5    0    0
[30,]    5    1    1
[31,]    5    2    1
[32,]    5    3    0
[33,]    5    4    0
[34,]    5    5    4

# Code:
byRow <- TRUE

example.Matrix <- matrix(data = c(1, 0, 0,1, 1, 0,1, 2, 0,1, 3, 1,1, 4, 1,1, 5, 1,1, 6, 
0,1, 7, 0,1, 8, 3,2, 0, 0,2, 1, 0, 2, 2, 0, 2, 3, 1,2, 4, 1,2, 5, 4, 3, 0, 0,3,1, 0,3, 
2, 1,3, 3, 0,3, 4, 4,4, 0, 0, 4, 1, 1, 4, 2, 0,4, 3, 0,4, 4, 0,4, 5, 1,4, 6, 0,4, 7, 3,
5, 0, 0,5, 1, 1,5, 2, 1, 5, 3, 0, 5, 4, 0,5, 5, 4), byrow=TRUE,ncol=3)

example.df<-as.data.frame(example.Matrix)

colnames(example.df) <- c("id", "day", "states")

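I would like to do the following:

1) Create a data frame (or matrix) of the IDs whose states contain a single 1 that is followed on the next row by anything other than 1, for example: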

# Expected output for first step:
_______________
|id|day|states|
----------------
3  | 2 |   1  |
3  | 3 |   0  |  
3  | 4 |   4  |  
----------------

# Example in code:
matrix.1<-matrix(c(3,2,1,3,3,0,3,4,4), byrow=TRUE,ncol=3)
df.1<-as.data.frame(matrix.1)
colnames(df.1) <- c("id", "day", "states")

Note that although id 4 has cases where the state goes from 1 to 0, it later re-enters 1, so id 4 should not be included in the new data frame/matrix.

# Should not be included in expected output for df.1:
_______________
|id|day|states|
----------------
4  | 1 |   1  | #* start 
4  | 2 |   0  | #* meets condition
4  | 3 |   0  | 
4  | 4 |   0  | 
4  | 5 |   0  | 
4  | 6 |   1  | #*reenters 1 - does not meet condition
4  | 7 |   0  | 
4  | 8 |   3  | 
---------------
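
One way to see why id 3 qualifies for the first step and id 4 does not is to list the length of every run of consecutive 1s per id. The following is a minimal base R sketch, not part of the original post; it rebuilds the example data so it runs on its own, and the helper name `runs_of_ones` is hypothetical.

```r
# Rebuild the example data so this block is self-contained.
example.df <- data.frame(
  id     = rep(1:5, times = c(9, 6, 5, 8, 6)),
  day    = c(0:8, 0:5, 0:4, 0:7, 0:5),
  states = c(0, 0, 0, 1, 1, 1, 0, 0, 3,
             0, 0, 0, 1, 1, 4,
             0, 0, 1, 0, 4,
             0, 1, 0, 0, 0, 1, 0, 3,
             0, 1, 1, 0, 0, 4)
)

# Hypothetical helper: lengths of every run of consecutive 1s in a vector.
runs_of_ones <- function(x) {
  r <- rle(x == 1)
  r$lengths[r$values]
}

# Named list with one entry per id, holding that id's run lengths.
run_lengths <- lapply(split(example.df$states, example.df$id), runs_of_ones)
run_lengths
```

In the example data, id 3 has a single run of length 1, whereas id 4 has two separate runs of length 1, which is why only id 3 belongs in the first data frame.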

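2) Then, once that data frame/matrix is built, I would like to create another data frame from the original one (e.g., using a for loop), but this time the condition is for individuals in the states: 1, followed by 1, followed by anything other than 1. So it would look like this: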

# Expected  output from second step:
_______________
|id|day|states|
----------------
2  | 3 |   1  |
2  | 4 |   1  |  
2  | 5 |   4  |  
5  | 1 |   1  |
5  | 2 |   1  |  
5  | 3 |   0  |  
5  | 4 |   0  |
5  | 5 |   4  |    
----------------

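Similarly, ids should not re-enter 1 after meeting the condition.

3) After that, I would like to keep repeating this pattern, so the next pattern is for individuals in the states: 1, followed by 1, followed by 1, followed by anything other than 1: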

# Expected output from third step:
_______________
|id|day|states|
----------------
1  | 3 |   1  |
1  | 4 |   1  |  
1  | 5 |   1  |  
1  | 6 |   0  |
1  | 7 |   0  |  
1  | 8 |   3  |   
----------------

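4) Then, I would continue the pattern up to 29 consecutive 1s.

So in the end I would like to have 30 data frames/matrices, each containing the individuals who meet the conditions above.

1 Answer:

Answer 0: (score: 1)

We can create a function to do this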

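library(data.table)  # for rleid()
library(dplyr)

# Keep ids that have exactly one run of exactly n consecutive 1s,
# then return their rows from the first non-zero state onward.
f1 <- function(data, n) {
    ids <- data %>%
        # state on the next row; the last row is compared with itself
        mutate(stateslead = lead(states, default = last(states))) %>%
        # label each run of consecutive 1s / non-1s (over the whole data frame)
        group_by(grp = rleid(states == 1)) %>%
        # keep the last row of each run of 1s whose length is exactly n
        filter(n() == n, states == 1, stateslead != 1) %>%
        group_by(id) %>%
        # drop ids that have more than one such run
        filter(n() == 1) %>%
        pull(id)

    data %>%
        filter(id %in% ids) %>%
        group_by(id) %>%
        # drop the leading rows before the first non-zero state
        filter(cumsum(states) > 0)
}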
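
The grouping step relies on `rleid()` from data.table, which assigns one integer label per run of equal values. As a small illustration (a sketch added here, not part of the original answer), this is what it produces for the states of id 4; `ave()` is used only to show the run size that `n()` would report inside each group.

```r
library(data.table)

# States of id 4 in the example data.
states <- c(0, 1, 0, 0, 0, 1, 0, 3)

# One label per run of TRUE/FALSE in (states == 1).
run_id <- rleid(states == 1)
run_id
# 1 2 3 3 3 4 5 5

# Size of the run each row belongs to (what n() returns per group).
run_size <- ave(states, run_id, FUN = length)
run_size
# 1 1 3 3 3 1 2 2
```

Both runs of 1s (labels 2 and 4) have size 1, so id 4 contributes two candidate rows for n = 1 and is then removed by the per-id `filter(n() == 1)`.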

Testing:

f1(example.df, 1)
#  id day states
#1  3   2      1
#2  3   3      0
#3  3   4      4

f1(example.df, 2)
# A tibble: 8 x 3
# Groups:   id [2]
#     id   day states
#  <dbl> <dbl>  <dbl>
#1     2     3      1
#2     2     4      1
#3     2     5      4
#4     5     1      1
#5     5     2      1
#6     5     3      0
#7     5     4      0
#8     5     5      4

f1(example.df, 3)
#  id day states
#1  1   3      1
#2  1   4      1
#3  1   5      1
#4  1   6      0
#5  1   7      0
#6  1   8      3

Also, if we want to do this in a single step, use map to loop over 'n'

library(purrr)
out1 <- map(1:3, f1, data = example.df)

For the OP, 1:3 can be replaced with 1:29. 'out1' is a list of tibbles/data.frames.
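
When looping over 1:29, most run lengths will match no id and yield zero-row results. The following sketch, an addition that assumes the same `f1` as in the answer, names each list element by its run length and drops the empty ones; the object names `out_all` and `non_empty` are hypothetical.

```r
library(data.table)  # rleid()
library(dplyr)
library(purrr)

# Rebuild the example data so this block is self-contained.
example.df <- data.frame(
  id     = rep(1:5, times = c(9, 6, 5, 8, 6)),
  day    = c(0:8, 0:5, 0:4, 0:7, 0:5),
  states = c(0, 0, 0, 1, 1, 1, 0, 0, 3,
             0, 0, 0, 1, 1, 4,
             0, 0, 1, 0, 4,
             0, 1, 0, 0, 0, 1, 0, 3,
             0, 1, 1, 0, 0, 4)
)

# f1 as defined in the answer.
f1 <- function(data, n) {
  ids <- data %>%
    mutate(stateslead = lead(states, default = last(states))) %>%
    group_by(grp = rleid(states == 1)) %>%
    filter(n() == n, states == 1, stateslead != 1) %>%
    group_by(id) %>%
    filter(n() == 1) %>%
    pull(id)

  data %>%
    filter(id %in% ids) %>%
    group_by(id) %>%
    filter(cumsum(states) > 0)
}

# One result per run length, named by that run length.
out_all <- map(1:29, f1, data = example.df)
names(out_all) <- 1:29

# Keep only the run lengths that matched at least one id.
non_empty <- keep(out_all, ~ nrow(.x) > 0)
names(non_empty)
```

With the example data, only run lengths 1, 2 and 3 match any id, so `non_empty` holds three tibbles.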