I have the following data frame:
# Example:
_________________________
| id | day | states
-------------------------
[1,] 1 0 0
[2,] 1 1 0
[3,] 1 2 0
[4,] 1 3 1
[5,] 1 4 1
[6,] 1 5 1
[7,] 1 6 0
[8,] 1 7 0
[9,] 1 8 3
[10,] 2 0 0
[11,] 2 1 0
[12,] 2 2 0
[13,] 2 3 1
[14,] 2 4 1
[15,] 2 5 4
[16,] 3 0 0
[17,] 3 1 0
[18,] 3 2 1
[19,] 3 3 0
[20,] 3 4 4
[21,] 4 0 0
[22,] 4 1 1
[23,] 4 2 0
[24,] 4 3 0
[25,] 4 4 0
[26,] 4 5 1
[27,] 4 6 0
[28,] 4 7 3
[29,] 5 0 0
[30,] 5 1 1
[31,] 5 2 1
[32,] 5 3 0
[33,] 5 4 0
[34,] 5 5 4
# Code:
# One (id, day, states) triple per row; each line below holds one id
example.Matrix <- matrix(data = c(
  1,0,0, 1,1,0, 1,2,0, 1,3,1, 1,4,1, 1,5,1, 1,6,0, 1,7,0, 1,8,3,
  2,0,0, 2,1,0, 2,2,0, 2,3,1, 2,4,1, 2,5,4,
  3,0,0, 3,1,0, 3,2,1, 3,3,0, 3,4,4,
  4,0,0, 4,1,1, 4,2,0, 4,3,0, 4,4,0, 4,5,1, 4,6,0, 4,7,3,
  5,0,0, 5,1,1, 5,2,1, 5,3,0, 5,4,0, 5,5,4
), byrow = TRUE, ncol = 3)
example.df <- as.data.frame(example.Matrix)
colnames(example.df) <- c("id", "day", "states")
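I would like to do the following:
1) Create a data frame (or matrix) of the ids whose states column contains a single 1 that is followed, in the next row, by anything other than 1. For example: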
# Expected output for first step:
_______________
|id|day|states|
----------------
3 | 2 | 1 |
3 | 3 | 0 |
3 | 4 | 4 |
----------------
# Example in code:
matrix.1<-matrix(c(3,2,1,3,3,0,3,4,4), byrow=TRUE,ncol=3)
df.1<-as.data.frame(matrix.1)
colnames(df.1) <- c("id", "day", "states")
Note that although id 4 has a case where states goes from 1 to 0, it later re-enters 1, so id 4 should not be included in the new data frame/matrix.
# Should not be included in expected output for df.1:
_______________
|id|day|states|
----------------
4 | 1 | 1 | #* start
4 | 2 | 0 | #* meets condition
4 | 3 | 0 |
4 | 4 | 0 |
4 | 5 | 1 | #* re-enters 1 - does not meet condition
4 | 6 | 0 |
4 | 7 | 3 |
---------------
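2) Then, once that data frame/matrix has been built, I want to create another data frame from the original one (for example, with a for loop), but this time the condition is for individuals whose states are: 1, followed by 1, followed by anything other than 1. It would look like this: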
# Expected output from second step:
_______________
|id|day|states|
----------------
2 | 3 | 1 |
2 | 4 | 1 |
2 | 5 | 4 |
5 | 1 | 1 |
5 | 2 | 1 |
5 | 3 | 0 |
5 | 4 | 0 |
5 | 5 | 4 |
----------------
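As before, once the condition is met the id should not re-enter 1.
3) After that, I want to keep repeating this pattern, so the next pattern is for individuals whose states are: 1, followed by 1, followed by 1, followed by anything other than 1: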
# Expected output from third step:
_______________
|id|day|states|
----------------
1 | 3 | 1 |
1 | 4 | 1 |
1 | 5 | 1 |
1 | 6 | 0 |
1 | 7 | 0 |
1 | 8 | 3 |
----------------
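4) I would then continue the pattern up to 29 consecutive 1s.
So in the end I would like to have 30 data frames/matrices, each containing the individuals that meet the corresponding condition.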
Answer 0 (score: 1)
We can create a function to do this:
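library(data.table)  # for rleid()
library(dplyr)

f1 <- function(data, n) {
  # ids that have exactly one run of exactly n consecutive 1s,
  # where that run is followed by a state other than 1
  ids <- data %>%
    mutate(stateslead = lead(states, default = last(states))) %>%
    group_by(grp = rleid(states == 1)) %>%
    # keep only the last row of each run of 1s whose length is n
    filter(n() == n, states == 1, stateslead != 1) %>%
    group_by(id) %>%
    filter(n() == 1) %>%
    pull(id)

  # for those ids, return the rows from the first non-zero state onwards
  data %>%
    filter(id %in% ids) %>%
    group_by(id) %>%
    filter(cumsum(states) > 0)
}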
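The grouping in the function relies on run-length encoding of the condition `states == 1`. As a rough illustration of that idea, here is a minimal base R sketch using `rle()` on the states of id 1 from the example data. It is only meant to show what a "run" is; the variable names `states_id1`, `runs` and `one_runs` are made up for this illustration, and `rle()` is used instead of `rleid()` so the snippet needs no extra packages.

```r
# States of id 1, copied from the example data
states_id1 <- c(0, 0, 0, 1, 1, 1, 0, 0, 3)

# Run-length encode the condition "state is 1"
runs <- rle(states_id1 == 1)
runs$lengths  # 3 3 3
runs$values   # FALSE TRUE FALSE

# Lengths of the runs of 1s only
one_runs <- runs$lengths[runs$values]
one_runs      # 3
```

Here id 1 has a single run of 1s of length 3, which is why it belongs to the third pattern.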
Testing:
f1(example.df, 1)
# id day states
#1 3 2 1
#2 3 3 0
#3 3 4 4
f1(example.df, 2)
# A tibble: 8 x 3
# Groups: id [2]
# id day states
# <dbl> <dbl> <dbl>
#1 2 3 1
#2 2 4 1
#3 2 5 4
#4 5 1 1
#5 5 2 1
#6 5 3 0
#7 5 4 0
#8 5 5 4
f1(example.df, 3)
# id day states
#1 1 3 1
#2 1 4 1
#3 1 5 1
#4 1 6 0
#5 1 7 0
#6 1 8 3
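One case the example data does not cover is an id with runs of 1s of different lengths, for example one run of length 1 and a later run of length 2. As far as I can tell, `f1` only counts the runs whose length equals `n`, so such an id would be returned for both `n = 1` and `n = 2`, even though it re-enters 1. If that matters, a stricter check could look like the base R sketch below. `f1_strict` and the `edge` data frame are hypothetical names introduced here, not part of the answer above, and the sketch assumes the rows are already sorted by id and day.

```r
# Example data from the question, rebuilt compactly
example.df <- data.frame(
  id = rep(1:5, times = c(9, 6, 5, 8, 6)),
  day = c(0:8, 0:5, 0:4, 0:7, 0:5),
  states = c(0,0,0,1,1,1,0,0,3, 0,0,0,1,1,4, 0,0,1,0,4,
             0,1,0,0,0,1,0,3, 0,1,1,0,0,4)
)

# Hypothetical stricter variant: an id qualifies only if it has exactly
# one run of 1s overall, that run has length n, and the id does not end on a 1
f1_strict <- function(data, n) {
  keep <- vapply(split(data$states, data$id), function(s) {
    r <- rle(s == 1)
    one_runs <- r$lengths[r$values]
    length(one_runs) == 1 && one_runs == n && s[length(s)] != 1
  }, logical(1))
  if (!any(keep)) return(data[0, , drop = FALSE])
  ids <- names(keep)[keep]
  out <- data[as.character(data$id) %in% ids, , drop = FALSE]
  # within each id, drop the rows before the first 1
  started <- ave(as.integer(out$states == 1), out$id, FUN = cumsum) > 0
  out[started, , drop = FALSE]
}

# Made-up id with a run of length 1 and a later run of length 2
edge <- data.frame(id = 6, day = 0:5, states = c(0, 1, 0, 1, 1, 0))

f1_strict(example.df, 1)  # rows for id 3
f1_strict(edge, 1)        # zero rows
f1_strict(edge, 2)        # zero rows
```

On the example data this should select the same ids as `f1` for `n = 1, 2, 3`; it only differs for ids like the one in `edge`.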
In addition, to run all of the steps in one go, use map to loop over 'n':
library(purrr)
out1 <- map(1:3, f1, data = example.df)
For the OP, 1:3 can be replaced with 1:29. 'out1' is a list of tibbles/data.frames.
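With 1:29, many elements of the resulting list are likely to be empty, since few ids will have long runs. A possible follow-up, sketched here in base R, is to name the elements by run length and drop the empty ones. The list `out` below is a made-up stand-in for the result of the `map()` call, and `non_empty` is a name chosen for this illustration.

```r
# Stand-in for the list returned by map(); the second element is
# deliberately empty to mimic a run length that no id matches
out <- list(
  data.frame(id = c(3, 3, 3), day = 2:4, states = c(1, 0, 4)),
  data.frame(id = numeric(0), day = numeric(0), states = numeric(0)),
  data.frame(id = rep(1, 6), day = 3:8, states = c(1, 1, 1, 0, 0, 3))
)

# Label each element by its run length: "n1", "n2", "n3"
names(out) <- paste0("n", seq_along(out))

# Keep only the run lengths that matched at least one id
non_empty <- Filter(function(d) nrow(d) > 0, out)
names(non_empty)  # "n1" "n3"
```

Individual results can then be accessed by name, for example `non_empty$n3`.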