My dataset looks like the following one, but is much more complex:
df<-data.frame(ID = c(1,1,2,2,3,3,3),
week = c(20,21,10,15,20,21,22),
var1 = c(0,1,0,1,0,0,1))
ID week var1
1 1 20 0
2 1 21 1
3 2 10 0
4 2 15 1
5 3 20 0
6 3 21 0
7 3 22 1
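I want to create a new data frame that keeps every row where var1 == 1, and also keeps the row immediately before it if that row has the same ID and its week is exactly one less than the week of the kept row. The new data frame should look like this: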
ID week var1
1 1 20 0
2 1 21 1
3 2 15 1
4 3 21 0
5 3 22 1
I have tried
df1<-df[which(df$var1 == 1) - 1, ]
However, this gives me the previous row whether or not it meets my conditions.
I have also tried dplyr's lag:
df2<-filter(df, var1==1 & lag(week)==week-1)
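However, this only gives me the rows that satisfy both conditions at once. Every piece of code I have found returns either one result or the other.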
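The lag() attempt tests both conditions on the same row, so it can only ever return rows that themselves have var1 == 1. A predecessor row has to be judged by looking at the *next* row instead. A minimal base-R sketch of that idea, assuming the rows are already sorted by `ID` and then `week` (the vectors `next_id`, `next_week` and `next_var1` are illustrative names, not part of the question):

```r
df <- data.frame(ID   = c(1, 1, 2, 2, 3, 3, 3),
                 week = c(20, 21, 10, 15, 20, 21, 22),
                 var1 = c(0, 1, 0, 1, 0, 0, 1))

# Shift each column up by one row so that row i can see row i + 1
# (assumes df is sorted by ID, then week; the last row gets NA)
next_id   <- c(df$ID[-1], NA)
next_week <- c(df$week[-1], NA)
next_var1 <- c(df$var1[-1], NA)

# A row is a qualifying predecessor if the next row has var1 == 1,
# belongs to the same ID, and is exactly one week later
is_prev <- !is.na(next_id) & next_var1 == 1 &
  next_id == df$ID & next_week == df$week + 1

# Keep the var1 == 1 rows plus their qualifying predecessors
result <- df[df$var1 == 1 | is_prev, ]
rownames(result) <- NULL
result
```

In dplyr the same shift would be written with lead() inside group_by(ID); here the explicit `next_id == df$ID` test plays the role of the grouping.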
Answer 0 (score: 0)
You can handle each condition in turn.
For your data frame:
df<-data.frame(ID = c(1,1,2,2,3,3,3),
week = c(20,21,10,15,20,21,22),
var1 = c(0,1,0,1,0,0,1))
you want to select the following rows:
# ID week var1
# 1 1 20 0 # <- condition 2 + condition 3
# 2 1 21 1 # <- condition 1
# 3 2 10 0 # <- condition 2
# 4 2 15 1 # <- condition 1
# 5 3 20 0 #
# 6 3 21 0 # <- condition 2 + condition 3
# 7 3 22 1 # <- condition 1
and keep only the rows that satisfy condition 1, or conditions 2 and 3 together:
## Condition 1: Selecting the rows with var1 = 1
rows_var1 <- which(df$var1 == 1)
rows_var1
# [1] 2 4 7
## Condition 2: Selecting all the previous rows with the same ID
## (a var1 == 1 in row 1 has no previous row, so it is skipped)
prev_rows <- rows_var1[rows_var1 > 1] - 1
same_ID <- prev_rows[df$ID[prev_rows] == df$ID[prev_rows + 1]]
same_ID
# [1] 1 3 6
## Condition 3: Keeping only the previous rows whose week is one less than the next row's week
same_ID_week <- same_ID[df$week[same_ID] == df$week[same_ID + 1] - 1]
same_ID_week
# [1] 1 6
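## Getting the table subset (unique() avoids duplicates when consecutive rows both have var1 == 1)
df1 <- df[sort(unique(c(rows_var1, same_ID_week))), ]
rownames(df1) <- NULL
df1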
# ID week var1
# 1 1 20 0
# 2 1 21 1
# 3 2 15 1
# 4 3 21 0
# 5 3 22 1
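The steps in this answer can be packaged into a single function. A minimal sketch, assuming the data are sorted by `ID` and then `week`; `select_with_predecessor` and the `edge` data frame are hypothetical, made up here to exercise two edge cases: a var1 == 1 in the very first row, and two consecutive var1 == 1 rows.

```r
# Hypothetical helper: keep var1 == 1 rows plus the immediately preceding
# row when it has the same ID and is exactly one week earlier.
# Assumes df is sorted by ID, then week.
select_with_predecessor <- function(df) {
  rows_var1 <- which(df$var1 == 1)
  # Candidate predecessors; a var1 == 1 in row 1 has none
  prev <- rows_var1[rows_var1 > 1] - 1
  # Conditions 2 and 3: same ID and exactly one week earlier than the next row
  ok <- df$ID[prev] == df$ID[prev + 1] &
    df$week[prev] == df$week[prev + 1] - 1
  # unique() avoids duplicates when a predecessor is itself a var1 == 1 row
  out <- df[sort(unique(c(rows_var1, prev[ok]))), ]
  rownames(out) <- NULL
  out
}

df <- data.frame(ID   = c(1, 1, 2, 2, 3, 3, 3),
                 week = c(20, 21, 10, 15, 20, 21, 22),
                 var1 = c(0, 1, 0, 1, 0, 0, 1))
select_with_predecessor(df)

# Edge cases: var1 == 1 in the first row, and two consecutive var1 == 1 rows
edge <- data.frame(ID   = c(1, 1, 2, 2, 2),
                   week = c(5, 6, 1, 3, 4),
                   var1 = c(1, 1, 0, 0, 1))
res <- select_with_predecessor(edge)
res
```

Without the `rows_var1 > 1` guard, row 1 would produce index 0, which R silently drops, so the two vectors compared in `ok` would have different lengths.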
Answer 1 (score: 0)
Using SQL, we can write:
library(sqldf)
sqldf("select b.* from df a join df b on a.ID = b.ID and b.week = a.week - 1
where a.var1 = 1
union
select * from df
where var1 = 1
order by ID, week")
giving:
ID week var1
1 1 20 0
2 1 21 1
3 2 15 1
4 3 21 0
5 3 22 1
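For readers who would rather not depend on sqldf, the same self-join can be sketched in base R with merge(). A minimal sketch; `targets` and `prev_rows` are illustrative names, and each step is labelled with the part of the SQL query it mirrors:

```r
df <- data.frame(ID   = c(1, 1, 2, 2, 3, 3, 3),
                 week = c(20, 21, 10, 15, 20, 21, 22),
                 var1 = c(0, 1, 0, 1, 0, 0, 1))

# "a" side of the join: the var1 == 1 rows, shifted back one week
targets <- df[df$var1 == 1, c("ID", "week")]
targets$week <- targets$week - 1

# Inner join on ID and week picks the rows exactly one week earlier ("b.*")
prev_rows <- merge(df, targets, by = c("ID", "week"))

# UNION: combine with the var1 == 1 rows and drop duplicates
res <- unique(rbind(prev_rows, df[df$var1 == 1, ]))

# ORDER BY ID, week
res <- res[order(res$ID, res$week), ]
rownames(res) <- NULL
res
```

Like the SQL version, this matches on week values rather than row positions, so it does not rely on the rows being sorted beforehand.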