I have a data frame of this form:
familyid memberid occupation panelid year
1 1 1 1 2000
1 2 1 1 2000
2 1 1 1 2000
2 2 2 1 2000
3 1 1 1 2000
3 2 1 1 2000
3 3 1 1 2000
1 1 2 2 2001
1 2 1 2 2001
2 1 2 2 2001
2 2 2 2 2001
3 1 1 2 2001
3 2 2 2 2001
3 3 2 2 2001
I would like to filter this data frame to get the following:
familyid memberid occupation panelid year
1 1 1 1 2000
2 1 1 1 2000
3 2 1 1 2000
3 3 1 1 2000
1 1 2 2 2001
2 1 2 2 2001
3 2 2 2 2001
3 3 2 2 2001
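In other words, I only want to keep the panel observations for members who had occupation == 1 in 2000 (panelid == 1) and occupation == 2 in 2001 (panelid == 2). Does anyone know how to do this? Many thanks, everyone,
Marco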
Answer 0 (score: 0)
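Here, we can group by 'familyid' and 'memberid', then filter to keep only the groups that have any row with 'occupation' 1 and 'year' 2000 and any row with 'occupation' 2 and 'year' 2001.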
library(tidyverse)
df1 %>%
group_by(familyid, memberid) %>%
filter(any(occupation == 1 & year == 2000) & any(occupation == 2 & year == 2001))
# A tibble: 8 x 5
# Groups: familyid, memberid [4]
# familyid memberid occupation panelid year
# <int> <int> <int> <int> <int>
#1 1 1 1 1 2000
#2 2 1 1 1 2000
#3 3 2 1 1 2000
#4 3 3 1 1 2000
#5 1 1 2 2 2001
#6 2 1 2 2 2001
#7 3 2 2 2 2001
#8 3 3 2 2 2001
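To see why those eight rows survive, it can help to look at the two `any()` conditions one member at a time. The following is a minimal sketch, assuming dplyr 1.0.0 or later (for the `.groups` argument); the column names `occ1_in_2000`, `occ2_in_2001`, and `keep` are made up for illustration, and the data frame is rebuilt compactly from the rows in the question.

```r
library(dplyr)

# Same 14 rows as in the question, rebuilt compactly
df1 <- data.frame(
  familyid   = rep(c(1L, 1L, 2L, 2L, 3L, 3L, 3L), times = 2),
  memberid   = rep(c(1L, 2L, 1L, 2L, 1L, 2L, 3L), times = 2),
  occupation = c(1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L),
  panelid    = rep(1:2, each = 7),
  year       = rep(2000:2001, each = 7)
)

# One row per member, with each condition evaluated over that member's rows
# (illustrative column names, not part of the original answer)
flags <- df1 %>%
  group_by(familyid, memberid) %>%
  summarise(
    occ1_in_2000 = any(occupation == 1 & year == 2000),
    occ2_in_2001 = any(occupation == 2 & year == 2001),
    .groups = "drop"
  ) %>%
  mutate(keep = occ1_in_2000 & occ2_in_2001)

flags
```

Four of the seven members end up with `keep == TRUE`; each has two rows, which accounts for the eight rows returned by `filter()`.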
Alternatively, if 'occupation' and 'year' have only two levels each, we can also use n_distinct to build the logical vector for filter:
df1 %>%
group_by(familyid, memberid) %>%
filter(n_distinct(occupation) >1 & n_distinct(year)> 1)
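One thing worth checking before relying on the `n_distinct` version: it keeps any member whose occupation changes between the two years, in either direction. In the question's data nobody moves from 2 to 1, so both approaches should agree there. The sketch below uses a hypothetical two-member data frame (`toy`, not from the question) to show where they would differ.

```r
library(dplyr)

# Hypothetical data, not from the question:
# member 1 moves 1 -> 2, member 2 moves 2 -> 1
toy <- data.frame(
  familyid   = c(9L, 9L, 9L, 9L),
  memberid   = c(1L, 2L, 1L, 2L),
  occupation = c(1L, 2L, 2L, 1L),
  year       = c(2000L, 2000L, 2001L, 2001L)
)

# Direction-aware filter: only the 1 -> 2 member is kept
by_any <- toy %>%
  group_by(familyid, memberid) %>%
  filter(any(occupation == 1 & year == 2000) &
           any(occupation == 2 & year == 2001)) %>%
  ungroup()

# n_distinct filter: both members are kept, since both change occupation
by_distinct <- toy %>%
  group_by(familyid, memberid) %>%
  filter(n_distinct(occupation) > 1 & n_distinct(year) > 1) %>%
  ungroup()

nrow(by_any)       # 2
nrow(by_distinct)  # 4
```

If a 2 to 1 move is possible in the real data, the `any()` version is the safer choice.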
Data:
df1 <- structure(list(familyid = c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 1L, 1L,
2L, 2L, 3L, 3L, 3L), memberid = c(1L, 2L, 1L, 2L, 1L, 2L, 3L,
1L, 2L, 1L, 2L, 1L, 2L, 3L), occupation = c(1L, 1L, 1L, 2L, 1L,
1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L), panelid = c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), year = c(2000L,
2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2001L, 2001L, 2001L,
2001L, 2001L, 2001L, 2001L)), class = "data.frame", row.names = c(NA,
-14L))