Filtering based on the levels of a column

Asked: 2018-02-02 19:03:34

Tags: r dplyr

help <- data.frame(id = c(5, 5, 7, 7, 18, 18, 42, 42, 46, 46, 50, 51),
                   grade = c("a", "a", "b", "b", "c", "c", "d", "d", "e", "e", "w", "z"),
                   pass = c("yes", "no", "yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "no"))

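Using the help dataset, I want to:

  • (1) keep the ids that have duplicated grades (with their yes/no pass values)
  • (2) then keep only the rows where pass is "yes" and drop the rows where pass is "no"

I would like to end up with a dataset that looks like this: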

  id grade pass
   5     a  yes
   7     b  yes
  42     d  yes
  46     e  yes
  46     e  yes

I tried using:

help %>% group_by(id, grade, pass) %>% filter(pass == "yes" & pass == "no")

But that does not work: it removes every row and returns an empty df.
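One way to see why the attempt returns nothing is to evaluate the condition on its own. This is a minimal sketch in base R using the same data; the names `both` and `yes_only` are only illustrative. No single value of `pass` can equal both "yes" and "no", so the combined test is FALSE on every row.

```r
help <- data.frame(id = c(5, 5, 7, 7, 18, 18, 42, 42, 46, 46, 50, 51),
                   grade = c("a", "a", "b", "b", "c", "c", "d", "d", "e", "e", "w", "z"),
                   pass = c("yes", "no", "yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "no"))

# A value of pass is either "yes" or "no", never both at once,
# so this condition is FALSE for every row and filter() keeps nothing.
both <- help$pass == "yes" & help$pass == "no"
any(both)

# Step (2) only needs the test on "yes".
yes_only <- help$pass == "yes"
sum(yes_only)
```

The duplicate check from step (1) then has to be a separate condition on `id` and `grade`, not on `pass`.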

4 answers:

Answer 0 (score: 1)

A base R solution could be:

help <- data.frame(id = c(5, 5, 7, 7, 18, 18, 42, 42, 46, 46, 50, 51),
    grade = c("a", "a", "b", "b", "c", "c", "d", "d", "e", "e", "w", "z"),
    pass = c("yes", "no", "yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "no"))

# Keep the rows whose (id, grade) pair is duplicated. The trick is to flag
# duplicates scanning from the start and then again from the end.
help2 <- help[duplicated(help[, 1:2]) | duplicated(help[, 1:2], fromLast = TRUE), ]

# Keep only the rows where pass is "yes"
help2[help2$pass == "yes", ]

#   id grade pass
#1   5     a  yes
#3   7     b  yes
#7  42     d  yes
#9  46     e  yes
#10 46     e  yes

Answer 1 (score: 1)

We can group_by id and grade, then filter for groups that have more than one row and rows where pass is "yes".

library(dplyr)

help %>%
  group_by(id, grade) %>%
  filter(n() > 1, pass %in% "yes") %>%
  ungroup()
# # A tibble: 5 x 3
#      id grade pass 
#   <dbl> <fct> <fct>
# 1  5.00 a     yes  
# 2  7.00 b     yes  
# 3 42.0  d     yes  
# 4 46.0  e     yes  
# 5 46.0  e     yes 
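To make the `n() > 1` condition concrete, the same group sizes can be computed without dplyr. This is a rough sketch using base R's `ave`, not part of the answer above; `group_size` and `result` are illustrative names.

```r
help <- data.frame(id = c(5, 5, 7, 7, 18, 18, 42, 42, 46, 46, 50, 51),
                   grade = c("a", "a", "b", "b", "c", "c", "d", "d", "e", "e", "w", "z"),
                   pass = c("yes", "no", "yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "no"))

# Size of each (id, grade) group, repeated on every row of that group.
# This plays the role of n() inside group_by(id, grade).
group_size <- ave(seq_along(help$id), help$id, help$grade, FUN = length)

# Keep rows from groups with more than one row, and only where pass is "yes".
result <- help[group_size > 1 & help$pass == "yes", ]
result
```

Unlike the dplyr version, this keeps the original row names of the data frame.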

Answer 2 (score: 1)

subset(help, !duplicated(help) & pass == "yes")
   id grade pass
1   5     a  yes
3   7     b  yes
7  42     d  yes
9  46     e  yes
11 50     w  yes

Answer 3 (score: 0)

So I load it:

og_help <- data.frame(id = c(5, 5, 7, 7, 18, 18, 42, 42, 46, 46, 50, 51),
                   grade = c("a", "a", "b", "b", "c", "c", "d", "d", "e", "e", "w", "z"),
                   pass = c("yes", "no", "yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "no"))

Then I keep the set of unique rows:

help <- unique(og_help)

And subset to only those rows where the pass variable is "yes":

help <- help[ which(help$pass == "yes"), ]

This outputs the following:

   id grade pass
1   5     a  yes
3   7     b  yes
7  42     d  yes
9  46     e  yes
11 50     w  yes