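Keep certain rows in a data frame based on a condition

Time: 2019-05-06 06:33:59

Tags: r if-statement dplyr

I have a data frame in R, and I want to remove certain rows from it when certain conditions are met. How can I do this?

I tried using dplyr and ifelse, but my code does not give the correct answer: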

check8 <- distinct(df5,prod,.keep_all = TRUE)

This does not work! It returns the whole dataset.

The input is:

check1 <- data.frame(ID = c(1,1,2,2,2,3,4), 
                     prod = c("R","T","R","T",NA,"T","R"), 
                     bad = c(0,0,0,1,0,1,0))
#   ID prod bad
# 1  1    R   0
# 2  1    T   0
# 3  2    R   0
# 4  2    T   1
# 5  2 <NA>   0
# 6  3    T   1
# 7  4    R   0

Expected output:

data.frame(ID = c(1,2,3,4), 
           prod = c("R","R","T","R"), 
           bad = c(0,0,1,0))


#   ID prod bad
# 1  1    R   0
# 2  2    R   0
# 3  3    T   1
# 4  4    R   0

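I want the output such that, for IDs that have both prod values or an NA, only the row with prod R is kept; but if an ID has only one prod, that row is kept regardless of its prod value.

2 answers:

Answer 0: (score: 2)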

We can use dplyr and filter to keep the rows where prod == "R", or, if there is only one row in the group, keep that row.
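The filter described in this answer can be sketched as follows. This is a minimal reconstruction from the description, not necessarily the answerer's exact code: it assumes the `check1` data from the question, uses `n() == 1` as the test for a single-row group, and the name `result` is introduced here only for illustration.

```r
library(dplyr)

check1 <- data.frame(ID = c(1, 1, 2, 2, 2, 3, 4),
                     prod = c("R", "T", "R", "T", NA, "T", "R"),
                     bad = c(0, 0, 0, 1, 0, 1, 0))

# Assumption: n() == 1 is used to detect IDs that have a single row
result <- check1 %>%
  group_by(ID) %>%
  # Keep rows with prod "R", or the only row of a single-row group
  filter(prod == "R" | n() == 1) %>%
  ungroup()
```

Rows with a missing prod in a multi-row group are dropped because the condition evaluates to NA for them, and filter() keeps only rows where the condition is TRUE.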

Answer 1: (score: 2)

Here is an approach using anti_join:

library(dplyr)

check1 <- data.frame(ID = c(1,1,2,2,2,3,4), prod = c("R","T","R","T",NA,"T","R"), bad = c(0,0,0,1,0,1,0))

# First part: select all the IDs which contain 'R' as prod

p1 <- check1 %>% 
  group_by(ID) %>% 
  filter(prod == 'R')

# Second part: using anti_join get all the rows from check1 where there are not 
# matching values in p1

p2 <- anti_join(check1, p1, by = 'ID')

solution <- bind_rows(
  p1, 
  p2
) %>% 
  arrange(ID)
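For comparison, the same rule (keep the "R" rows, plus the only row of any single-row ID) can also be sketched in base R without dplyr. This is an illustrative alternative, not part of either answer; the names `n_per_id`, `keep`, and `base_result` are hypothetical.

```r
check1 <- data.frame(ID = c(1, 1, 2, 2, 2, 3, 4),
                     prod = c("R", "T", "R", "T", NA, "T", "R"),
                     bad = c(0, 0, 0, 1, 0, 1, 0))

# Number of rows per ID, repeated for every row of that ID
n_per_id <- ave(check1$ID, check1$ID, FUN = length)

# %in% treats NA as "no match", so rows with a missing prod
# are not selected by the first test
keep <- check1$prod %in% "R" | n_per_id == 1

base_result <- check1[keep, ]
rownames(base_result) <- NULL  # renumber rows after subsetting
```

On the question's data this keeps one row per ID. Note that an ID with several rows and no "R" would lose all of its rows under this rule, whereas the anti_join approach would keep all of them.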