Remove values from a categorical variable

Date: 2017-07-18 17:33:09

Tags: r

I have a data frame that looks like this:

  summary(imputedWork)
              everwrk          age_p      
 1 Yes            :27918   Min.   :18.00  
 2 No             : 5034   1st Qu.:33.00  
 7 Refused        :   45   Median :47.00  
 8 Not ascertained:    0   Mean   :48.11  
 9 Don't know     :   17   3rd Qu.:62.00  
                           Max.   :85.00  

                            r_maritl    
 1 Married - spouse in household:13943  
 7 Never married                : 7763  
 5 Divorced                     : 4511  
 4 Widowed                      : 3069  
 8 Living with partner          : 2002  
 6 Separated                    : 1121  
 (Other)                        :  605 

I want to remove the "Refused", "Don't know" and "Not ascertained" values from everwrk, and the "(Other)" values from r_maritl.
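For context, the (Other) line in the r_maritl summary is not a value stored in the data. summary() on a data frame prints at most maxsum entries per factor column (7 by default) and lumps the least frequent levels together under "(Other)". A minimal sketch with a made-up factor x (hypothetical toy data) shows the effect:

```r
# Hypothetical toy factor with 8 levels: "a" occurs 8 times, ..., "h" once
x <- factor(rep(letters[1:8], times = 8:1))

# summary() on a data frame uses maxsum = 7 per factor column,
# so the two rarest levels ("g" and "h") are lumped into "(Other)"
summary(data.frame(x = x))

# The same collapsing, called directly on the factor
s <- summary(x, maxsum = 7)
names(s)                    # "a" ... "f" plus "(Other)"
s[["(Other)"]]              # 3 (2 g's + 1 h)

# "(Other)" is not a level, so comparing against it never matches
levels(x)
any(x == "(Other)")         # FALSE
```

On the real data, levels(imputedWork$r_maritl) lists the actual level names to filter on; a test such as r_maritl != "(Other)" is TRUE for every non-missing row.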

2 Answers:

Answer 0 (score: 1)

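This removes every row that matches a value you don't want: it keeps rows whose everwrk is not in A and whose r_maritl is one of the statuses in B.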

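A <- c("Refused", "Don't know", "Not ascertained")    # everwrk values to drop
B <- c("Married - spouse in household", "Never married", "Divorced",
       "Widowed", "Living with partner", "Separated") # r_maritl values to keep
# keep rows whose everwrk is not in A and whose r_maritl is in B
imputedWork[!(imputedWork$everwrk %in% A) & imputedWork$r_maritl %in% B, ]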
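The subsetting above can be checked on a small made-up data frame (toy and all of its values are hypothetical, including the level "Unknown marital status"). It also shows that %in% is an exact, case-sensitive match, so the strings in A must be spelled exactly like the level labels:

```r
# Hypothetical toy data standing in for imputedWork
toy <- data.frame(
  everwrk  = factor(c("Yes", "No", "Refused", "Don't know", "Yes")),
  r_maritl = factor(c("Divorced", "Widowed", "Never married", "Separated",
                      "Unknown marital status"))  # made-up rare level
)

A <- c("Refused", "Don't know", "Not ascertained")    # everwrk values to drop
B <- c("Married - spouse in household", "Never married", "Divorced",
       "Widowed", "Living with partner", "Separated") # r_maritl values to keep

# Rows 3 and 4 fail the everwrk test, row 5 fails the r_maritl test
kept <- toy[!(toy$everwrk %in% A) & toy$r_maritl %in% B, ]
nrow(kept)                              # 2

# %in% matches exactly: "Don't Know" (capital K) would not catch row 4
sum(toy$everwrk %in% "Don't Know")      # 0
sum(toy$everwrk %in% "Don't know")      # 1
```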

Answer 1 (score: 0)

A dplyr solution:

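library(dplyr)
imputedWork <- imputedWork %>% 
  filter(
    (everwrk=="Yes" | everwrk=="No") &
      # "(Other)" is only summary()'s label for the rarest levels, so list the ones to keep
      r_maritl %in% c("Married - spouse in household", "Never married", "Divorced",
                      "Widowed", "Living with partner", "Separated")
  )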

If everwrk and r_maritl are factors, you will also want to drop the now-unused levels:

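imputedWork <- imputedWork %>% 
  filter(
    (everwrk=="Yes" | everwrk=="No") &
      # "(Other)" is only summary()'s label for the rarest levels, so list the ones to keep
      r_maritl %in% c("Married - spouse in household", "Never married", "Divorced",
                      "Widowed", "Living with partner", "Separated")
  ) %>% 
  # droplevels() removes levels that no longer occur after filtering
  mutate(everwrk=droplevels(everwrk),
         r_maritl=droplevels(r_maritl))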
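Why droplevels() matters: filtering rows keeps every original level, so the removed categories still show up (with zero counts) in summary() and table(). A minimal sketch on made-up data (toy and its values are hypothetical), using base R subsetting; dplyr's filter() keeps the levels in the same way. Calling droplevels() on a whole data frame drops unused levels from every factor column at once:

```r
# Hypothetical toy data with two factor columns, as in imputedWork
toy <- data.frame(
  everwrk  = factor(c("Yes", "No", "Refused", "Yes")),
  r_maritl = factor(c("Divorced", "Widowed", "Separated", "Divorced"))
)

# Row filtering keeps all original levels
filtered <- toy[toy$everwrk %in% c("Yes", "No"), ]
levels(filtered$everwrk)     # still includes "Refused"
table(filtered$everwrk)      # "Refused" is listed with a count of 0

# droplevels() on a data frame drops unused levels from every factor column
cleaned <- droplevels(filtered)
levels(cleaned$everwrk)      # "No" "Yes"
levels(cleaned$r_maritl)     # "Divorced" "Widowed"
```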