R - How to remove rows from a data.frame based on multiple columns without using sqldf?

Asked: 2015-03-30 19:31:01

Tags: r dataframe

I was able to figure this out with sqldf, but I would like to get the same result in pure R.

Data:

df <- read.table(header = TRUE, text = "year1 year2 year3 year4 signup_date
                 B      U      C         D      4/10/12 
                 C      D      B         U      2/12/12 
                 U      C      D         U      3/14/05 
                 B      NA     NA        NA     3/7/05 
                 NA     NA     NA        NA     8/3/08 
                 A      NA     NA        NA     4/6/07")

My sqldf query:

df <- sqldf("
SELECT *
FROM data
WHERE year1 NOT IN ('B','C','D','U')
AND year2 NOT IN ('B','C','D','U')
AND year3 NOT IN ('B','C','D','U')
AND year4 NOT IN ('B','C','D','U')
ORDER BY signup_date DESC")

Desired result:

    year1 year2 year3 year4 signup_date
                            8/3/08   
    A                       4/6/07 

2 answers:

Answer 0 (score: 2)

Another option is to use the dplyr package:

library(dplyr)
filterVars <- c("B","C","D","U")
df %>% 
  filter(!year1 %in% filterVars, !year2 %in% filterVars, !year3 %in% filterVars, !year4 %in% filterVars) %>%
  arrange(desc(signup_date))

This yields:

  year1 year2 year3 year4 signup_date
1  <NA>  <NA>  <NA>  <NA>      8/3/08
2     A  <NA>  <NA>  <NA>      4/6/07

Answer 1 (score: 1)

Try

fvars <- c('B', 'C', 'D', 'U')
df2 <- df1[Reduce(`&`,lapply(df1[paste0('year',1:4)], 
           function(x) !x %in% fvars)),]
df2
#   year1 year2 year3 year4 signup_date
#5                              8/3/08
#6     A                        4/6/07
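To see what the `Reduce` call is doing, here is a minimal base R sketch on a cut-down data frame. The names `toy`, `flags`, and `keep` are made up for illustration and are not part of the answer. `lapply` produces one logical vector per year column, and `Reduce` combines them element-wise with `&`, so a row is kept only if every year column passes the test.

```r
# Hypothetical three-row data frame, for illustration only
toy <- data.frame(year1 = c("B", "", "A"),
                  year2 = c("U", "", ""),
                  signup_date = c("4/10/12", "8/3/08", "4/6/07"),
                  stringsAsFactors = FALSE)
fvars <- c("B", "C", "D", "U")

# One logical vector per year column: TRUE where the value is NOT in fvars
flags <- lapply(toy[c("year1", "year2")], function(x) !x %in% fvars)

# Combine the vectors element-wise with `&`: TRUE only if all columns pass
keep <- Reduce(`&`, flags)

# Keep only the rows where every year column passed
toy[keep, ]
```

Here `keep` is `FALSE TRUE TRUE`, so only the first row (which contains 'B' and 'U') is dropped. The same pattern should extend to any number of columns, since `Reduce` simply folds `&` over the list.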

Or using data.table

library(data.table)
nm1 <- grep('year', names(df1))
setDT(df1)[df1[, Reduce(`&`,lapply(.SD, function(x) !x %chin% 
        fvars)) , .SDcols=nm1]][order(-signup_date)]
#   year1 year2 year3 year4 signup_date
#1:                              8/3/08
#2:     A                        4/6/07

Note: It may be better to order by 'signup_date' after converting it to the 'Date' class, i.e. as.Date(df1$signup_date, '%m/%d/%y')
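The reason for the note is that 'signup_date' is stored as text, so ordering it compares the strings rather than the dates. A minimal base R sketch of ordering by the parsed dates instead, using the 'signup_date' values from the data in this answer (the names `signup`, `d`, and `sorted` are illustrative assumptions):

```r
# The signup_date values from the answer's data, as plain strings
signup <- data.frame(
  signup_date = c("4/10/12", "2/12/12", "3/14/05", "3/7/05", "8/3/08", "4/6/07"),
  stringsAsFactors = FALSE
)

# Parse month/day/two-digit-year strings into Date objects
d <- as.Date(signup$signup_date, format = "%m/%d/%y")

# Order rows chronologically, newest first, using the parsed dates
sorted <- signup[order(d, decreasing = TRUE), , drop = FALSE]
sorted$signup_date
# "4/10/12" "2/12/12" "8/3/08" "4/6/07" "3/14/05" "3/7/05"
```

The original column is left as text here and only the ordering uses the parsed dates. If the dates are needed later, it may be simpler to overwrite the column with the `Date` values once and order by it directly.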

Data

df1 <- structure(list(year1 = c("B", "C", "U", "B", "", "A"),
year2 = c("U", 
"D", "C", "", "", ""), year3 = c("C", "B", "D", "", "", ""), 
year4 = c("D", "U", "U", "", "", ""), signup_date = c("4/10/12", 
"2/12/12", "3/14/05", "3/7/05", "8/3/08", "4/6/07")),
.Names =   c("year1", 
"year2", "year3", "year4", "signup_date"), class = "data.frame", 
row.names = c(NA, -6L))