I was able to figure this out with sqldf, but I would like to get the same result in plain R.
Data:
df <- read.table(header = TRUE, text = "year1 year2 year3 year4 signup_date
B U C D 4/10/12
C D B U 2/12/12
U C D U 3/14/05
B NA NA NA 3/7/05
NA NA NA NA 8/3/08
A NA NA NA 4/6/07")
My sqldf query:
library(sqldf)
res <- sqldf("
SELECT *
FROM df
WHERE year1 NOT IN ('B','C','D','U')
AND year2 NOT IN ('B','C','D','U')
AND year3 NOT IN ('B','C','D','U')
AND year4 NOT IN ('B','C','D','U')
ORDER BY signup_date DESC")
Desired result:
year1 year2 year3 year4 signup_date
8/3/08
A 4/6/07
Answer 0 (score: 2)
Another option is to use the dplyr package:
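library(dplyr)
filterVars <- c("B","C","D","U")
# Keep rows where none of the four year columns holds a code in filterVars
# (!NA %in% filterVars is TRUE, so empty years are kept), then sort descending
df %>%
  filter(!year1 %in% filterVars, !year2 %in% filterVars,
         !year3 %in% filterVars, !year4 %in% filterVars) %>%
  arrange(desc(signup_date))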
Output:
year1 year2 year3 year4 signup_date
1 <NA> <NA> <NA> <NA> 8/3/08
2 A <NA> <NA> <NA> 4/6/07
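The four repeated !yearN %in% filterVars conditions can also be collapsed into a single row-wise test. The following is a minimal base R sketch, not part of either answer, assuming (as in the question's data) that every column to check has a name starting with "year"; yearCols and hits are illustrative names.

```r
# Question's data; NA marks an empty year
df <- read.table(header = TRUE, text = "year1 year2 year3 year4 signup_date
B U C D 4/10/12
C D B U 2/12/12
U C D U 3/14/05
B NA NA NA 3/7/05
NA NA NA NA 8/3/08
A NA NA NA 4/6/07")

filterVars <- c("B", "C", "D", "U")
# Assumption: every column to check is named year<something>
yearCols <- grep("^year", names(df))

# rows x year-columns logical matrix (TRUE where a code to exclude appears);
# rowSums counts those hits per row
hits <- rowSums(sapply(df[yearCols], function(x) x %in% filterVars))

# Keep rows with zero hits, then sort by the signup_date string, descending
res <- df[hits == 0, ]
res <- res[order(res$signup_date, decreasing = TRUE), ]
res
```

Because NA %in% filterVars is FALSE, rows whose years are all NA get zero hits and are kept, matching the dplyr output. With a one-row data frame, sapply would return a plain vector instead of a matrix and rowSums would fail, so this sketch assumes more than one row.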
Answer 1 (score: 1)
Try:
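fvars <- c('B', 'C', 'D', 'U')
# For each year column, flag values NOT in fvars; Reduce(`&`) keeps rows where every flag is TRUE
df2 <- df1[Reduce(`&`, lapply(df1[paste0('year', 1:4)],
                              function(x) !x %in% fvars)), ]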
df2
# year1 year2 year3 year4 signup_date
#5 8/3/08
#6 A 4/6/07
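To see what Reduce(`&`, lapply(...)) does in the answer above, here is a small worked example on two hypothetical toy columns (cols, keep_per_col and keep are illustrative names, not from the answer). lapply produces one logical vector per column, and Reduce folds & across them, which is the same as writing year1 & year2 & ... by hand.

```r
fvars <- c("B", "C", "D", "U")

# Hypothetical toy columns standing in for year1 and year2
cols <- list(year1 = c("B", "", "A"),
             year2 = c("", "", "C"))

# One logical vector per column: TRUE where the value is NOT a code to exclude
keep_per_col <- lapply(cols, function(x) !x %in% fvars)
# year1 -> FALSE TRUE TRUE
# year2 -> TRUE TRUE FALSE

# Reduce folds `&` over the list, i.e. year1 & year2 (& year3 & year4 in the answer)
keep <- Reduce(`&`, keep_per_col)
keep
# [1] FALSE  TRUE FALSE
```

Only the middle row survives, because it is the only one with no excluded code in any column.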
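Or, using data.table:
library(data.table)
nm1 <- grep('year', names(df1))
# setDT() converts df1 to a data.table in place; the inner df1[...] call builds the
# row filter over the year columns (%chin% is data.table's fast %in% for character vectors)
setDT(df1)[df1[, Reduce(`&`, lapply(.SD, function(x) !x %chin% fvars)),
               .SDcols = nm1]][order(-signup_date)]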
# year1 year2 year3 year4 signup_date
#1: 8/3/08
#2: A 4/6/07
Note: it may be better to order by 'signup_date' after converting it to the Date class, i.e. as.Date(df1$signup_date, '%m/%d/%y')
Data:
df1 <- structure(list(year1 = c("B", "C", "U", "B", "", "A"),
                      year2 = c("U", "D", "C", "", "", ""),
                      year3 = c("C", "B", "D", "", "", ""),
                      year4 = c("D", "U", "U", "", "", ""),
                      signup_date = c("4/10/12", "2/12/12", "3/14/05",
                                      "3/7/05", "8/3/08", "4/6/07")),
                 class = "data.frame", row.names = c(NA, -6L))
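The note's suggestion can be sketched in base R. Sorting the strings compares characters rather than dates, so "8/3/08" would be placed ahead of "4/10/12" even though April 2012 is the later date. This sketch is an illustration, not part of the original answer: it filters the same way and then converts signup_date with as.Date and the format '%m/%d/%y', assuming every date is month/day/two-digit year (R maps %y values 00 to 68 to 20xx).

```r
# Same df1 as above, rebuilt here so the block runs on its own
df1 <- data.frame(
  year1 = c("B", "C", "U", "B", "", "A"),
  year2 = c("U", "D", "C", "", "", ""),
  year3 = c("C", "B", "D", "", "", ""),
  year4 = c("D", "U", "U", "", "", ""),
  signup_date = c("4/10/12", "2/12/12", "3/14/05", "3/7/05", "8/3/08", "4/6/07"),
  stringsAsFactors = FALSE
)
fvars <- c("B", "C", "D", "U")

# Same row filter as the answer
keep <- Reduce(`&`, lapply(df1[paste0("year", 1:4)], function(x) !x %in% fvars))
df2 <- df1[keep, ]

# Assumption: dates are month/day/two-digit year; convert so sorting is chronological
df2$signup_date <- as.Date(df2$signup_date, "%m/%d/%y")
df2 <- df2[order(df2$signup_date, decreasing = TRUE), ]
df2
```

For the two rows in this example the order happens to match the string sort, but the Date column stays correct when the data contain dates like "4/10/12".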