Multiple filters using grep and subset in R

Posted: 2015-03-12 07:34:29

Tags: r grep subset grepl

I'm trying to create a filter that uses grep and subset together to remove rows from a dataset.

Example dataset:

id <- 1:10
problem <- c("a" , "b", "c", "d", "a","b","c","a", "b", "a")
solution1 <- c("eat", "sleep", "drink", "play", "sleep", "play", "play", "drink", "play", "eat")
solution2 <- c("read", "read", "eat", "drink", "eat", "sleep", "eat", "read", "eat", "play")
df <- c(id, problem, solution1, solution2)

I'm trying to remove the rows that have problem "a" and also "eat" in either solution1 or solution2.

The result should be that the rows with id 1, 5 and 10 are removed.
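As a quick check of that expected result, the condition can be evaluated directly on the four vectors defined above. This is a minimal sketch that assumes exact string comparisons with == rather than grep:

```r
id <- 1:10
problem <- c("a", "b", "c", "d", "a", "b", "c", "a", "b", "a")
solution1 <- c("eat", "sleep", "drink", "play", "sleep", "play", "play", "drink", "play", "eat")
solution2 <- c("read", "read", "eat", "drink", "eat", "sleep", "eat", "read", "eat", "play")

# TRUE for rows whose problem is "a" and that have "eat" in either solution column
to_drop <- problem == "a" & (solution1 == "eat" | solution2 == "eat")
id[to_drop]
# [1]  1  5 10
```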

I've tried using:

df <- subset(df, problem=="a" & !(grepl("eat", df)))

df <- df[!grepl("eat", df) & grepl("a", df$problem)]

I can't seem to find a similar solution on StackOverflow or on the other sites I found through Google.

I'd appreciate any help. Thanks!

2 Answers:

Answer 0 (score: 5)

First, if you want a data frame, you should use data.frame, not c:

df <- data.frame(id, problem, solution1, solution2)
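To see why c() does not work here, compare what the two calls return. This sketch reuses the question's vectors: c() flattens everything into one character vector (the integer ids are coerced to strings), so there are no columns such as df$problem to filter on, while data.frame() keeps one column per vector:

```r
id <- 1:10
problem <- c("a", "b", "c", "d", "a", "b", "c", "a", "b", "a")
solution1 <- c("eat", "sleep", "drink", "play", "sleep", "play", "play", "drink", "play", "eat")
solution2 <- c("read", "read", "eat", "drink", "eat", "sleep", "eat", "read", "eat", "play")

# c() concatenates all four vectors into one flat character vector
v <- c(id, problem, solution1, solution2)
class(v)   # [1] "character"
length(v)  # [1] 40

# data.frame() keeps one row per observation and one column per vector
df <- data.frame(id, problem, solution1, solution2)
dim(df)    # [1] 10  4
```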

Then you can subset like this (there is no need to use subset itself):

df2 <- df[!(grepl("a", df$problem) & 
           (grepl("eat", df$solution1) |
            grepl("eat", df$solution2))),]

#   id problem solution1 solution2
# 2  2       b     sleep      read
# 3  3       c     drink       eat
# 4  4       d      play     drink
# 6  6       b      play     sleep
# 7  7       c      play       eat
# 8  8       a     drink      read
# 9  9       b      play       eat
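Since the question asked about subset(), the same condition can also be passed to subset(), which evaluates column names inside the data frame, so no df$ prefixes are needed. A minimal sketch reusing the grepl() calls from this answer:

```r
df <- data.frame(
  id = 1:10,
  problem = c("a", "b", "c", "d", "a", "b", "c", "a", "b", "a"),
  solution1 = c("eat", "sleep", "drink", "play", "sleep", "play", "play", "drink", "play", "eat"),
  solution2 = c("read", "read", "eat", "drink", "eat", "sleep", "eat", "read", "eat", "play")
)

# Keep every row except those with problem "a" and "eat" in either solution column
df2 <- subset(df, !(grepl("a", problem) &
                    (grepl("eat", solution1) | grepl("eat", solution2))))
df2$id
# [1] 2 3 4 6 7 8 9
```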

Answer 1 (score: 0)

I would do it like this:

df <- df[!(df$problem %in% "a" & (df$solution1 %in% "eat" | df$solution2 %in% "eat")),]

#   id problem solution1 solution2
# 2  2       b     sleep      read
# 3  3       c     drink       eat
# 4  4       d      play     drink
# 6  6       b      play     sleep
# 7  7       c      play       eat
# 8  8       a     drink      read
# 9  9       b      play       eat
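Regular expressions aren't necessary if you are comparing exact strings. Subsetting with %in% can save a lot of time, because it compares against a whole vector of values: for example, instead of "a" you could use c("a", "b", "c"), and so on.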
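To illustrate that last point, here is a sketch with a hypothetical set of problem codes (drop_problems is an illustrative name, not from the original answer). %in% tests membership in a whole vector of values, and because it compares whole strings it avoids a regex pitfall: grepl("eat", x) would also match a value such as "great".

```r
df <- data.frame(
  id = 1:10,
  problem = c("a", "b", "c", "d", "a", "b", "c", "a", "b", "a"),
  solution1 = c("eat", "sleep", "drink", "play", "sleep", "play", "play", "drink", "play", "eat"),
  solution2 = c("read", "read", "eat", "drink", "eat", "sleep", "eat", "read", "eat", "play")
)

# Hypothetical set of problem codes to filter on
drop_problems <- c("a", "b")

# %in% compares exact strings against every element of the right-hand side
has_eat <- df$solution1 %in% "eat" | df$solution2 %in% "eat"
df3 <- df[!(df$problem %in% drop_problems & has_eat), ]
df3$id
# [1] 2 3 4 6 7 8

# Substring matching versus exact matching
grepl("eat", "great")  # TRUE: the regex matches the substring
"great" %in% "eat"     # FALSE: only whole strings are compared
```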