grepl in R to find matches to any of a list of character strings

Asked: 2014-08-19 19:57:32

Tags: r regex grep grepl

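Is it possible to use grepl against a list of values, perhaps with the %in% operator? I want to take the data below and, if the animal name has "dog" or "cat" in it, return a certain value, say "Keep"; if it has neither "dog" nor "cat", I want to return "Discard".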

data <- data.frame(animal = sample(c("cat","dog","bird", 'doggy','kittycat'), 50, replace = T))

Now, if I were only matching the values exactly, say "cat" and "dog", I could use the following approach:

matches <- c("cat","dog")

data$keep <- ifelse(data$animal %in% matches, "Keep", "Discard")

But grep or grepl only uses the first element of the vector:

data$keep <- ifelse(grepl(matches, data$animal), "Keep","Discard")

returns

Warning message:
In grepl(matches, data$animal) :
  argument 'pattern' has length > 1 and only the first element will be used
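
To make the warning concrete, here is a minimal sketch of what the call effectively evaluates. The small `animals` vector is a made-up stand-in for `data$animal`, not part of the original data.

```r
# Hypothetical stand-in for data$animal (fixed, so the result is reproducible)
animals <- c("cat", "dog", "bird", "doggy", "kittycat")
matches <- c("cat", "dog")

# grepl() takes a single pattern; with a longer vector only matches[1] is used,
# so the call behaves like this:
first_only <- grepl(matches[1], animals)
first_only
```

Only the names containing "cat" are flagged, which is why "dog" and "doggy" are missed.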

Note: I came across this thread in my search, but it does not seem to work: grep using a character vector with multiple patterns

3 answers:

Answer 0: (score: 21)

You can use an "or" (|) statement inside the regular expression passed to grepl.

ifelse(grepl("dog|cat", data$animal), "keep", "discard")
# [1] "keep"    "keep"    "discard" "keep"    "keep"    "keep"    "keep"    "discard"
# [9] "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "discard" "keep"   
#[17] "discard" "keep"    "keep"    "discard" "keep"    "keep"    "discard" "keep"   
#[25] "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "keep"   
#[33] "keep"    "discard" "keep"    "discard" "keep"    "discard" "keep"    "keep"   
#[41] "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "keep"   
#[49] "keep"    "discard"

The regular expression dog|cat tells the regex engine to look for either "dog" or "cat", and a string containing either one counts as a match.
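
As a small reproducible sketch of the alternation, the example below uses a fixed vector (the name `animals` is an assumption, standing in for `data$animal`). Because grepl looks for the pattern anywhere in the string, "doggy" and "kittycat" match as well.

```r
# Hypothetical fixed input instead of the random sample
animals <- c("cat", "dog", "bird", "doggy", "kittycat")

# TRUE wherever the string contains "dog" or "cat" as a substring
hits <- grepl("dog|cat", animals)

labels <- ifelse(hits, "keep", "discard")
labels
```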

Answer 1: (score: 13)

Try to avoid ifelse where you can. This, for example, works nicely:

c("Discard", "Keep")[grepl("(dog|cat)", data$animal) + 1]

With a seed of 123, you will get

##  [1] "Keep"    "Keep"    "Discard" "Keep"    "Keep"    "Keep"    "Discard" "Keep"   
##  [9] "Discard" "Discard" "Keep"    "Discard" "Keep"    "Discard" "Keep"    "Keep"   
## [17] "Keep"    "Keep"    "Keep"    "Keep"    "Keep"    "Keep"    "Keep"    "Keep"   
## [25] "Keep"    "Keep"    "Discard" "Discard" "Keep"    "Keep"    "Keep"    "Keep"   
## [33] "Keep"    "Keep"    "Keep"    "Discard" "Keep"    "Keep"    "Keep"    "Keep"   
## [41] "Keep"    "Discard" "Discard" "Keep"    "Keep"    "Keep"    "Keep"    "Discard"
## [49] "Keep"    "Keep"   
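
The indexing trick above relies on logical values being coerced to integers. A minimal sketch of just that step, with a made-up logical vector `hits` in place of the grepl result:

```r
# Hypothetical grepl() result
hits <- c(TRUE, FALSE, TRUE)

# Adding 1 coerces the logicals to numbers: FALSE -> 1, TRUE -> 2
idx <- hits + 1

# Position 1 is "Discard", position 2 is "Keep"
result <- c("Discard", "Keep")[idx]
result
```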

Answer 2: (score: 12)

Not sure what you tried, but this seems to work:

data$keep <- ifelse(grepl(paste(matches, collapse = "|"), data$animal), "Keep","Discard")

This is similar to the answer you linked to.

The trick is to use paste:

paste(matches, collapse = "|")
#[1] "cat|dog"

So it builds a regular expression meaning dog OR cat, and it also works with a long list of patterns without having to type each one out.
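
A short sketch of the same idea with a slightly longer pattern list. The vectors here are invented for illustration, and the sketch assumes the patterns contain no regex metacharacters (such as "." or "("), since paste only joins the strings and does not escape them.

```r
# Hypothetical inputs; "fish" is added only to show the list can grow
animals <- c("cat", "dog", "bird", "doggy", "kittycat", "goldfish")
matches <- c("cat", "dog", "fish")

# Join the patterns into one alternation: "cat|dog|fish"
pattern <- paste(matches, collapse = "|")

keep <- ifelse(grepl(pattern, animals), "Keep", "Discard")
keep
```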

Edit:

If you are doing this in order to subset the data.frame afterwards based on the "Keep" and "Discard" entries, you could do that directly with:

data[grepl(paste(matches, collapse = "|"), data$animal),]

This way, the TRUE or FALSE values returned by grepl are used directly to subset the rows.
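
A minimal sketch of that subsetting on a small made-up data frame. One detail worth noting: when the data frame has a single column, `data[i, ]` simplifies the result to a plain vector, so the sketch passes `drop = FALSE` to keep a data.frame. That argument is an addition here, not part of the answer above.

```r
# Hypothetical fixed data in place of the random sample
data <- data.frame(animal = c("cat", "dog", "bird", "doggy", "kittycat"),
                   stringsAsFactors = FALSE)
matches <- c("cat", "dog")

# Logical vector: TRUE for rows whose animal contains any of the patterns
hits <- grepl(paste(matches, collapse = "|"), data$animal)

# drop = FALSE keeps the result a data.frame even with a single column
kept      <- data[hits, , drop = FALSE]
discarded <- data[!hits, , drop = FALSE]
kept
```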