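Is it possible to use grepl when referencing a list of values, perhaps with the %in% operator? I want to take the data below and, if the animal name has "dog" or "cat" in it, return a certain value, say "Keep"; if it has neither "dog" nor "cat", I want to return "Discard".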
data <- data.frame(animal = sample(c("cat","dog","bird", 'doggy','kittycat'), 50, replace = T))
Now, if I were doing this by strictly matching values, say "cat" and "dog", I could use the following approach:
matches <- c("cat","dog")
data$keep <- ifelse(data$animal %in% matches, "Keep", "Discard")
But grep or grepl only uses the first element of the list:
data$keep <- ifelse(grepl(matches, data$animal), "Keep","Discard")
This returns:
Warning message:
In grepl(matches, data$animal) :
argument 'pattern' has length > 1 and only the first element will be used
Note: I came across this thread while searching, but it does not seem to work for me: grep using a character vector with multiple patterns
Answer 0 (score: 21)
You can use an "or" (|) statement in the regular expression passed to grepl.
ifelse(grepl("dog|cat", data$animal), "keep", "discard")
# [1] "keep" "keep" "discard" "keep" "keep" "keep" "keep" "discard"
# [9] "keep" "keep" "keep" "keep" "keep" "keep" "discard" "keep"
#[17] "discard" "keep" "keep" "discard" "keep" "keep" "discard" "keep"
#[25] "keep" "keep" "keep" "keep" "keep" "keep" "keep" "keep"
#[33] "keep" "discard" "keep" "discard" "keep" "discard" "keep" "keep"
#[41] "keep" "keep" "keep" "keep" "keep" "keep" "keep" "keep"
#[49] "keep" "discard"
The regular expression dog|cat tells the regex engine to look for either "dog" or "cat", and a string matches if it contains either one.
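Because the sample data above is random, a small fixed vector makes the behaviour easier to check. The following is a minimal sketch; the vector name animals is only an illustration and does not come from the original post. It shows that the alternation matches substrings, so "doggy" and "kittycat" are kept as well.

```r
# Hypothetical fixed input, one of each value used in the question
animals <- c("cat", "dog", "bird", "doggy", "kittycat")

# TRUE wherever the string contains "dog" or "cat" anywhere in it
hits <- grepl("dog|cat", animals)

# Map the logical vector to labels
result <- ifelse(hits, "keep", "discard")
print(result)
```

Only "bird" ends up as "discard" here, since the pattern is not anchored to the whole string.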
Answer 1 (score: 13)
Avoid ifelse whenever possible. For example, this works nicely:
c("Discard", "Keep")[grepl("(dog|cat)", data$animal) + 1]
With a seed of 123, you will get:
## [1] "Keep" "Keep" "Discard" "Keep" "Keep" "Keep" "Discard" "Keep"
## [9] "Discard" "Discard" "Keep" "Discard" "Keep" "Discard" "Keep" "Keep"
## [17] "Keep" "Keep" "Keep" "Keep" "Keep" "Keep" "Keep" "Keep"
## [25] "Keep" "Keep" "Discard" "Discard" "Keep" "Keep" "Keep" "Keep"
## [33] "Keep" "Keep" "Keep" "Discard" "Keep" "Keep" "Keep" "Keep"
## [41] "Keep" "Discard" "Discard" "Keep" "Keep" "Keep" "Keep" "Discard"
## [49] "Keep" "Keep"
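The indexing trick above relies on R coercing TRUE/FALSE to 1/0 in arithmetic. A minimal sketch of the intermediate steps, using a hypothetical three-element vector rather than the sampled data:

```r
# Hypothetical small input
animals <- c("cat", "bird", "doggy")

hit <- grepl("(dog|cat)", animals)  # TRUE FALSE TRUE
idx <- hit + 1                      # logical coerced to numeric: 2 1 2

# Position 1 is the label for FALSE, position 2 the label for TRUE
labels <- c("Discard", "Keep")[idx]
print(labels)
```

The order of the two labels matters: FALSE + 1 is 1, so the "no match" label has to come first.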
Answer 2 (score: 12)
Not sure what you tried, but this seems to work:
data$keep <- ifelse(grepl(paste(matches, collapse = "|"), data$animal), "Keep","Discard")
This is similar to the answer you linked.
The trick is to use paste:
paste(matches, collapse = "|")
#[1] "cat|dog"
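This builds a regular expression meaning dog OR cat, and it also works with a long list of patterns without having to type each one out.
If you are only doing this in order to subset the data.frame by the "Keep" and "Discard" entries afterwards, you can do that directly with: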
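As a rough illustration of how this scales, here is a sketch with a slightly longer pattern vector. The extra entry "hamster" and the vector animals are made up for the example and are not part of the original question.

```r
# Hypothetical pattern list; any length works the same way
matches <- c("cat", "dog", "hamster")

# Collapse into a single alternation: "cat|dog|hamster"
pattern <- paste(matches, collapse = "|")

animals <- c("cat", "dog", "bird", "doggy", "kittycat", "hamster")
keep <- ifelse(grepl(pattern, animals), "Keep", "Discard")
print(keep)
```

Each element of matches is still interpreted as a regular expression, so entries containing metacharacters such as . or ( would need escaping before being pasted together.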
data[grepl(paste(matches, collapse = "|"), data$animal),]
This way, the TRUE/FALSE result of grepl is used directly for subsetting.
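A minimal sketch of the subsetting step on a small, fixed data frame (the five rows are hypothetical, not the sampled data from the question). It adds drop = FALSE, which is not in the answer above, because a data frame with a single column would otherwise be simplified to a plain vector when indexed by row.

```r
# Hypothetical fixed data with the same column name as in the question
data <- data.frame(animal = c("cat", "dog", "bird", "doggy", "kittycat"))
matches <- c("cat", "dog")

# Logical row index: TRUE for rows whose animal contains "cat" or "dog"
rows <- grepl(paste(matches, collapse = "|"), data$animal)

# drop = FALSE keeps the result a data.frame even with one column
kept <- data[rows, , drop = FALSE]
print(kept)
```

If the data frame already has more than one column, for example after adding the keep column, the drop argument makes no difference.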