Question

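I'd like to create a single regular expression (if possible) that searches a string and determines whether two words both occur in it. I know I can use two grepl calls (as shown below), but I'd like to test this condition with a single regex. The more efficient the regex, the better.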

I want to find the strings that contain both "man" and "dog", ignoring case.

x <- c(
    "The dog and the man play in the park.",
    "The man plays with the dog.",
    "That is the man's hat.",
    "Man I love that dog!",
    "I'm dog tired"
)

## this works but I want a single regex
grepl("dog", x, ignore.case=TRUE)  & grepl("man", x, ignore.case=TRUE)

Answer 1

Use the regex alternation operator |.

grepl(".*(dog.*man|man.*dog).*", x, ignore.case=TRUE)

Add word boundaries (\\b) if you need to match whole words only.

grepl(".*(\\bdog\\b.*\\bman\\b|\\bman\\b.*\\bdog\\b).*", x, ignore.case=TRUE)

The leading and trailing .* are not needed, because grepl looks for a match anywhere in the string:

grepl("(dog.*man|man.*dog)", x, ignore.case=TRUE)

You can also put the case-insensitivity modifier in the regex itself.

grepl("(?i)(dog.*man|man.*dog)", x)
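To make the effect of the word boundaries concrete, here is a minimal sketch that runs the plain and the word-boundary patterns from this answer side by side. The sixth string is a hypothetical addition (it is not in the question), chosen because "woman" contains "man" and "dogs" contains "dog".

```r
x <- c(
    "The dog and the man play in the park.",
    "The man plays with the dog.",
    "That is the man's hat.",
    "Man I love that dog!",
    "I'm dog tired"
)

# Hypothetical extra string (not part of the question): it has no whole
# word "man" or "dog", only "woman" and "dogs".
y <- c(x, "The woman walks her dogs.")

# Plain alternation: matches "man" and "dog" anywhere, even inside longer words.
loose <- grepl("(dog.*man|man.*dog)", y, ignore.case = TRUE)

# Alternation with word boundaries: both must appear as whole words.
strict <- grepl("(\\bdog\\b.*\\bman\\b|\\bman\\b.*\\bdog\\b)", y, ignore.case = TRUE)

print(loose)
print(strict)
```

The two vectors agree on the five original strings and differ only on the added one, which the plain pattern accepts and the word-boundary pattern rejects.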

Answer 2

You can use a Perl-like regex with two lookaheads:

grepl("^(?=.*\\bman\\b)(?=.*\\bdog\\b)", x, ignore.case=TRUE, perl=TRUE)

See the IDEONE demo.

Result for the input above: [1] TRUE TRUE FALSE TRUE FALSE

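The lookaheads in ^(?=.*\\bman\\b)(?=.*\\bdog\\b) only check that the whole words man and dog are present in the input. The match succeeds only if both are there, regardless of their order (dog may come before man, or vice versa).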

Because of the ^ start-of-string anchor, these checks are performed only once per input, which keeps performance good.
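Since each lookahead is an independent check from the start of the string, the same idea extends to any number of words. The sketch below builds such a pattern programmatically. The helper name contains_all_words is hypothetical (it is not from the answer), and it assumes the words contain no regex metacharacters.

```r
# Hypothetical helper: TRUE for each element of `text` that contains every
# entry of `words` as a whole word, in any order, ignoring case.
# Assumption: `words` holds plain words with no regex metacharacters.
contains_all_words <- function(text, words) {
    # One lookahead per word, e.g. (?=.*\bman\b)(?=.*\bdog\b)
    lookaheads <- paste0("(?=.*\\b", words, "\\b)", collapse = "")
    pattern <- paste0("^", lookaheads)
    # Lookaheads require the PCRE engine, hence perl = TRUE.
    grepl(pattern, text, ignore.case = TRUE, perl = TRUE)
}

x <- c(
    "The dog and the man play in the park.",
    "The man plays with the dog.",
    "That is the man's hat.",
    "Man I love that dog!",
    "I'm dog tired"
)

two_words   <- contains_all_words(x, c("man", "dog"))
three_words <- contains_all_words(x, c("man", "dog", "park"))

print(two_words)
print(three_words)
```

With two words this reproduces the result shown in the answer. Adding "park" as a third word leaves only the first string matching.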

R regex to find two words in the same string, where order and distance may vary
