Question

It seems that grep is "greedy" in the way it returns matches. Suppose I have the following data:

Sources <- c(
                "Coal burning plant",
                "General plant",
                "coalescent plantation",
                "Charcoal burning plant"
        )

Registry <- seq(from = 1100, to = 1103, by = 1)

df <- data.frame(Registry, Sources)

If I run grep("(?=.*[Pp]lant)(?=.*[Cc]oal)", df$Sources, perl = TRUE, value = TRUE), it returns

"Coal burning plant"     
"coalescent plantation"  
"Charcoal burning plant"

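However, I only want exact matches, i.e. only entries where the whole words "coal" and "plant" occur. I don't want "coalescent", "plantation", and so on. So for this data, I only want to see "Coal burning plant".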

Answer 1

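You want to use the word boundary \b around your word patterns. A word boundary does not consume any characters. It asserts that there is a word character on one side and not on the other. You may also want to consider the inline (?i) modifier for case-insensitive matching.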

grep('(?i)(?=.*\\bplant\\b)(?=.*\\bcoal\\b)', df$Sources, perl = TRUE, value = TRUE)
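
To see what the boundaries change, here is a minimal sketch that runs the pattern with and without \b on the question's data. The variable names `loose` and `strict` are illustrative only and do not appear in the original post.

```r
Sources <- c(
  "Coal burning plant",
  "General plant",
  "coalescent plantation",
  "Charcoal burning plant"
)

# Without word boundaries, "coal" and "plant" also match inside longer words
# such as "coalescent", "plantation" and "Charcoal".
loose <- grep("(?i)(?=.*plant)(?=.*coal)", Sources,
              perl = TRUE, value = TRUE)

# With \\b on both sides, only the standalone words "coal" and "plant" match.
strict <- grep("(?i)(?=.*\\bplant\\b)(?=.*\\bcoal\\b)", Sources,
               perl = TRUE, value = TRUE)

print(loose)
print(strict)
```

Here `loose` should contain the three entries listed in the question, while `strict` should contain only "Coal burning plant", because "Charcoal" has a word character immediately before "coal".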


Answer 2

If you always want the order "coal" and then "plant", this should work:

grep("\\b[Cc]oal\\b.*\\b[Pp]lant\\b", Sources, perl = TRUE, value = TRUE)

Here we add \b, which matches a word boundary. You can also add word boundaries to your original attempt:

grep("(?=.*\\b[Pp]lant\\b)(?=.*\\b[Cc]oal\\b)", Sources, 
    perl = TRUE, value = TRUE)
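
The two patterns differ only in whether word order matters. The sketch below illustrates this with a small test vector `x`; the entry "Plant that burns coal" is a made-up example and is not part of the question's data.

```r
# Hypothetical test vector: the second entry reverses the word order.
x <- c(
  "Coal burning plant",
  "Plant that burns coal",
  "Charcoal burning plant"
)

# Order-dependent: "coal" must appear before "plant".
ordered <- grep("\\b[Cc]oal\\b.*\\b[Pp]lant\\b", x,
                perl = TRUE, value = TRUE)

# Order-independent: each lookahead checks for its word anywhere in the string.
any_order <- grep("(?=.*\\b[Pp]lant\\b)(?=.*\\b[Cc]oal\\b)", x,
                  perl = TRUE, value = TRUE)

print(ordered)
print(any_order)
```

The ordered pattern should keep only "Coal burning plant", while the lookahead version should also keep "Plant that burns coal". Neither matches "Charcoal burning plant".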

R grep and exact matching
