Question

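I have a set of patterns, and I want to find exact matches in a vector whose elements contain not only the patterns but also punctuation. For example, I would like to identify the entries containing apple or orange in the following vector: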

[1] apple
[2] orange
[3] banana,apple
[4] oranges,apples
[5] banana,badapple,pear
[6] tastyorange,apple,pear
[7] tastyorange,badapple,redapple
[8] tastyorange,badapple, apple
[9] tastyorange,badapple. apple

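Since I want exact matches on the parts separated by punctuation (if any), I want elements 1, 2, 3, 6, 8, and 9 returned as output. That is, I do not want the program to pick up expressions such as "apples", "oranges", "badapple", or "tastyorange", but I do want it to recognize the punctuation that separates words. I would like to know the most efficient way to write a regular expression for this, since I will be searching for many patterns at once (i.e. apple, orange, pear, {{1} }, etc.).

Thanks!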

Answer 1

Use word boundaries (\b) in the regular expression to match whole words exactly.

> grep("\\b(apple|orange)\\b", c("apple", "apple,orange", "badapple", "badorange"), perl=TRUE, value=FALSE)
[1] 1 2
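
As a quick check against the vector from the question, the same pattern can be run on all nine entries. This is only a sketch: the variable names fruits and hits are illustrative and do not come from the original post.

```r
# The nine entries listed in the question.
fruits <- c("apple", "orange", "banana,apple", "oranges,apples",
            "banana,badapple,pear", "tastyorange,apple,pear",
            "tastyorange,badapple,redapple", "tastyorange,badapple, apple",
            "tastyorange,badapple. apple")

# Indices of entries that contain apple or orange as a whole word.
hits <- grep("\\b(apple|orange)\\b", fruits, perl = TRUE)
print(hits)
# [1] 1 2 3 6 8 9
```

Entries 4, 5, and 7 are skipped because every occurrence of apple or orange in them is directly adjacent to another letter.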

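Explanation

\b matches at the position between a word character and a non-word character (or at the start or end of the string).
(apple|orange) matches the string apple or orange.
The trailing \b requires the match to end at a word boundary.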
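
Since the question mentions searching for many patterns at once, one possible approach is to build the alternation from a character vector instead of typing it by hand. The sketch below assumes every pattern is a plain word with no regex metacharacters; the names patterns, regex, x, and matched are illustrative only.

```r
# Hypothetical list of target words; assumed to contain only letters.
patterns <- c("apple", "orange", "pear")

# Join the words with "|" and wrap the group in word boundaries,
# giving the pattern \b(apple|orange|pear)\b.
regex <- paste0("\\b(", paste(patterns, collapse = "|"), ")\\b")

x <- c("apple", "apple,orange", "badapple", "badorange", "pear.plum")
matched <- grep(regex, x, perl = TRUE)
print(matched)
# [1] 1 2 5
```

Note that \b treats digits and underscores as word characters, so an entry such as apple_pie would not count as a match for apple.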
