Question

grepl("instance|percentage", labelTest$Text)

This returns TRUE if either instance or percentage is present.

I want to get TRUE only when both terms are present.

Answer 1

Text <- c("instance", "percentage", "n", 
          "instance percentage", "percentage instance")

grepl("instance|percentage", Text)
# TRUE  TRUE FALSE  TRUE  TRUE

grepl("instance.*percentage|percentage.*instance", Text)
# FALSE FALSE FALSE TRUE  TRUE

The latter works because it looks for:

('instance')(any character sequence)('percentage')  
OR  
('percentage')(any character sequence)('instance')

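Of course, if you need to find any combination of more than two words, this gets quite complicated, because every possible ordering has to be listed. In that case the solution mentioned in the comments is easier to implement and to read.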

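Another alternative that may be relevant when matching several words is to use positive lookahead (which can be thought of as a non-consuming match). For this you have to enable Perl regular expressions with perl=TRUE.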

# create a vector of word combinations
words <- c("instance", "percentage", "element",
           "character", "n", "o", "p")
Text2 <- combn(words, 5, function(x) paste(x, collapse=" "))

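# build the pattern with paste0 so that the line breaks and indentation
# do not end up inside the regular expression
longperl <- grepl(paste0("(?=.*instance)",
                         "(?=.*percentage)",
                         "(?=.*element)",
                         "(?=.*character)"), Text2, perl=TRUE)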

# this is equivalent to the solution proposed in the comments
longstrd <- grepl("instance", Text2) & 
          grepl("percentage", Text2) & 
             grepl("element", Text2) & 
           grepl("character", Text2)

# they produce identical results
all(longperl == longstrd)
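
Both approaches above can be generalised to an arbitrary vector of words. The following is only a minimal sketch: the helper names contains_all and lookahead_pattern are hypothetical and do not come from the answers, and the lookahead version assumes the words contain no regex metacharacters.

```r
# Hypothetical helper: TRUE where every word in `words` occurs somewhere
# in the corresponding element of `text`.
contains_all <- function(text, words) {
  # One logical vector per word (literal matching, no regex syntax) ...
  hits <- lapply(words, function(w) grepl(w, text, fixed = TRUE))
  # ... combined element-wise with &.
  Reduce(`&`, hits)
}

# Hypothetical helper: build one PCRE pattern with a lookahead per word.
# Assumes the words contain no regex metacharacters.
lookahead_pattern <- function(words) {
  paste0("(?=.*", words, ")", collapse = "")
}

Text <- c("instance", "percentage", "n",
          "instance percentage", "percentage instance")
words <- c("instance", "percentage")

res_and  <- contains_all(Text, words)
res_perl <- grepl(lookahead_pattern(words), Text, perl = TRUE)

res_and
res_perl
```

The Reduce version is probably the easier one to read and extend; the lookahead version keeps everything in a single pattern, which can be handy when only one regular expression can be passed along.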

Answer 2

Use intersect, and feed it a grep for each word.

library(data.table)  # used for subsetting the text vector below

vector_of_text[intersect(grep(vector_of_text, pattern = "pattern1"),
                         grep(vector_of_text, pattern = "pattern2"))]
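
As a rough illustration of the intersect idea, here is a self-contained sketch in base R. The sample vector and the variable names idx_instance, idx_percentage, both, and matched are assumptions made for the example, not part of the answer.

```r
# Sample data, assumed for illustration only.
vector_of_text <- c("instance", "percentage", "n",
                    "instance percentage", "percentage instance")

# grep() returns the indices of the elements that match each pattern.
idx_instance   <- grep("instance", vector_of_text)
idx_percentage <- grep("percentage", vector_of_text)

# Keep only the indices that appear in both sets, then subset.
both    <- intersect(idx_instance, idx_percentage)
matched <- vector_of_text[both]
matched

# If a logical vector like the one from grepl() is needed instead:
is_match <- seq_along(vector_of_text) %in% both
is_match
```

Note that this returns the matching elements themselves rather than a TRUE/FALSE vector, so the last two lines show one way to get back to the form the question asks for.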

Answer 3

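This is how you get TRUE only when both terms actually appear in an element of the vector labelTest$Text. I think this is the exact answer to the question, and it is much shorter than the other solutions.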

grepl("instance",labelTest$Text) & grepl("percentage",labelTest$Text)
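
To show how the result might be used, here is a small sketch with a made-up data frame standing in for labelTest; the real data is not shown in the question, so the contents below are an assumption.

```r
# Hypothetical stand-in for the asker's labelTest data frame.
labelTest <- data.frame(
  Text = c("instance", "percentage", "instance percentage"),
  stringsAsFactors = FALSE
)

# TRUE only for rows whose Text contains both terms.
has_both <- grepl("instance", labelTest$Text) &
  grepl("percentage", labelTest$Text)

# Use the logical vector to keep the matching rows.
both_rows <- labelTest[has_both, , drop = FALSE]
both_rows
```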

r - grepl to find whether multiple strings are present
