Question

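Given a vector of strings (a column of a data frame), I am trying to identify which string a given excerpt was taken from.

In the following example, excerpt_of_string is an excerpt (specifically, the first 119 characters) of the second element of vector_of_strings: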

excerpt_of_string <- "Considering utilizing eLearning days for snow make-up? Join us on 12/8 for Snow day, sNOw problem! Details https://t.co"

vector_of_strings <- c("Meow", 
                       "Considering utilizing eLearning days for snow make-up? Join us on 12/8 for Snow day, sNOw problem! Details https://t.co/LfbPne3uuo #INeLearn", 
                       "Bark")

I first tried grepl, expecting the second element of the result to be TRUE, but every element is FALSE:

grepl(excerpt_of_string, vector_of_strings)
[1] FALSE FALSE FALSE

I also tried str_detect from the stringr package:

stringr::str_detect(vector_of_strings, excerpt_of_string)
[1] FALSE FALSE FALSE

Why do these methods not detect the excerpt excerpt_of_string in the second element of vector_of_strings?

Answer 1

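The excerpt is not detected because it contains regex metacharacters. Both grepl and str_detect interpret the pattern as a regular expression by default, so, for example, the ? after make-up is read as a quantifier rather than as a literal question mark.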
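
To see the effect in isolation, here is a minimal sketch using a shortened toy string (the variable names target, as_regex, escaped and literal are illustrative and not from the original post). It only aims to show how the ? changes the meaning of the pattern:

```r
# Toy string (an assumption for illustration, shortened from the question's text).
target <- "snow make-up? Join us"

# As a regex, "p?" means "an optional p", so nothing in the pattern
# consumes the literal "?" in the target and the match fails.
as_regex <- grepl("make-up? Join", target)
print(as_regex)
# [1] FALSE

# Escaping the metacharacter restores its literal meaning.
escaped <- grepl("make-up\\? Join", target)
print(escaped)
# [1] TRUE

# fixed = TRUE treats the whole pattern literally, so no escaping is needed.
literal <- grepl("make-up? Join", target, fixed = TRUE)
print(literal)
# [1] TRUE
```

Escaping by hand works for a known pattern, but for arbitrary text such as tweets a literal match is usually the less error-prone choice.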

You can use the fixed=TRUE argument to treat the entire pattern as a literal string.

grepl(excerpt_of_string, vector_of_strings, fixed=TRUE)
# [1] FALSE  TRUE FALSE

Alternatively, wrap the pattern in \Q ... \E, which also makes the regex engine ignore metacharacters in the pattern.

grepl(paste0('\\Q', excerpt_of_string, '\\E'), vector_of_strings)
# [1] FALSE  TRUE FALSE
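
Since the question also used stringr, the following sketch shows what are probably the closest equivalents there and in base R. It uses shortened toy data (excerpt, strings and is_prefix are illustrative names, not from the original post), and the stringr part only runs if that package is installed. The startsWith variant assumes, as in the question, that the excerpt is taken from the very start of the string:

```r
# Shortened toy data (an assumption for illustration).
excerpt <- "snow make-up? Details https://t.co"
strings <- c("Meow",
             "snow make-up? Details https://t.co/LfbPne3uuo #INeLearn",
             "Bark")

# Base R: a literal prefix test, with no regex involved.
is_prefix <- startsWith(strings, excerpt)
print(is_prefix)
# [1] FALSE  TRUE FALSE

# stringr: fixed() marks the pattern as a literal string.
# Guarded so the sketch still runs when stringr is not installed.
if (requireNamespace("stringr", quietly = TRUE)) {
  print(stringr::str_detect(strings, stringr::fixed(excerpt)))
}
```

If the excerpt could come from anywhere inside the string rather than the start, the fixed-pattern forms are the ones to use.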
