Question

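Is there a way in R to find the values in a column that contain a given word? For example, I want to find all values containing the word "the", where some of the column's values are "the_cat", "the_dog", and "dog".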

x <- c("the_dog", "the_cat", "dog")

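For the example above, the answer would be 2. I know this is relatively easy to do in Python, but I would like to know whether there is a way to do it in R. Thanks!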

Answer 1

Try:

sum(grepl("(?<![A-Za-z])the(?![A-Za-z])", x, perl = TRUE))

This gives 2 for your example.

But let's consider a slightly more complicated example:

x <- c("the_dog", "the_cat", "dog", "theano", "menthe", " the")

Output:

[1] 3

Above, we match any "the" that is neither preceded nor followed by another letter (so theano and menthe, for example, are not matched).
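To see which elements the lookaround pattern accepts, rather than only the total, it can help to keep the logical vector returned by grepl. This is a minimal sketch in base R using the same pattern and example vector; the variable name matches is only illustrative.

```r
x <- c("the_dog", "the_cat", "dog", "theano", "menthe", " the")

# TRUE where "the" occurs with no letter directly before or after it
matches <- grepl("(?<![A-Za-z])the(?![A-Za-z])", x, perl = TRUE)

# Label each result with the string it came from
setNames(matches, x)

# Keep only the matching elements
x[matches]

# Count them
sum(matches)
```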

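You can also add other characters inside the [] that should not be allowed next to the word; for example, if you do not want the99 to count as the word the, use [A-Za-z0-9], and so on.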

For example, you can also use the above with stringr (here I have excluded digits, so the99 would not be treated as a word below):

library(stringr)

sum(str_detect(x, "(?<![A-Za-z0-9])the(?![A-Za-z0-9])"))
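The effect of widening the character class can be checked with a small wrapper. The sketch below uses base R only; count_word and its class_chars argument are hypothetical names introduced for illustration, and it assumes word contains no regex metacharacters.

```r
# Hypothetical helper: count the elements of x that contain `word`
# with none of the characters in `class_chars` directly before or after it.
# Assumes `word` has no regex metacharacters (it is not escaped here).
count_word <- function(x, word, class_chars = "A-Za-z") {
  pattern <- sprintf("(?<![%s])%s(?![%s])", class_chars, word, class_chars)
  sum(grepl(pattern, x, perl = TRUE))
}

x <- c("the_dog", "the_cat", "dog", "theano", "menthe", " the", "the99")

count_word(x, "the")               # letters only: the99 is counted, giving 4
count_word(x, "the", "A-Za-z0-9")  # letters and digits: the99 is excluded, giving 3
```

Note that a plain word-boundary pattern such as \bthe\b would not match the_dog, because the underscore is treated as a word character; that is one reason to spell out the excluded characters explicitly.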

Answer 2

library(stringr)
library(dplyr)  # provides tibble(), filter() and %>%

## with a vector
sum(str_detect(c("the_dog", "the_cat", "dog"), "the"))

## in a data frame
tibble(x = c("the_dog", "the_cat", "dog")) %>%
    filter(str_detect(x, "the")) %>%
    nrow()
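If you would rather not load any packages, the same data-frame count can be sketched in base R. The names df and the_rows are only illustrative, and fixed = TRUE is an assumption that "the" should be treated as a literal string rather than a regular expression.

```r
df <- data.frame(x = c("the_dog", "the_cat", "dog"), stringsAsFactors = FALSE)

# Keep the rows whose x contains the literal string "the"
the_rows <- df[grepl("the", df$x, fixed = TRUE), , drop = FALSE]

nrow(the_rows)  # 2
```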

Answer 3

x <- c("the_dog", "the_cat", "dog") 
stringr::str_detect(x, "the")
#> [1]  TRUE  TRUE FALSE

Created on 2019-02-23 by the reprex package (v0.2.1)
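A logical vector like the one above can be turned into the count the question asks for, or into positions and values. This is a small sketch using base R's grepl in place of str_detect; hits is just an illustrative name.

```r
x <- c("the_dog", "the_cat", "dog")

hits <- grepl("the", x, fixed = TRUE)  # TRUE TRUE FALSE

sum(hits)    # number of matching elements: 2
which(hits)  # their positions: 1 2
x[hits]      # the matching values themselves
```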

Answer 4

You can also try:

x <- c("the_dog", "the_cat", "dog")
sum(stringi::stri_count(x, regex = "^the"))  # matches "the" only at the beginning of a string

Result:

[1] 2

Or:

x <- c("the_dog", "the_cat", "dog")
sum(stringi::stri_count(x, regex = "the{1,}"))  # counts every occurrence of "the" anywhere ({1,} applies only to the final "e")
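One caveat with summing counts: stri_count returns the number of occurrences per element, so the total can exceed the number of matching values when a string contains "the" more than once. The sketch below illustrates the difference in base R (so it runs without stringi); the example vector is made up for illustration.

```r
x <- c("the_theme", "the_cat", "dog")

# Occurrences of "the" within each element
occurrences <- lengths(regmatches(x, gregexpr("the", x, fixed = TRUE)))
occurrences       # 2 1 0
sum(occurrences)  # 3 occurrences in total

# Number of elements containing "the" at least once
sum(grepl("the", x, fixed = TRUE))  # 2
```

If the goal is the number of values rather than the number of occurrences, a detect-style function (grepl, str_detect) is the safer choice.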

Finding values that contain a given string in R
