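I know this is a very naive question, but I've tried a lot and still haven't found a way to count the occurrences of a given substring within a string in R.
For example: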
str <- "Hello this is devavrata! here, say again hello"
Now I want to count the occurrences of hello, ignoring case. In this example the answer should be 2.
Edit: I also noticed that when I search for ello th, str_count returns 1. But I want exact whole-word matches (including the space), so it should return 0.
For example, if I want to find very good in a particular string, such as:
It is very good to speak like thevery good
the count here should be 1, not 2. I hope that makes sense.
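To make the two requirements concrete, here is a base-R sketch (no packages) using the question's own strings. count_matches is a hypothetical helper name; it counts non-overlapping, case-insensitive regex matches with gregexpr(). Wrapping the phrase in \\b word-boundary anchors is what turns a substring count into a whole-word count.

```r
# Hypothetical helper: number of non-overlapping, case-insensitive matches
# of the regex `pattern` in each element of `x` (base R only).
count_matches <- function(x, pattern) {
  m <- gregexpr(pattern, x, ignore.case = TRUE, perl = TRUE)
  # gregexpr() returns -1 when there is no match, so count positive positions
  vapply(m, function(pos) sum(pos > 0), integer(1))
}

str  <- "Hello this is devavrata! here, say again hello"
str1 <- "It is very good to speak like thevery good"

count_matches(str, "hello")             # substring count: 2
count_matches(str, "ello th")           # substring count: 1 (inside "Hello this")
count_matches(str, "\\bello th\\b")     # whole words only: 0
count_matches(str1, "very good")        # substring count: 2
count_matches(str1, "\\bvery good\\b")  # whole words only: 1
```

Note that \\b treats punctuation as a boundary, so a word followed by "!" or "," still counts as a whole word.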
Answer 0 (score: 4)
You could also try:
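library(stringi)
# (?i) switches on case-insensitive matching
stri_count(str, regex="(?i)hello")
#[1] 2
str1 <- "It is very good to speak like thevery good"
# \\b requires a word boundary, so "thevery good" is not counted
stri_count(str1, regex="\\b(?i)very good\\b")
#[1] 1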
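One caveat with regex-based counting: the search term is interpreted as a regular expression, so characters such as . or + are not matched literally. Both ICU (used by stringi) and PCRE (base R with perl = TRUE) support \Q...\E quoting, which treats the enclosed text literally. A base-R sketch with a made-up example term a.b (the string s and the term are assumptions for illustration):

```r
s <- "a.b axb A.B"
phrase <- "a.b"  # example search term containing the metacharacter "."

# Unquoted: "." matches any character, so "axb" is counted as well
unquoted <- sum(gregexpr(phrase, s, ignore.case = TRUE, perl = TRUE)[[1]] > 0)

# Quoted with \Q...\E: the term is matched literally (still case-insensitive)
quoted <- sum(gregexpr(paste0("\\Q", phrase, "\\E"), s,
                       ignore.case = TRUE, perl = TRUE)[[1]] > 0)

unquoted  # 3
quoted    # 2
```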
Answer 1 (score: 2)
Perhaps the simplest and most direct way is str_count from the stringr package:
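str <- "Hello this is devavrata! here, say again hello"
library(stringr)
# ignore.case("hello") is deprecated in current stringr;
# use regex() with ignore_case = TRUE instead
str_count(str, regex("hello", ignore_case = TRUE))
# [1] 2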
Two base R approaches:
length(grep("hello", strsplit(str, " ")[[1]], ignore.case = TRUE))
# [1] 2
and
sum(gregexpr("hello", str, ignore.case = TRUE)[[1]] > 0)
# [1] 2
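The two base R approaches are not equivalent: the strsplit()/grep() version counts tokens that contain the pattern, while the gregexpr() version counts every non-overlapping match. A short illustration with a made-up input string s:

```r
s <- "hellohello Hello"

# strsplit() + grep(): one hit per space-separated token containing "hello"
by_token <- length(grep("hello", strsplit(s, " ")[[1]], ignore.case = TRUE))

# gregexpr(): one hit per non-overlapping match anywhere in the string
by_match <- sum(gregexpr("hello", s, ignore.case = TRUE)[[1]] > 0)

by_token  # 2
by_match  # 3
```

Neither version enforces whole words; for that, add \\b anchors around the pattern (with perl = TRUE if you want PCRE semantics).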
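Answer 2 (score: 2)
I'm late to the party, but I think the termco function from the qdap package does exactly what you want. You can control word boundaries with leading and/or trailing spaces, as in the following example: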
x <- c("Hello this is devavrata! here, say again hello",
"It is very good to speak like thevery good")
library(qdap)
(out <- termco(x, id(x), list("hello", "very good", " very good ")))
## x word.count hello very good very good
## 1 1 8 2(25.00%) 0 0
## 2 2 9 0 2(22.22%) 1(11.11%)
## To get a data frame of pure counts:
counts(out)  # equivalent to out %>% counts() when magrittr is loaded
## x word.count hello very good very good
## 1 1 8 2 0 0
## 2 2 9 0 2 1
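If you prefer to stay in base R, the leading/trailing-space idea can be approximated with PCRE lookarounds. This is a sketch, not how termco is implemented: count_spaced is a hypothetical helper that requires each match to be preceded by whitespace or the start of the string and followed by whitespace or the end of the string. Because lookarounds do not consume characters, two back-to-back phrases sharing one space are both counted, and matches at the very start or end of the string are not missed.

```r
# Hypothetical helper: count case-insensitive occurrences of `phrase` that are
# delimited by whitespace or by the start/end of the string (base R, PCRE).
count_spaced <- function(x, phrase) {
  # (?<![^\s]) : not preceded by a non-space character (start or whitespace)
  # (?![^\s])  : not followed by a non-space character (end or whitespace)
  pat <- paste0("(?<![^\\s])", phrase, "(?![^\\s])")
  m <- gregexpr(pat, x, ignore.case = TRUE, perl = TRUE)
  # gregexpr() returns -1 for "no match", so count only positive positions
  vapply(m, function(pos) sum(pos > 0), integer(1))
}

x <- c("Hello this is devavrata! here, say again hello",
       "It is very good to speak like thevery good")

count_spaced(x, "hello")                          # 2 0
count_spaced(x, "very good")                      # 0 1
count_spaced("very good very good", "very good")  # 2
```

Unlike \\b, this treats punctuation as part of the token, so in "hello, hello" only the second hello is counted. The phrase is still interpreted as a regular expression.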