Counting keyword occurrences in a string

Time: 2017-03-29 19:17:16

Tags: r

I am trying to count how often certain keywords occur in the text of a web page:

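# load the packages used below
library(RCurl)    # getURL
library(XML)      # htmlTreeParse, xpathApply, xmlValue
library(stringr)  # str_replace_all
library(stringi)  # stri_detect_fixed

# get the URL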
u <- "http://www.dlink.com/it/it" 
doc <- getURL(u)

# get the text from the body, skipping script, style and noscript content
html <- htmlTreeParse(doc, useInternalNodes = TRUE)
txt <- xpathApply(html, "//body//text()[not(ancestor::script)][not(ancestor::style)][not(ancestor::noscript)]", xmlValue)
txt <- toString(txt)
txt

# clean: strip line breaks, tabs and commas, and keep the result
txt <- str_replace_all(txt, "[\r\n\t,]", "")

search <- c("Wi-Fi","Router","Switch","ADSL")
search
stri_detect_fixed(txt, search)

sum(stri_detect_fixed(txt, search))

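Unfortunately, this only tells me whether each word is present or not. What I want instead is the number of times each keyword occurs (for example, if Wi-Fi appears twice, it should add 2 to the total). Any ideas using the stringi library?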

1 answer:

Answer 0: (score: 1)

Use stri_count_fixed, which returns the number of matches for each pattern instead of a TRUE/FALSE flag:

library(stringi)

stri_count_fixed(txt, search)
[1] 3 2 5 1

sum(stri_count_fixed(txt, search))
[1] 11
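
To see the difference between detecting and counting without fetching the page, here is a minimal sketch on a made-up string. The sample text is an assumption for illustration only and is not taken from the site in the question. It also shows the case_insensitive option of stringi's fixed-pattern search, which may be useful if the page mixes "Router" and "router".

```r
library(stringi)

# Hypothetical sample text standing in for the scraped page body
txt <- "Wi-Fi Router and Wi-Fi Switch; the router supports ADSL"
search <- c("Wi-Fi", "Router", "Switch", "ADSL")

# stri_detect_fixed gives one TRUE/FALSE per keyword,
# so its sum can never exceed length(search)
present <- stri_detect_fixed(txt, search)
sum(present)   # 4: every keyword occurs at least once

# stri_count_fixed gives the number of occurrences per keyword
counts <- stri_count_fixed(txt, search)
names(counts) <- search
counts         # Wi-Fi 2, Router 1, Switch 1, ADSL 1
total <- sum(counts)
total          # 5

# Matching is case-sensitive by default; this variant also counts "router"
counts_ci <- stri_count_fixed(txt, search, case_insensitive = TRUE)
sum(counts_ci) # 6
```

Both functions are vectorised over the pattern argument, so the single text is reused for each keyword and the result lines up with the order of search.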