I'm trying to get the body text from the web page www.kinyo.es, but it returns this error:
Error in which(value == defs) :
argument "code" is missing, with no default
In addition: Warning messages:
1: XML content does not seem to be XML: 'Error displaying the error page: Application Instantiation Error: Could not connect to MySQL.'
2: XML content does not seem to be XML: ''
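My code is the following loop (`webpage` is a character vector of URLs and `n` its length, both defined earlier):
library(RCurl)    # getURL()
library(XML)      # htmlTreeParse(), xpathApply(), xmlValue()
library(stringr)  # str_replace_all()
library(stringi)  # stri_count_fixed()

search <- c("wi-fi", "router", "switch", "adsl", "wireless")
conta <- numeric(n)  # one keyword count per URL

for (i in 1:n) {
  # get the URL and download the page
  u <- webpage[i]
  doc <- getURL(u)
  # parse the HTML and get the text from the body, skipping script/style/noscript
  html <- htmlTreeParse(doc, useInternal = TRUE)
  txt <- xpathApply(html, "//body//text()[not(ancestor::script)][not(ancestor::style)][not(ancestor::noscript)]", xmlValue)
  txt <- toString(txt)
  # clean: drop line breaks, tabs and commas, then lowercase
  txt <- str_replace_all(txt, "[\r\n\t,]", "")
  txt <- tolower(txt)
  # count every occurrence of each keyword and store the total
  conta[i] <- sum(stri_count_fixed(txt, search))
}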
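The first warning shows that, for this URL, the server sent back a plain-text message ("Could not connect to MySQL", a database error on the site's side) instead of an HTML page, so there is nothing for `htmlTreeParse` to parse. A minimal sketch, assuming you simply want to skip such pages, is to check the downloaded string before parsing and return `NA` for that URL. `count_in_doc` is a hypothetical helper name, and looking for an `<html` tag is only a rough heuristic for "this is an HTML page":

```r
library(XML)      # htmlParse(), xpathSApply(), xmlValue()
library(stringi)  # stri_count_fixed()

# Hypothetical helper: count keywords in an already-downloaded page,
# returning NA when the response does not look like HTML at all
count_in_doc <- function(doc, search) {
  # rough heuristic: a real page should contain an <html> tag
  if (!grepl("<html", doc, ignore.case = TRUE)) return(NA_integer_)
  html <- htmlParse(doc, asText = TRUE)
  txt <- xpathSApply(html, "//body//text()[not(ancestor::script)][not(ancestor::style)][not(ancestor::noscript)]", xmlValue)
  txt <- tolower(paste(txt, collapse = " "))
  # stri_count_fixed() is vectorised over the patterns; sum the per-keyword counts
  sum(stri_count_fixed(txt, search))
}
```

Inside the loop this would replace the parsing and counting steps, e.g. `conta[i] <- count_in_doc(getURL(u), search)`, so one broken site yields `NA` instead of stopping the whole run.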
Answer (score: 1)
This is a bit of a stretch: having read your other questions, I can only assume this is what you're after:
library(rvest)
library(stringr)

# Download a page, extract the visible body text and count the keyword hits
count_keywords <- function(url, keywords) {
  read_html(url) %>%
    html_nodes(xpath = '//body//text()[not(ancestor::script)][not(ancestor::style)][not(ancestor::noscript)]') %>%
    html_text() %>%
    toString() %>%
    str_count(keywords) %>%
    sum()
}
urls <- c('http://www.dlink.com/it/it', 'http://www.kinyo.es')
search <- c("Wi-Fi","Router","Switch","ADSL")
res <- sapply(urls, count_keywords, search)
res
#> http://www.dlink.com/it/it http://www.kinyo.es
#> 11 0
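If some URLs in a longer list are unreachable or return something `read_html()` cannot handle, `sapply()` stops at the first error. A minimal sketch, assuming an `NA` count is an acceptable result for such pages, wraps the same pipeline in `tryCatch()`; `count_keywords_safe` is a hypothetical name, not part of rvest:

```r
library(rvest)    # read_html(), html_nodes(), html_text(); re-exports %>%
library(stringr)  # str_count()

# Hypothetical wrapper: same counting as count_keywords(),
# but returns NA instead of failing when the page cannot be read
count_keywords_safe <- function(url, keywords) {
  tryCatch(
    read_html(url) %>%
      html_nodes(xpath = '//body//text()[not(ancestor::script)][not(ancestor::style)][not(ancestor::noscript)]') %>%
      html_text() %>%
      toString() %>%
      str_count(keywords) %>%
      sum(),
    error = function(e) NA_integer_
  )
}
```

It is used exactly like the original, e.g. `sapply(urls, count_keywords_safe, search)`. Note that `str_count()` is case-sensitive and treats each keyword as a regular expression, so "Router" and "router" are counted separately.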