Question

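Is there a way to access the text content of a Wikipedia page using only R? Something equivalent to jSoup, as shown in this Stack Overflow post: Extraction of text using: Jsoup

Thanks.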

Answer 1

From here:

# load packages
library(RCurl)
library(XML)

# download html
html <- getURL("https://en.wikipedia.org/wiki/Main_Page", followlocation = TRUE)

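# parse html (asText = TRUE because `html` holds the page source, not a file path)
doc <- htmlParse(html, asText = TRUE)

# extract the text of every <p> element, with nested tags stripped
plain.text <- xpathSApply(doc, "//p", xmlValue)

# print the paragraphs, one per line
cat(paste(plain.text, collapse = "\n"))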
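
To see what the XPath step is doing without hitting the network, here is a minimal sketch that runs the same `htmlParse` / `xpathSApply` calls on an inline HTML string. The `sample_html` markup and the `paragraphs` variable are made up for illustration; they are not part of the original answer.

```r
# Offline sketch: the same parsing step as above, applied to an inline string
library(XML)

# Hypothetical sample markup standing in for a downloaded page
sample_html <- "<html><body>
  <h1>Title</h1>
  <p>First <b>paragraph</b>.</p>
  <div>Navigation text</div>
  <p>Second paragraph.</p>
</body></html>"

# asText = TRUE tells htmlParse the argument is markup, not a file name
doc <- htmlParse(sample_html, asText = TRUE)

# "//p" selects every <p> element; xmlValue returns its text with tags removed
paragraphs <- xpathSApply(doc, "//p", xmlValue)
print(paragraphs)
```

Only the two `<p>` elements are selected, so the heading and the `<div>` text are left out. That is why this approach tends to pick up a page's body text while skipping most navigation and layout elements.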
