I am trying to retrieve some information from r-users.com. I use the following code and get this warning message:
XML content does not seem to be XML
Any help would be appreciated.
library(data.table)
library(XML)

pages <- 1:10

urls <- rbindlist(lapply(pages, function(x) {
  url <- paste("https://www.r-users.com/jobs/page/", x, "/", sep = "")
  data.frame(url)
}), fill = TRUE)

jobLocations <- rbindlist(apply(urls, 1, function(url) {
  doc1 <- htmlParse(url)
  locations <- getNodeSet(doc1, '//*[@id="mainContent"]/div[2]/ol/li/dl/dd[3]/span')
  data.frame(sapply(locations, function(x) { xmlValue(x) }))
}), fill = TRUE)
Answer 0 (score: 1)
rvest and purrr are a powerful combination for web scraping.
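The answer's original code did not survive, so what follows is only a minimal sketch of how the rvest and purrr approach might look, not the answerer's actual solution. One likely cause of the warning is that `XML::htmlParse()` cannot fetch `https://` URLs itself, so it ends up treating the URL string as document content; `rvest::read_html()` downloads the page first. The XPath is copied from the question and has not been checked against the live site, and the helper names `extract_locations` and `scrape_locations` are hypothetical.

```r
library(rvest)  # read_html(), html_elements(), html_text2()
library(purrr)  # map()

# XPath copied from the question; not verified against the live site.
location_xpath <- '//*[@id="mainContent"]/div[2]/ol/li/dl/dd[3]/span'

# Hypothetical helper: pull the location strings out of one parsed page.
extract_locations <- function(page, xpath = location_xpath) {
  nodes <- html_elements(page, xpath = xpath)
  data.frame(location = html_text2(nodes), stringsAsFactors = FALSE)
}

# Hypothetical helper: fetch each listing page and stack the results.
scrape_locations <- function(pages = 1:10) {
  urls <- paste0("https://www.r-users.com/jobs/page/", pages, "/")
  per_page <- map(urls, function(u) extract_locations(read_html(u)))
  do.call(rbind, per_page)
}

# Usage (performs network requests):
# jobLocations <- scrape_locations(1:10)
```

Keeping the parsing step in its own function means it can be tried on a saved or hand-written HTML fragment before any requests are sent to the site.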