XML content does not seem to be XML

Time: 2016-09-16 00:29:42

Tags: r web-scraping html-parsing

I am trying to retrieve some information from r-users.com. With the code below, I get this warning message:

XML content does not seem to be XML

Any help would be greatly appreciated.

library(data.table)
library(XML)

pages <- c(1:10)

urls <- rbindlist (lapply(pages, function(x) {
  url <- paste("https://www.r-users.com/jobs/page/",x,"/", sep="")
  data.frame(url)
}), fill=TRUE)

jobLocations <- rbindlist (apply(urls, 1, function(url) {
  doc1 <- htmlParse (url)
  locations <- getNodeSet(doc1, '//*[@id="mainContent"]/div[2]/ol/li/dl/dd[3]/span')
  data.frame(sapply(locations, function(x) { xmlValue(x) }))
  }), fill = TRUE)

1 answer:

Answer 0 (score: 1)

rvest and purrr make a powerful combination for web scraping.

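The warning most likely appears because `htmlParse()` from the XML package cannot fetch `https://` URLs itself, so it ends up treating the URL string as the document content. Below is a minimal sketch of the rvest and purrr approach, not the answerer's original code. The helper names `extract_locations`, `scrape_page`, and `scrape_all` are hypothetical, and the XPath is copied from the question on the assumption that it still matches the site's markup.

```r
library(rvest)  # read_html(), html_nodes(), html_text()
library(purrr)  # map()

# Build the ten listing-page URLs used in the question.
pages <- 1:10
urls <- sprintf("https://www.r-users.com/jobs/page/%d/", pages)

# Hypothetical helper: extract the location strings from one parsed page.
# The default XPath is the one from the question (assumed to still be valid).
extract_locations <- function(doc,
    xpath = '//*[@id="mainContent"]/div[2]/ol/li/dl/dd[3]/span') {
  nodes <- html_nodes(doc, xpath = xpath)
  html_text(nodes, trim = TRUE)
}

# Hypothetical helper: download one page and return its locations.
# read_html() handles https URLs, unlike XML::htmlParse().
scrape_page <- function(url) {
  locs <- extract_locations(read_html(url))
  # rep() keeps the columns the same length even when nothing matches.
  data.frame(url = rep(url, length(locs)),
             location = locs,
             stringsAsFactors = FALSE)
}

# Hypothetical helper: scrape every URL and stack the results row-wise.
scrape_all <- function(urls) {
  do.call(rbind, map(urls, scrape_page))
}
```

Calling `jobLocations <- scrape_all(urls)` would then issue the ten requests and return one data frame with a `url` and a `location` column. If you prefer to stay with the XML package, another option is to download the page text first (for example with httr) and pass that text to `htmlParse()`.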