I am trying to scrape this website using the rvest package in R. I have done this successfully with several other sites, but this one doesn't seem to work and I can't figure out why.
I copied the XPath from Chrome's inspect tool, but when I specify it in my rvest script it says the node doesn't exist. Could it have something to do with the table being generated dynamically rather than being static?
Thanks for the help!
library(rvest)
library(tidyverse)
library(stringr)
library(readr)
a <- read_html("http://www.diversitydatakids.org/data/profile/217/benton-county#ind=10,12,15,17,13,20,19,21,24,2,22,4,34,35,116,117,123,99,100,127,128,129,199,201")
a <- html_node(a, xpath = "//*[@id='indicator10']")
a <- html_table(a)
a
Answer (score: 0)
As for your question: yes, because the table is generated dynamically you can't get it with rvest alone. In cases like this it's best to use the RSelenium library:
#Loading libraries
library(rvest) # to read the html
library(magrittr) # for the '%>%' pipe symbols
library(RSelenium) # to get the loaded html of the website
# starting local RSelenium (this is the only way to start RSelenium that is working for me atm)
selCommand <- wdman::selenium(jvmargs = c("-Dwebdriver.chrome.verboseLogging=true"), retcommand = TRUE)
shell(selCommand, wait = FALSE, minimized = TRUE)
remDr <- remoteDriver(port = 4567L, browserName = "chrome")
remDr$open()
# Specify the URL of the website to be scraped
url <- "http://www.diversitydatakids.org/data/profile/217/benton-county#ind=10,12,15,17,13,20,19,21,24,2,22,4,34,35,116,117,123,99,100,127,128,129,199,201"
# go to website
remDr$navigate(url)
# get page source and save it as an html object with rvest
html_obj <- remDr$getPageSource(header = TRUE)[[1]] %>% read_html()
# get the element you are looking for
a <- html_node(html_obj, xpath = "//*[@id='indicator10']")
I guess you are trying to get the first table. In that case it's better to use html_table to extract it:
# get the table with the indicator10 id
indicator10_table <- html_node(html_obj, "#indicator10 table") %>% html_table()
This time I used a CSS selector instead of an XPath expression.
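If you would rather keep using XPath, a roughly equivalent call would be the following sketch (it assumes the table sits inside the element with id indicator10, just as the CSS selector above does):
# same table, selected via XPath instead of CSS
indicator10_table <- html_node(html_obj, xpath = "//*[@id='indicator10']//table") %>% html_table()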
Hope that helps! Happy scraping!
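One last housekeeping note (a minimal sketch, not part of the workflow above): once the page source has been stored in html_obj you no longer need the browser, so you can close the session. The Selenium server started with shell() runs as a separate Java process and may need to be stopped manually.
# close the Chrome session that remDr$open() started
remDr$close()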