I am trying to scrape a web page with the R package rvest, but when I call html_nodes() it returns an empty list. What is the problem? Here is my code (I used SelectorGadget to get the CSS selectors):
# SETUP
library(tidyverse)
library(rvest)

# Getting the link for each house
main.link <- "https://www.sreality.cz/en/search/for-sale/apartments/praha"
main.page <- read_html(main.link)
links <- html_nodes(main.page, css = ".title .ng-binding")
As you can see, I am a beginner in R. Thanks in advance for your help.
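To show what I mean, here is a minimal check using the objects from the code above (the inspection step is just my attempt at diagnosing the problem); it confirms the selector matches nothing in the HTML that read_html() downloads:

# the node set returned by html_nodes() is empty
length(links)
#> [1] 0
# peeking at the start of the downloaded source suggests the listing markup is
# not in the static page and is probably filled in later by JavaScript
substr(as.character(main.page), 1, 1000)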
Answer 0 (score: 0)
Work with RSelenium, as Selcuk Akbas suggested. The sreality.cz listings are rendered client-side by JavaScript (the .ng-binding class comes from Angular), so read_html() only downloads the empty page template; RSelenium drives a real browser, which lets you scrape the fully rendered HTML instead.
# Loading the packages
library(rvest)
library(magrittr)   # for the %>% pipe operator
library(RSelenium)  # to get the fully rendered HTML of the page

# starting local RSelenium (this is the only way of starting RSelenium that works for me at the moment)
selCommand <- wdman::selenium(jvmargs = c("-Dwebdriver.chrome.verboseLogging=true"), retcommand = TRUE)
shell(selCommand, wait = FALSE, minimized = TRUE)
remDr <- remoteDriver(port = 4567L, browserName = "chrome")
remDr$open()

# Specifying the url of the website to be scraped
main_link <- "https://www.sreality.cz/en/search/for-sale/apartments/praha"

# go to the website in the browser
remDr$navigate(main_link)

# get the rendered page source and parse it as an html object with rvest
main_page <- remDr$getPageSource(header = TRUE)[[1]] %>% read_html()

# get the link nodes
links <- html_nodes(main_page, css = ".title .ng-binding")
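From here, a possible next step (just a sketch, assuming the matched nodes are the <a> elements carrying the href attribute; adjust if the selector returns inner spans instead) is to pull out the titles and URLs and to close the browser when you are done:

# extract the visible titles and the hrefs of the matched links
titles <- html_text(links, trim = TRUE)
hrefs  <- html_attr(links, "href")
# prepend the domain if the hrefs turn out to be relative (check one first)
house_links <- paste0("https://www.sreality.cz", hrefs)
# close the browser session when finished
remDr$close()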
Hope this helps! Happy scraping!