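Unable to scrape data from this site (using R)

Posted: 2021-03-17 19:47:39

Tags: r web-scraping rselenium

I can't seem to find the right CSS selector to use with RSelenium to return any data. The site is: https://www.rbcroyalbank.com/investments/gic-rates.html

The data I need is the non-redeemable GIC rates with interest paid annually (the second column), for the 1, 2, 3, 4, 5, 7 and 10 year terms.

Some failed attempts: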

library("RSelenium")
library("rvest")
library("httr")
library("tidyverse")

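# remDr is assumed to be an open RSelenium client, e.g.
# remDr <- rsDriver(browser = "firefox", port = 4567L)[["client"]]
remDr$navigate("https://www.rbcroyalbank.com/investments/gic-rates.html")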
webElem <- remDr$findElement(using = "css selector", value = "tr:nth-child(7) .text-center:nth-child(2) div")


# OR

pg <- remDr$getPageSource()[[1]]
df <- tibble(Rates = pg %>% 
               read_html() %>% 
               html_nodes(xpath = '//tr[(((count(preceding-sibling::*) + 1) = 6) and parent::*)]//*[contains(concat( " ", @class, " " ), concat( " ", "text-center", " " )) and (((count(preceding-sibling::*) + 1) = 2) and parent::*)]//div') %>% 
               html_text())

1 answer:

Answer 0 (score: 1)

Below is one possible solution.

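# Library used to scrape the information (RSelenium version 1.7.7 is required)
library(RSelenium)
driver <- rsDriver(browser = c("firefox"), port = 4567L)

# Get the client (browser) object from the driver
remote_driver <- driver[["client"]]
remote_driver$navigate("https://www.rbcroyalbank.com/investments/gic-rates.html")

# Click the "#gic-nrg" tab (presumably the non-redeemable GIC rates)
remote_driver$findElement(using = "css selector", value = "#gic-nrg")$clickElement()

# Locate the first rate table inside the "#guaranteed-return-1" panel
x <- remote_driver$findElement(using = "css selector", value = "#guaranteed-return-1 > div:nth-child(1) > table:nth-child(1)")

# getElementText() returns the table's visible text; replacing every space
# with a newline gives one token per line, which read.table() then parses
df <- read.table(text = gsub(" ", "\n", x$getElementText()[[1]]), header = TRUE)

# Drop the first 46 tokens; this offset depends on the page layout at the time
df[c(-1:-46), ]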
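Splitting the element text on spaces is fragile: any cell whose text contains a space is broken into several tokens, and the 46-token offset only holds for the page layout at the time of the answer. One alternative is to let rvest parse the table's HTML directly. This is a minimal sketch under assumptions, not part of the original answer: the helper `extract_rate_column()`, the selector `"#guaranteed-return-1 table"` and the toy HTML (made-up placeholder values, not RBC's actual rates) are all hypothetical. In real use you would pass the page source from `remote_driver$getPageSource()[[1]]`, taken after clicking `#gic-nrg`.

```r
library(rvest)

# Hypothetical helper: find the first table matching `css` in an HTML
# string (e.g. a page source) and return one of its columns.
extract_rate_column <- function(html, css, column = 2) {
  tbl <- read_html(html) %>%
    html_node(css) %>%   # first matching table only
    html_table()         # a header row made of <th> cells becomes column names
  tbl[[column]]
}

# Toy markup with made-up values, standing in for the real page source
toy_html <- '<div id="guaranteed-return-1"><div><table>
  <tr><th>Term</th><th>Annual</th></tr>
  <tr><td>1 year</td><td>0.50%</td></tr>
  <tr><td>2 years</td><td>0.60%</td></tr>
</table></div></div>'

rates <- extract_rate_column(toy_html, "#guaranteed-return-1 table")
print(rates)

# Strip the percent sign to get numeric rates
print(as.numeric(sub("%", "", rates, fixed = TRUE)))

# Real use (assumed): after clicking "#gic-nrg" in the RSelenium session,
# pg <- remote_driver$getPageSource()[[1]]
# extract_rate_column(pg, "#guaranteed-return-1 table")
```

Because `html_table()` works on the table structure rather than on whitespace-separated text, cells such as "1 year" stay intact and no hard-coded row offset is needed.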