RSelenium: clicking a link within another link

Asked: 2019-03-22 17:12:59

Tags: r selenium web-scraping dplyr rselenium

I have this RSelenium script:

library(tidyverse)
library(RSelenium) # running through docker
library(rvest)
library(httr)

remDr <- remoteDriver(port = 4445L, browserName = "chrome")
remDr$open()


remDr$navigate("https://books.google.com/")
books <- remDr$findElement(using = "css", "[name = 'q']")

books$sendKeysToElement(list("NHL teams", key = "enter"))

bookElem <- remDr$findElements(using = "xpath",
                               "//h3[@class = 'LC20lb']//parent::a")

links <- sapply(bookElem, function(bookElem){
  bookElem$getElementAttribute("href")
})

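The code above collects the href of each link on the Google search results page (10 per page). Most of the books I'm searching for open with a preview. If there is a preview, there is a small "About this book" link that can be clicked, which takes you to the publishing info.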

I want to open the first link and then, if there is a preview, click "About this book". I have the following, but all I get is Error: object of type 'closure' is not subsettable:

for(link in links) {

  # Navigate to each link
  remDr$navigate(link)

  # If statement to get past book previews
  if (str_detect(link, "frontcover")) {

   link2 <- remDr$findElement(using = 'xpath', 
                               '//*[@id="sidebar-atblink"]//parent::a')
   link2 <- as.list(link2)
   print(class(link2))
   link2_about <- sapply(link2, function(ugh){
      ugh$getElementAttribute('href')
    })

  } else {
    print("nice going, dumbass")
  }
}

Alternatively, I tried a for loop instead of sapply, and got Error: $ operator is invalid for atomic vectors

for(link in links) {

  # Navigate to each link
  remDr$navigate(link)

  # If statement to get past book previews
  if (str_detect(link, "frontcover")) {

    link2 <- remDr$findElement(using = 'xpath',
       '//a[@id="sidebar-atb-link" and span[.="About this book"]]')

     for(i in length(link2)){
      i$getElementAttribute('href')
     }

    } else {
     print("dumbass")
   }
}

How can I successfully click that second link depending on whether a preview exists? Thanks!

1 Answer:

Answer 0: (score: 1)

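Just update the following lines. Use findElements (plural), which returns a list of web elements that sapply can iterate over; findElement returns a single element.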

aboutLinks <- remDr$findElements(using = 'xpath', 
                           '//a[@id="sidebar-atb-link" and span[.="About this book"]]')
links2 <- sapply(aboutLinks, function(about_link){
  about_link$getElementAttribute('href')
})
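
The lines above only read the href. To actually click the link when a preview exists, the loop from the question could be restructured roughly as below. This is a minimal sketch, not a tested solution: `click_about_links` is a hypothetical helper name, `remDr` is assumed to be an already opened `remoteDriver` session, `links` is assumed to hold the result URLs collected in the question, and the XPath is the one from the answer (Google may have changed its markup since). Base `grepl()` stands in for `str_detect()` only to avoid extra dependencies.

```r
# Hypothetical helper (not part of RSelenium): visit each URL and, on preview
# pages, click the first "About this book" link if one is present.
# Returns the URLs the browser landed on after each click.
click_about_links <- function(remDr, links) {
  about_xpath <- '//a[@id="sidebar-atb-link" and span[.="About this book"]]'
  landed <- character(0)

  for (link in links) {
    # Navigate to each search result
    remDr$navigate(link)

    # Only preview pages (URL contains "frontcover") have the link
    if (!grepl("frontcover", link, fixed = TRUE)) next

    # findElements() returns an empty list when nothing matches,
    # so a missing link does not raise an error
    about <- remDr$findElements(using = "xpath", value = about_xpath)
    if (length(about) == 0) next

    # Click the first match and record where the browser ended up
    about[[1]]$clickElement()
    landed <- c(landed, remDr$getCurrentUrl()[[1]])
  }

  landed
}
```

With the session from the question, a call would look like `about_pages <- click_about_links(remDr, links)`. Checking `length(about)` before clicking is what makes the second click conditional on the preview actually being there.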