Question

My session looks like this:

library(RSelenium)

# Note: startServer() is defunct in recent RSelenium releases
startServer()
remDir <- remoteDriver()
remDir$open()
Source <- "https://www.example.com"
remDir$navigate(Source)

I am extracting some links from the page:

library(XML)

HTML <- remDir$getPageSource()
# Collect the href attribute of every <a> element
tmp  <- xpathSApply(htmlParse(HTML[[1]]), "//a/@href")

Now I want to parse each of the links in tmp:

library(RCurl)

# Build full URLs from a slice of the hrefs, then download each one
srcPartOne <- paste0(Source, tmp[185:199], "/")
HTMLs      <- lapply(srcPartOne, getURL)
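
One detail worth checking before fetching: paste0(Source, href, "/") assumes every href is a root-relative path. Below is a minimal sketch of a more defensive join in base R. The name build_url is hypothetical, the sample hrefs are made up for illustration, and unlike the line above it does not append a trailing "/".

```r
# Hypothetical helper: join a base URL and hrefs taken from //a/@href.
# Absolute links are kept as they are; relative ones are appended to the
# base with exactly one "/" between the two parts.
build_url <- function(base, href) {
  href <- as.character(href)
  is_absolute <- grepl("^https?://", href)
  base_clean <- sub("/+$", "", base)   # drop trailing slashes from the base
  path_clean <- sub("^/+", "", href)   # drop leading slashes from the path
  ifelse(is_absolute, href, paste0(base_clean, "/", path_clean))
}

# Illustrative hrefs only; they are not taken from the real page
hrefs <- c("/category/cars", "category/phones/", "https://www.example.com/about")
urls <- build_url("https://www.example.com", hrefs)
print(urls)
```

Whether a trailing slash is needed depends on the site, so that part is left to the caller.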

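At this point, however, getURL does not work for me, because the links lead to dynamic pages. So I need to use RSelenium inside the lapply call, something like this: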

HTMLs      <- lapply(srcPartOne, remDir$navigate,remDir$pageSource)

I only wrote that as an illustration; I know it does not work. How can I parse each link with RSelenium?

Edit:

library(RSelenium)
library(XML)
library(RCurl)
library(rdrop2)
library(pbapply)

drop_auth()  # Dropbox authentication

# Start RSelenium
startServer()
remDir <- remoteDriver()
remDir$open(silent = TRUE)

# Set the 'Vitrin' source to get the mobile numbers
Source <- "https://www.sah1b1nden.com"
remDir$navigate(Source)
HTML   <- remDir$getPageSource()
tmp    <- xpathSApply(htmlParse(HTML[[1]]), "//a/@href")

# Get the HTML of each 'Vitrin' page
srcPartOne <- paste0(Source, tmp[185:232], "/")
pgs <- pblapply(srcPartOne, function(x) {
  remDir$navigate(x)
  remDir$getPageSource()[[1]]  # take the string out of the returned list
})

Parses <- lapply(pgs, htmlParse, asText = TRUE)

# Text of every element whose class list contains "show-part"
temp <- lapply(Parses, xpathSApply,
               '//*[contains(concat(" ", @class, " "), " show-part ")]',
               xmlValue)
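
The loop in the edit can be wrapped in a small helper so that each page gets a moment to render before its source is read. This is only a sketch: fetch_sources and its wait argument are hypothetical names, a fixed Sys.sleep() is a crude stand-in for a proper wait condition, and the stub driver exists only so the snippet runs without a Selenium server.

```r
# Hypothetical helper: visit each URL with the driver and return the page
# source as a plain character string (getPageSource() returns a list).
fetch_sources <- function(driver, urls, wait = 0) {
  lapply(urls, function(u) {
    driver$navigate(u)
    Sys.sleep(wait)               # crude pause so dynamic content can load
    driver$getPageSource()[[1]]
  })
}

# Stub exposing the same two methods as a remoteDriver, for illustration only
make_stub_driver <- function() {
  current <- NULL
  list(
    navigate = function(url) {
      current <<- url
      invisible(NULL)
    },
    getPageSource = function() {
      list(paste0("<html><body>", current, "</body></html>"))
    }
  )
}

stub  <- make_stub_driver()
pages <- fetch_sources(stub, c("https://www.example.com/a/",
                               "https://www.example.com/b/"))
print(length(pages))
```

With a live session the call would look like pgs <- fetch_sources(remDir, srcPartOne, wait = 2), where the 2-second pause is an arbitrary value that may need tuning for the site.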
