With RSelenium

Time: 2016-05-29 01:26:42

Tags: r rselenium

My session looks like this:

library(RSelenium)

startServer()
remDir <- remoteDriver()
remDir$open()
Source <- "https://www.example.com"
remDir$navigate(Source)

I am parsing some links:

library(XML)  # htmlParse, xpathSApply

HTML   <- remDir$getPageSource()
tmp    <- xpathSApply(htmlParse(HTML[[1]]), '//a/@href')
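
The href values returned by an XPath query like the one above can be a mix of relative paths and full URLs, so pasting the site root in front of each one may produce malformed addresses. As a minimal sketch, assuming only `http`/`https` links matter, a hypothetical helper `absolutize()` (not part of RSelenium or XML) could normalise them before they are visited:

```r
# Hypothetical helper (an illustration, not from any package):
# turn a vector of href values into absolute URLs.
absolutize <- function(base, hrefs) {
  hrefs  <- as.character(hrefs)
  is_abs <- grepl("^https?://", hrefs)      # already a full URL?
  base   <- sub("/+$", "", base)            # drop trailing slashes from the base
  rel    <- sub("^/+", "", hrefs[!is_abs])  # drop leading slashes from relative paths
  out    <- hrefs
  out[!is_abs] <- paste0(base, "/", rel)
  out
}

# Example call with made-up hrefs:
links <- absolutize("https://www.example.com",
                    c("/a/b", "https://x.org/c", "d"))
```

Absolute links are left untouched; everything else is joined to the base with exactly one slash.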

Now I want to parse each of the links in tmp:

srcPartOne <- paste0(Source, as.list(tmp)[185:199],"/")
HTMLs      <- lapply(srcPartOne, getURL)

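But at this point the getURL function does not work for me, because the links point to dynamic pages. So I need to use RSelenium inside the lapply call, something like this: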

HTMLs      <- lapply(srcPartOne, remDir$navigate,remDir$pageSource)

That is only an illustration; I know it does not work. How can I parse each link with RSelenium?

Edit:

library(RSelenium)
library(XML)      # htmlParse, xpathSApply, xmlValue
library(RCurl)
library(rdrop2)
library(pbapply)

# Start RSelenium
drop_auth()  # Dropbox authentication
startServer()
remDir <- remoteDriver()
remDir$open(silent = TRUE)

# Set the 'Vitrin' source to get the mobile numbers:
Source <- "https://www.sah1b1nden.com"
remDir$navigate(Source)
HTML   <- remDir$getPageSource()
tmp    <- xpathSApply(htmlParse(HTML[[1]]), '//a/@href')

# Get the HTML of each 'Vitrin' page:
srcPartOne <- paste0(Source, as.list(tmp)[185:232], "/")
pgs <- pblapply(srcPartOne, function(x) {
  remDir$navigate(x)
  remDir$getPageSource()[[1]]  # getPageSource() returns a list; keep the HTML string
})

Parses <- lapply(pgs[1:48], htmlParse, asText = TRUE)

temp <- lapply(Parses, xpathSApply,
               '//*[contains(concat( " ", @class, " " ), concat( " ", "show-part", " " ))]',
               xmlValue)
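
The navigate-then-read loop can also be wrapped in a small function, which keeps the driver calls in one place. This is only a sketch under stated assumptions: `fetch_rendered()` and its `wait` argument are hypothetical names, and a fixed pause is a crude stand-in for really waiting until the dynamic content has loaded. The only driver methods it relies on are `navigate()` and `getPageSource()`, both used earlier in the question.

```r
# Hypothetical helper (illustrative, not part of RSelenium):
# visit each URL with an RSelenium-style driver and return the
# rendered HTML of every page as a character vector.
fetch_rendered <- function(driver, urls, wait = 1) {
  vapply(urls, function(u) {
    driver$navigate(u)            # load the page in the browser
    Sys.sleep(wait)               # assumed fixed pause so scripts can run
    driver$getPageSource()[[1]]   # getPageSource() returns a list of length 1
  }, character(1), USE.NAMES = FALSE)
}

# Possible usage with the objects from the question (needs a running
# Selenium server, so it is left commented out):
# pgs    <- fetch_rendered(remDir, srcPartOne, wait = 2)
# Parses <- lapply(pgs, XML::htmlParse, asText = TRUE)
```

Passing an anonymous function to `vapply` (or `lapply`) is what lets two driver calls run per URL; `lapply(srcPartOne, remDir$navigate, remDir$pageSource)` would instead hand the second item to `navigate` as an extra argument.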

0 answers:

No answers