My session looks like this:
startServer()
remDir <- remoteDriver()
remDir$open()
Source <- paste0("https://www.example.com")
remDir$navigate(Source)
I am parsing some links:
HTML <- remDir$getPageSource()
tmp <- xpathSApply(htmlParse(HTML[[1]]),
' //a/@href')
Now I want to parse each of the links in tmp:
srcPartOne <- paste0(Source, as.list(tmp)[185:199],"/")
HTMLs <- lapply(srcPartOne, getURL)
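But at this point the getURL function does not work for me, because the links point to dynamic pages. So I need to use RSelenium inside the lapply call, something like this: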
HTMLs <- lapply(srcPartOne, remDir$navigate,remDir$pageSource)
That line is only an illustration; I know it does not work. How can I parse each link using RSelenium?

Edit:
library(RSelenium)
library(XML) # provides htmlParse() and xpathSApply()
library(RCurl)
library(rdrop2)
library(pbapply)
#Start RSelenium
drop_auth() #Dropbox Authentication
startServer()
remDir <- remoteDriver()
remDir$open(silent = TRUE)
#Set the 'Vitrin' source to get the mobile numbers:
Source <- paste0("https://www.sah1b1nden.com")
remDir$navigate(Source)
HTML <- remDir$getPageSource()
tmp <- xpathSApply(htmlParse(HTML[[1]]),
' //a/@href')
#Get the HTML source of each 'Vitrin' page:
srcPartOne <- paste0(Source, as.list(tmp)[185:232],"/")
pgs <- pblapply(srcPartOne, function(x) {
  remDir$navigate(x)
  # getPageSource() returns a list; keep only the HTML string
  remDir$getPageSource()[[1]]
})
Parses <- lapply(X = pgs[1:48], htmlParse)
temp <- lapply(Parses, xpathSApply, '//*[contains(concat( " ", @class, " " ), concat( " ", "show-part", " " ))]',xmlValue)
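
The parsing step can be checked without a browser. Below is a minimal sketch, assuming each page source arrives as a list whose first element is the HTML string (the same shape as HTML[[1]] above). The names fake_pgs, class_xpath and extract_show_part are hypothetical, and the HTML snippets are made up for illustration only.

```r
library(XML)

# Hypothetical stand-in for `pgs`: each element mimics the list-of-one
# shape used above (HTML[[1]] holds the page source as a string).
fake_pgs <- list(
  list('<html><body><span class="show-part">first value</span></body></html>'),
  list('<html><body><span class="phone show-part">second value</span></body></html>')
)

# Match "show-part" as a whole class token, as in the XPath above.
class_xpath <- '//*[contains(concat(" ", @class, " "), " show-part ")]'

# Parse one page source and return the text of every matching node.
extract_show_part <- function(pg) {
  doc <- htmlParse(pg[[1]], asText = TRUE) # pg[[1]] is the HTML string
  xpathSApply(doc, class_xpath, xmlValue)
}

values <- lapply(fake_pgs, extract_show_part)
print(values)
```

With a live session, the same helper could be applied to the result of the pblapply call, provided each element still has the list shape that getPageSource() returns.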