RSelenium with phantomJS in R hangs after a few iterations

Date: 2015-06-17 11:00:46

Tags: r phantomjs rselenium

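I use phantomJS to collect data from different websites. During scraping, I run into many crashes while parsing a site or its elements. Unfortunately, neither phantomJS nor RSelenium prints any information or error report to the console. The script simply hangs without any warning: I can see that it is still executing, but nothing actually happens. The only way to stop the script is to restart R manually. After many tests, I found that phantomJS usually hangs while executing the remDr$findElements() command. I rewrote the code to use Firefox with RSelenium, and it works fine, so the problem seems to lie in how phantomJS works.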

Has anyone run into something similar when running phantomJS? Is there a way to work around this misbehavior?

I am using:

  1. Windows 7
  2. Selenium 2.0
  3. R version 3.1.3
  4. phantomjs-2.0.0-windows
  5. My code:

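    library(RSelenium)

    # starting phantom server driver
    # mywd is assumed to be defined earlier as the directory that contains the phantomjs folder
    phantomjsdir <- paste(mywd, "/phantomjs-2.0.0-windows/bin/phantomjs.exe", sep="" )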
    phantomjsUserAgent <- "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.89 Safari/537.36 OPR/28.0.1750.48"
    eCap <- list(phantomjs.binary.path = phantomjsdir, phantomjs.page.settings.userAgent = phantomjsUserAgent )
    pJS <- phantom(pjs_cmd = phantomjsdir)
    remDr <- remoteDriver(browserName = "phantomjs", extraCapabilities = eCap)
    remDr$open(silent = FALSE)
    
    
    mywords <- c("canon 600d", "sony 58k","nikon","nikon2","nikon 800","nikon 80","nikon 8")
    timeout <- 3
    
    #'
    #' Executing script
    #'
    
    for (word in mywords) {
    
      print(paste0("searching for: ",word))
      ss.word <- word
      remDr$navigate("http://google.com")
    
      webElem <- remDr$findElement(using = "class name", "gsfi")
      webElem$sendKeysToElement(list(enc2utf8(ss.word),key = "enter"))
      Sys.sleep(1)
    
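      print (remDr$executeScript("return document.readyState;")[[1]])
      # wait for the page to finish loading, giving up after about 10 seconds
      totalwait <- 0
      while (remDr$executeScript("return document.readyState;")[[1]]!= "complete" && totalwait<10) {
        Sys.sleep(timeout)
        totalwait <- totalwait + timeout
      }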
    
      print(paste0("search completed: ",ss.word))
      elem.snippet <- remDr$findElements(using="class name",value = "rc")
    
      # seq_along() skips the loop when no results were found (1:length() would iterate over 1, 0)
      for (i in seq_along(elem.snippet)) {
    
    
        print(paste0("element opened: ",ss.word,"  pos",i))
        print(elem.snippet[[i]])
        ss.snippet.code  <- elem.snippet[[i]]$getElementAttribute('innerHTML')
        print(paste0("element element innerHTML ok"))
        elemtitle <- elem.snippet[[i]]$findChildElement(using = "class name", value = "r")
        print(paste0("element title ok"))
    
    
        elemcode <- elemtitle$getElementAttribute('innerHTML')
        print(paste0("element innerHTML ok"))
    
    
        elemtext <- elem.snippet[[i]]$findChildElement(using = "class name", value = "st")
        ss.text <- elemtext$getElementText()[[1]]
        print(paste0("element loaded: ",ss.word,"  pos",i))
    
    
        elemloc <- elem.snippet[[i]]$getElementLocation()
        elemsize <- elem.snippet[[i]]$getElementSize()
        print(paste0("element location parsed: ",ss.word,"  pos",i))
    
      }
    
      print(paste0("data collected: ",ss.word))
    }
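
The page-load wait in the loop above is meant to give up after a fixed time. A minimal sketch of that idea as a reusable helper, in base R only, is shown here. The name `wait_until` and its parameters are hypothetical and not part of RSelenium; the helper only bounds the readiness wait and does not by itself address a hang inside `findElements()`.

```r
# Hypothetical helper (not an RSelenium function): call `condition`
# repeatedly until it returns TRUE or `max_wait` seconds have elapsed.
# Returns TRUE on success and FALSE on timeout.
wait_until <- function(condition, max_wait = 10, interval = 0.5) {
  deadline <- Sys.time() + max_wait
  repeat {
    if (isTRUE(condition())) return(TRUE)      # condition met
    if (Sys.time() >= deadline) return(FALSE)  # gave up
    Sys.sleep(interval)
  }
}

# Assumed usage with the `remDr` object from the question:
# loaded <- wait_until(function() {
#   remDr$executeScript("return document.readyState;")[[1]] == "complete"
# })
# if (!loaded) warning("page did not finish loading for: ", word)
```

Returning a flag instead of silently falling through makes it possible to log or skip a search term when the page never reaches the `complete` state.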
    

0 answers