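Too many open files (Selenium + PhantomJSDriver)

Date: 2017-01-02 14:34:53

Tags: java scala selenium selenium-webdriver phantomjs

In my embedded Selenium / PhantomJSDriver driver, resources do not seem to be cleaned up. Running the client synchronously leads to millions of open files and eventually throws a "Too many open files" type exception.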

Here is some output I collected from lsof after the program had been running for about 1 minute:

$ lsof | awk '{ print $2; }' | uniq -c | sort -rn | head
1221966 12180
  34790 29773
  31260 12138
  20955 8414
  17940 10343
  16665 32332
   9512 27713
   7275 19226
   5496 7153
   5040 14065

$ lsof -p 12180 | awk '{ print $2; }' | uniq -c | sort -rn | head
   2859 12180
      1 PID

$ lsof -p 12180 -Fn | sort -rn | uniq -c | sort -rn | head
   1124 npipe
    536 nanon_inode
      4 nsocket
      3 n/opt/jdk/jdk1.8.0_60/jre/lib/jce.jar
      3 n/opt/jdk/jdk1.8.0_60/jre/lib/charsets.jar
      3 n/dev/urandom
      3 n/dev/random
      3 n/dev/pts/20
      2 n/usr/share/sbt-launcher-packaging/bin/sbt-launch.jar
      2 n/usr/share/java/jayatana.jar

I don't understand why using the -p flag with lsof yields a smaller result set. In any case, most of the entries appear to be pipe and anon_inode.
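As a complement to lsof, the descriptor count can also be sampled from inside the JVM, which makes it easier to see which call leaks. The following is a minimal sketch, assuming a HotSpot-style JVM on a Unix-like system where the operating system bean implements com.sun.management.UnixOperatingSystemMXBean. The names FdMonitor, openFdCount and fdDelta are made up for illustration. It only counts descriptors held by the JVM process itself, not by a separate phantomjs process.

```scala
import java.lang.management.ManagementFactory
import com.sun.management.UnixOperatingSystemMXBean

// Hypothetical helper, not part of Selenium or BrowserMob.
object FdMonitor {

  /** Open file descriptor count of this JVM, or None when the
    * platform bean is not the Unix variant (for example on Windows). */
  def openFdCount(): Option[Long] =
    ManagementFactory.getOperatingSystemMXBean match {
      case unix: UnixOperatingSystemMXBean => Some(unix.getOpenFileDescriptorCount)
      case _                               => None
    }

  /** Runs `body` and returns its result together with the change in
    * open descriptors across the call, when that can be measured. */
  def fdDelta[A](body: => A): (A, Option[Long]) = {
    val before = openFdCount()
    val result = body
    val after  = openFdCount()
    (result, for (b <- before; a <- after) yield a - b)
  }
}
```

Wrapping one request in FdMonitor.fdDelta { ... } and logging the second element of the result would show whether the count keeps growing after each client has been shut down.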

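The client is very simple, about 100 lines, and calls driver.close() and driver.quit() when it is done. I experimented with caching and reusing clients, but that did not reduce the number of open files.

I tried several versions of Selenium in build.sbt in case a bug fix had landed, and I also tried PhantomJS 2.0.1 and 2.1.1. The client: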

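import java.net.{InetAddress, InetSocketAddress}
import java.util

import scala.collection.JavaConverters._
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.Random

import io.netty.handler.codec.http.{HttpHeaders, HttpObject, HttpRequest}
import net.lightbody.bmp.BrowserMobProxyServer
import net.lightbody.bmp.client.ClientUtil
import net.lightbody.bmp.core.har.HarEntry
import net.lightbody.bmp.proxy.CaptureType
import net.lightbody.bmp.proxy.auth.AuthType
import org.littleshoot.proxy.{HttpFilters, HttpFiltersAdapter, HttpFiltersSourceAdapter}
import org.openqa.selenium.phantomjs.{PhantomJSDriver, PhantomJSDriverService}
import org.openqa.selenium.remote.{CapabilityType, DesiredCapabilities}

// HeadlessClient.username, HeadlessClient.password and HeadlessClient.port
// come from a companion object that is not shown here.
case class HeadlessClient(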
                           country: String,
                           userAgent: String,
                           inheritSessionId: Option[Int] = None
                         ) {
  protected var numberOfRequests: Int = 0
  protected val proxySessionId: Int = inheritSessionId.getOrElse(new Random().nextInt(Integer.MAX_VALUE))
  protected val address = InetAddress.getByName("proxy.domain.com")
  protected val host = address.getHostAddress
  protected val login: String = HeadlessClient.username + proxySessionId
  protected val windowSize = new org.openqa.selenium.Dimension(375, 667)

  protected val (mobProxy, seleniumProxy) = {

    val proxy = new BrowserMobProxyServer()
    proxy.setTrustAllServers(true)
    proxy.setChainedProxy(new InetSocketAddress(host, HeadlessClient.port))
    proxy.chainedProxyAuthorization(login, HeadlessClient.password, AuthType.BASIC)
    proxy.addLastHttpFilterFactory(new HttpFiltersSourceAdapter() {
      override def filterRequest(originalRequest: HttpRequest): HttpFilters = {
        new HttpFiltersAdapter(originalRequest) {
          override def proxyToServerRequest(httpObject: HttpObject): io.netty.handler.codec.http.HttpResponse = {
            httpObject match {
              case req: HttpRequest => req.headers().remove(HttpHeaders.Names.VIA)
              case _ =>
            }
            null
          }
        }
      }
    })
    proxy.enableHarCaptureTypes(CaptureType.REQUEST_CONTENT, CaptureType.RESPONSE_CONTENT)
    proxy.start(0)
    val seleniumProxy = ClientUtil.createSeleniumProxy(proxy)
    (proxy, seleniumProxy)
  }

  protected val driver: PhantomJSDriver = {
    val capabilities: DesiredCapabilities = DesiredCapabilities.chrome()
    val cliArgsCap = new util.ArrayList[String]
    cliArgsCap.add("--webdriver-loglevel=NONE")
    cliArgsCap.add("--ignore-ssl-errors=yes")
    cliArgsCap.add("--load-images=no")

    capabilities.setCapability(CapabilityType.PROXY, seleniumProxy)
    capabilities.setCapability("phantomjs.page.customHeaders.Referer", "")
    capabilities.setCapability("phantomjs.page.settings.userAgent", userAgent)
    capabilities.setCapability(PhantomJSDriverService.PHANTOMJS_CLI_ARGS, cliArgsCap)

    new PhantomJSDriver(capabilities)
  }

  driver.executePhantomJS(
    """
      |var navigation = [];
      |
      |this.onNavigationRequested = function(url, type, willNavigate, main) {
      |  navigation.push(url)
      |  console.log('Trying to navigate to: ' + url);
      |}
      |
      |this.onResourceRequested = function(request, net) {
      |    console.log("Requesting " + request.url);
      |    if (! (navigation.indexOf(request.url) > -1)) {
      |        console.log("Aborting " + request.url)
      |        net.abort();
      |    }
      |};
    """.stripMargin
  )

  driver.manage().window().setSize(windowSize)

  def follow(url: String)(implicit ec: ExecutionContext): List[HarEntry] = {
    try{
      Await.result(Future{
        mobProxy.newHar(url)
        driver.get(url)
        val entries = mobProxy.getHar.getLog.getEntries.asScala.toList
        shutdown()
        entries
      }, 45.seconds)
    } catch {
      case e: Exception =>
        try {
          shutdown()
        } catch {
          case shutdown: Exception =>
            throw new Exception(s"Error ${shutdown.getMessage} cleaning up after Exception: ${e.getMessage}")
        }

        throw e
    }
  }

  def shutdown() = {
    driver.close()
    driver.quit()
  }
}

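Is this a PhantomJS or a Selenium problem? Is my client using the API incorrectly?

2 answers:

Answer 0: (score: 3)

The resource usage is caused by BrowserMob. To shut down the proxy and clean up its resources, you must call stop().

For this client, that means modifying the shutdown method: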

def shutdown() = {
  mobProxy.stop()
  driver.close()
  driver.quit()
}

Another method, abort, terminates the proxy server immediately and does not wait for traffic to stop.
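One thing to watch with a multi-step shutdown is that an exception in an early step skips the later ones, which would leave the proxy or the browser running. The following is a minimal sketch of a helper that runs every cleanup step regardless of earlier failures. The name Cleanup.runAll is hypothetical and not part of Selenium or BrowserMob. It rethrows the first failure and attaches any later ones as suppressed exceptions.

```scala
import scala.util.control.NonFatal

// Hypothetical helper for illustration only.
object Cleanup {

  /** Runs every named step in order, even when an earlier step throws.
    * The first failure is rethrown after all steps have run; later
    * failures are attached to it via addSuppressed. */
  def runAll(steps: (String, () => Unit)*): Unit = {
    var first: Throwable = null
    steps.foreach { case (name, step) =>
      try step()
      catch {
        case NonFatal(e) =>
          val wrapped =
            new RuntimeException(s"cleanup step '$name' failed: ${e.getMessage}", e)
          if (first == null) first = wrapped else first.addSuppressed(wrapped)
      }
    }
    if (first != null) throw first
  }
}
```

In the client from the question, shutdown could then be written as Cleanup.runAll(("proxy", () => mobProxy.stop()), ("driver", () => driver.quit())), so the driver is still quit when stopping the proxy fails.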

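Answer 1: (score: 0)

This looks to me like a PhantomJS issue. You could try the following alternatives:

  1. Use phantomjs 2.5.0-beta, which was released recently. I'm not sure whether this upgrade will solve your problem, but it is at least worth a try. According to the changelog, the new features in this release are:

    • Upgrade QtWebKit to QtWebKitNG
    • Upgrade Qt to 5.7.1
  2. Clean up the phantomjs processes after closing the webdriver. You can implement your own cleaner to force phantomjs to shut down after driver.close() (call killall -9 phantomjs or something similar).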
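The cleaner suggested in item 2 can be narrowed so that it only touches processes started by the current JVM, unlike killall, which would also hit phantomjs instances that belong to other programs. The following is a rough sketch, assuming Java 9 or later for the ProcessHandle API (the question runs on JDK 1.8.0_60, where this API is not available). The name PhantomReaper is made up for illustration.

```scala
// Hypothetical helper for illustration only. Requires Java 9+.
object PhantomReaper {

  /** Forcibly terminates descendant processes of this JVM whose
    * executable path contains `name`. Returns how many processes
    * were successfully asked to terminate. */
  def killDescendants(name: String = "phantomjs"): Int = {
    var killed = 0
    ProcessHandle.current().descendants().forEach { p =>
      // command() is an Optional and may be empty if the OS hides it
      val cmd = p.info().command().orElse("")
      if (cmd.contains(name) && p.destroyForcibly()) killed += 1
    }
    killed
  }
}
```

Calling PhantomReaper.killDescendants() after driver.quit() would act as a last resort for browser processes that survived a normal shutdown.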