Question

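What I'm doing: I'm going through a table of companies in a database... each company has a text description field, and within that field there can be a number of hyperlinks (rarely more than 4). What I want to do is test these links, using curl, for a "bad" response (typically 404, but anything other than 200 is of interest).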

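Incidentally, this no doubt applies as much to Java as to Groovy, and anyone of either persuasion may be interested to know that the underlying thread pool class used by GPars (Groovy parallelism) is ForkJoinPool.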

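Having gathered these URLs using a Matcher with the Pattern /(https?:.*?)\)/, I get a map descripURLs of "url" -> "name of company". I then use a large-capacity withPool (because of the intrinsic latency of waiting for responses, obviously) like this: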

import java.util.concurrent.atomic.AtomicInteger
import static groovyx.gpars.GParsPool.withPool

startMillis = System.currentTimeMillis()
AtomicInteger nRequest = new AtomicInteger()
AtomicInteger nResponsesReceived = new AtomicInteger()
poolObject = null
resultP = withPool( 50 ){ pool ->
    poolObject = pool
    descripURLs.eachParallel{ url, name ->
        int localNRequest = nRequest.incrementAndGet()
        Process process = checkURL( url )

        def response
        try {
            //// with the next line TIME PASSES in this Thread...
            response = process.text
        } catch( Exception e ) {
            System.err.println "$e"
        }
        // NB this line doesn't appear to make much difference
        process.destroyForcibly()
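        // declared locally so that parallel iterations do not share one script-level variable
        int nResponses = nResponsesReceived.incrementAndGet()
        int nRequestsNowMade = nRequest.get()
        // null-safe: response stays null if reading the process output threw
        if( response?.trim() != '200' ) {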
            println "\n*** request $localNRequest BAD RESPONSE\nname $name url $url\nresponse |$response|" +
                "\n$nRequestsNowMade made, outstanding ${nRequestsNowMade - nResponses}"
             // NB following line may of course not be printed immediately after the above line, due to parallelism
            println "\nprocess poolSize $pool.poolSize, queuedTaskCount $pool.queuedTaskCount," +
                " queuedSubmissionCount? $pool.queuedSubmissionCount"   
        }
        println "time now ${System.currentTimeMillis() - startMillis}, activeThreadCount $pool.activeThreadCount"
    }
    println "END OF withPool iterations"
    println "pool $pool class ${pool.class.simpleName}, activeThreadCount $pool.activeThreadCount"
    pool.shutdownNow()
}

println "resultP $resultP class ${resultP.class.simpleName}"
println "pool $poolObject class ${poolObject.class.simpleName}"
println "pool shutdown? $poolObject.shutdown"

def checkURL( url ) {
    def process =  "curl -LI $url -o /dev/null -w '%{http_code}\n' -s".execute()
    // this appears necessary... otherwise potentially you can have processes hanging around forever
    process.waitForOrKill( 8000 ) // 8 s to get a response
    process.addShutdownHook{
        println "shutdown on url $url"
    }
    process
}
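
For context, the URL-gathering step mentioned before the listing can be sketched roughly as follows. The company names and descriptions here are made-up sample data; only the pattern and the name descripURLs come from the listing above.

```groovy
import java.util.regex.Matcher
import java.util.regex.Pattern

// Same pattern as in the question: an http(s) URL followed by a closing parenthesis.
Pattern urlPattern = ~/(https?:.*?)\)/

// Hypothetical sample data: company name -> description text.
Map<String, String> companies = [
    'Acme'   : 'Makers of anvils (see https://example.com/anvils) and more (http://example.org/).',
    'Initech': 'No links here.'
]

Map<String, String> descripURLs = [:]
companies.each { name, description ->
    Matcher matcher = urlPattern.matcher(description)
    while (matcher.find()) {
        descripURLs[matcher.group(1)] = name   // url -> company name
    }
}

descripURLs.each { url, name -> println "$name: $url" }
```

Note that a URL appearing in two companies' descriptions would keep only the last company name, since the URL is the map key.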

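What I observe with a pool of 50 threads is that 500 URLs take about 20 seconds to complete. I have tried smaller and larger pools: 100 seems to make no difference, but 25 seems slower, and 10 takes more like 40 seconds. Timings are also very consistent from run to run for the same pool size.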

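What I don't understand is that the Processes' shutdown hooks only run at the very end of the run... for all 500 Processes! This is not to say that there are 500 actual processes on the machine: using Task Manager I can see that the number of curl.exe processes at any one time is relatively small.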

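At the same time I observe from the println here that the active thread count starts at 50 but then goes down throughout the run, reaching 3 (typically) by the end. And... I can also observe that the final requests are only added at the very end of the run.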

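This makes me wonder whether the thread pool is somehow getting "clogged up" by the "unfinished business" of these "zombie" Processes... I would expect the final requests (of the 500) to be made well before the end of the run. Is there some way I can shut down these Processes earlier?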

Answer 1

Neither Java nor Groovy supports an addShutdownHook method on child Process instances.

The only addShutdownHook method that Java supports is on the Runtime instance. It adds a hook that is run when the JVM shuts down.

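Groovy adds a convenience addShutdownHook() to the Object class, so that you don't have to write Runtime.getRuntime().addShutdownHook(..), but this does not change the underlying mechanism: these hooks are only executed at JVM shutdown.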
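
A minimal sketch of that mechanism, using the plain Runtime API. The variable names are illustrative only. The hook is registered, some work happens, and the hook is still pending afterwards, which is why it can be deregistered again:

```groovy
// A JVM shutdown hook is simply a Thread registered with the Runtime.
boolean hookRan = false
Thread hook = new Thread({ hookRan = true } as Runnable)
Runtime.runtime.addShutdownHook(hook)

// ... arbitrary work would go here, e.g. launching and waiting for child processes ...

// The hook has not run; it is still pending, so it can be removed.
// removeShutdownHook returns true only if the hook was registered.
boolean wasPending = Runtime.runtime.removeShutdownHook(hook)
println "hook ran: $hookRan, was still pending: $wasPending"
```

Without the removeShutdownHook call, the hook would run once, when the JVM itself exits.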

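Because the closures added with process.addShutdownHook most likely retain a reference to the process instance, those instances will be kept alive until JVM shutdown (the Java objects, that is, not the OS processes).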
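
If the aim is to be told when each child process actually ends, one option on Java 9 or later is Process.onExit(). The following is only a sketch under that assumption: runAndReport is a hypothetical helper, not something from the question, and it assumes the command writes very little output, since nothing reads the output stream here.

```groovy
import java.util.concurrent.TimeUnit

// Hypothetical helper: runs a command, reports when the OS process exits,
// and returns its exit code (null if it had to be killed on timeout).
// Assumes the command writes only a little output, as nothing drains the pipe.
Integer runAndReport(List<String> command, long timeoutMillis) {
    Process process = new ProcessBuilder(command).redirectErrorStream(true).start()

    // onExit() (Java 9+) completes when this particular process ends,
    // not when the JVM shuts down.
    process.onExit().thenAccept { Process p ->
        println "process ${p.pid()} finished"
    }

    if (!process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
        process.destroyForcibly()   // still running after the timeout
        return null
    }
    return process.exitValue()
}
```

Passing the command as a list of arguments also avoids the quoting problems that can arise when a single command string is split on whitespace.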

Java / GPars - my thread pool seems to get "clogged up"
