Processing large amounts of data with Grails and GPars

Asked: 2014-11-14 15:19:08

Tags: grails groovy concurrency quartz-scheduler gpars

I have a Grails application that runs a job every day at midnight. In my sample application I have 10000 Person records, and the Quartz job does the following:

package threading

import static grails.async.Promises.task
import static groovyx.gpars.GParsExecutorsPool.withPool

class ComplexJob {
    static triggers = {
        simple repeatInterval: 30 * 1000l
    }

    def execute() {
        if (Person.count == 10000) {
            println "Executing job"                
            withPool 10000, {
                Person.listOrderByAge(order: "asc").each { p ->
                    task {
                        log.info "Started ${p}"
                        Thread.sleep(15000l - (-1 * p.age))
                    }.onComplete {
                        log.info "Completed ${p}"
                    }
                }
            }                
        }
    }
}

Ignore the repeatInterval; it is only there for testing. When the job executes, I get the following exception:

2014-11-14 16:11:51,880 quartzScheduler_Worker-3 grails.plugins.quartz.listeners.ExceptionPrinterJobListener - Exception occurred in job: Grails Job
org.quartz.JobExecutionException: java.lang.IllegalStateException: The thread pool executor cannot run the task. The upper limit of the thread pool size has probably been reached. Current pool size: 1000 Maximum pool size: 1000 [See nested exception: java.lang.IllegalStateException: The thread pool executor cannot run the task. The upper limit of the thread pool size has probably been reached. Current pool size: 1000 Maximum pool size: 1000]
    at grails.plugins.quartz.GrailsJobFactory$GrailsJob.execute(GrailsJobFactory.java:111)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
Caused by: java.lang.IllegalStateException: The thread pool executor cannot run the task. The upper limit of the thread pool size has probably been reached. Current pool size: 1000 Maximum pool size: 1000
    at org.grails.async.factory.gpars.LoggingPoolFactory$3.rejectedExecution(LoggingPoolFactory.groovy:100)
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
    at groovyx.gpars.scheduler.DefaultPool.execute(DefaultPool.java:155)
    at groovyx.gpars.group.PGroup.task(PGroup.java:305)
    at groovyx.gpars.group.PGroup.task(PGroup.java:286)
    at groovyx.gpars.dataflow.Dataflow.task(Dataflow.java:93)
    at org.grails.async.factory.gpars.GparsPromise.<init>(GparsPromise.groovy:41)
    at org.grails.async.factory.gpars.GparsPromiseFactory.createPromise(GparsPromiseFactory.groovy:68)
    at grails.async.Promises.task(Promises.java:123)
    at threading.ComplexJob$_execute_closure1_closure3.doCall(ComplexJob.groovy:20)
    at threading.ComplexJob$_execute_closure1.doCall(ComplexJob.groovy:19)
    at groovyx.gpars.GParsExecutorsPool$_withExistingPool_closure2.doCall(GParsExecutorsPool.groovy:192)
    at groovyx.gpars.GParsExecutorsPool.withExistingPool(GParsExecutorsPool.groovy:191)
    at groovyx.gpars.GParsExecutorsPool.withPool(GParsExecutorsPool.groovy:162)
    at groovyx.gpars.GParsExecutorsPool.withPool(GParsExecutorsPool.groovy:136)
    at threading.ComplexJob.execute(ComplexJob.groovy:18)
    at grails.plugins.quartz.GrailsJobFactory$GrailsJob.execute(GrailsJobFactory.java:104)
    ... 2 more
2014-11-14 16:12:06,756 Actor Thread 20 org.grails.async.factory.gpars.LoggingPoolFactory - Async execution error: A DataflowVariable can only be assigned once. Only re-assignments to an equal value are allowed.
java.lang.IllegalStateException: A DataflowVariable can only be assigned once. Only re-assignments to an equal value are allowed.
    at groovyx.gpars.dataflow.expression.DataflowExpression.bind(DataflowExpression.java:368)
    at groovyx.gpars.group.PGroup$4.run(PGroup.java:315)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2014-11-14 16:12:06,756 Actor Thread 5 org.grails.async.factory.gpars.LoggingPoolFactory - Async execution error: A DataflowVariable can only be assigned once. Only re-assignments to an equal value are allowed.
java.lang.IllegalStateException: A DataflowVariable can only be assigned once. Only re-assignments to an equal value are allowed.
    at groovyx.gpars.dataflow.expression.DataflowExpression.bind(DataflowExpression.java:368)
    at groovyx.gpars.group.PGroup$4.run(PGroup.java:315)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

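Even though I call withPool(10000), the thread pool does not seem to be sized to 10000. Can I run this computation (which currently only prints log statements) in chunks instead? If so, how can I tell which item was processed last (for example, in order to resume)?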

2 answers:

Answer 0: (score: 1)

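I suspect the withPool() call has no effect, because task most likely runs on the default thread pool rather than the one created by withPool. Try removing the withPool() call and see whether the tasks still run.

The groovyx.gpars.scheduler.DefaultPool pool in GPars (the default for tasks) resizes itself according to the submitted tasks and is limited to 1000 concurrent threads. That matches the "Maximum pool size: 1000" in your exception.

I suggest creating a fixed-size pool instead, for example: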

import groovyx.gpars.group.DefaultPGroup

def group = new DefaultPGroup(numberOfThreads)  // group backed by a pool of the given size
group.task {...}

Note: I am not familiar with grails.async tasks, only with core GPars, so the way PGroup is used inside grails.async may differ slightly.
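
To illustrate why a fixed-size pool avoids the rejection, here is a minimal sketch that swaps in plain java.util.concurrent instead of GPars or grails.async. The item list, the pool size of 10 and the counter are assumptions made for the example, not part of the original job. The point is that 10000 submissions queue up behind 10 threads rather than each demanding a thread of its own.

```groovy
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for the 10000 Person records in the question.
def items = (1..10000).toList()

// Assumption: 10 worker threads. Tasks beyond that wait in the pool's unbounded queue.
int poolSize = 10
def pool = Executors.newFixedThreadPool(poolSize)
def completed = new AtomicInteger()

items.each { item ->
    // Cast to Runnable explicitly so that submit(Runnable) is selected.
    pool.submit({ completed.incrementAndGet() } as Runnable)
}

pool.shutdown()                                                 // accept no new tasks
boolean finished = pool.awaitTermination(1, TimeUnit.MINUTES)   // wait for the queued ones

println "finished=${finished}, completed=${completed.get()}"
```

The trade-off is that queued work holds memory while it waits, so for very large inputs it is usually better to submit fewer, larger tasks.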

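Answer 1: (score: 0)

Wrapping the processing of every single element in its own task does not seem optimal. The standard way to parallelize is to split the whole job into a suitable number of subtasks, so you start by choosing that number. For a CPU-bound job you can create N tasks, where N is the number of processors, and then split the work into N sublists. Something like this: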

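import static grails.async.Promises.task
import static groovyx.gpars.GParsExecutorsPool.withPool

def persons = Person.listOrderByAge(order: "asc")
def nThreads = Runtime.getRuntime().availableProcessors()
// Ceiling division: collate() needs an integer size, and rounding up
// keeps the number of sublists at or below nThreads.
int size = Math.max(1, (persons.size() + nThreads - 1).intdiv(nThreads))
withPool nThreads, {
    persons.collate(size).each { subList ->
        task {
            subList.each { p ->
                ...
            }
        }
    }
}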
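
The chunking idea can also be tried outside Grails. The sketch below is an illustration under stated assumptions: it uses a plain java.util.concurrent pool rather than withPool/task, a list of integers stands in for the Person records, and the per-record work is a placeholder. Each chunk reports the last item it handled, which is one possible way to record how far processing got.

```groovy
import java.util.concurrent.Callable
import java.util.concurrent.Executors

// Hypothetical stand-in for Person.listOrderByAge(order: "asc").
def persons = (1..10000).toList()

int nThreads = Runtime.runtime.availableProcessors()
// Ceiling division, so the list splits into at most nThreads chunks.
int chunkSize = Math.max(1, (persons.size() + nThreads - 1).intdiv(nThreads))
def chunks = persons.collate(chunkSize)

def lastPerChunk = []
def pool = Executors.newFixedThreadPool(nThreads)
try {
    // One task per chunk; each task returns the last item it handled.
    def tasks = chunks.collect { chunk ->
        ({
            def last = null
            chunk.each { p -> last = p }   // placeholder for the real per-record work
            last
        } as Callable)
    }
    // invokeAll blocks until every chunk is done and returns results in submission order.
    lastPerChunk = pool.invokeAll(tasks).collect { it.get() }
} finally {
    pool.shutdown()
}

println "chunks=${chunks.size()}, last item of each chunk=${lastPerChunk}"
```

To resume after a failure, the values in lastPerChunk (or an id derived from them) would have to be persisted somewhere. Where and how is left open here, since the question does not say what storage is available.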