Python multiprocessing: split a computation and analyze results as soon as they are ready

Date: 2019-07-03 14:08:22

Tags: python multithreading

I have a computationally demanding feasibility problem, so I want to break it into smaller subproblems that can each be solved relatively quickly and in parallel. Usually I don't want to wait for all subproblems to finish: as soon as one of them returns True, I want to return True immediately.
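A sketch of the behavior I am after, on a simplified problem where all subproblems are known up front (my real splits arrive incrementally, so this does not directly apply; check_subproblem is a hypothetical stand-in for the real feasibility check):

```python
import multiprocessing as mp


def check_subproblem(n):
    # hypothetical feasibility check: pretend only subproblem 7 is feasible
    return n == 7


def any_feasible(subproblems, processes=4):
    with mp.Pool(processes=processes) as pool:
        # imap_unordered yields results in completion order, so we can
        # return on the first True; leaving the with-block terminates
        # the remaining workers
        for result in pool.imap_unordered(check_subproblem, subproblems):
            if result:
                return True
    return False


if __name__ == '__main__':
    print(any_feasible(range(20)))  # True
```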

I have come up with the following solution:

- A main process that starts all the other processes and then checks the reporting queue for subproblem results. If there is a True result, it terminates all other processes and returns True. Otherwise, it waits for all subproblems to finish.

- A splitting process that computes how to split the original problem (this also takes time, so I don't want to compute all the splits up front) and adds jobs to the jobs queue as soon as they are computed.

- A job dispatcher that reads jobs from the jobs queue and adds them to a pool for processing. The results are added to the reporting queue.

import datetime
import multiprocessing as mp
import random
import time
from threading import Thread
from timeit import default_timer as timer
from multiprocessing import Process


def execute_job(job, reporting_queue):
    time.sleep(job.running_time)
    reporting_queue.put((job.id, job.result, job.running_time))


class Job:
    def __init__(self, id, result, running_time):
        self.id = id
        self.result = result
        self.running_time = running_time


class SplittingProcess(Process):

    def __init__(self, jobs_queue, no_more_jobs_queue, total_n_splits_queue):
        super(SplittingProcess, self).__init__()

        # the queue to which jobs will be added
        self.jobs_queue = jobs_queue
        # this queue becomes non-empty once all the splits have been computed
        self.no_more_jobs_queue = no_more_jobs_queue
        # this queue will contain the total number of splits once they all have been computed
        self.total_n_splits_queue = total_n_splits_queue

    def run(self):
        sub_jobs_count = 10
        for i in range(sub_jobs_count):
            sleep_time = random.random()
            time.sleep(sleep_time)

            job_running_time = random.randint(1, 8)
            job_result = random.choice([True, False, "Timeout"])
            self.jobs_queue.put(Job(i, job_result, job_running_time))

        self.no_more_jobs_queue.put(True)
        self.total_n_splits_queue.put(sub_jobs_count)


class JobDispatcher(Process):
    PARALLEL_PROCESSES_NUMBER = 2
    CHECKING_INTERVAL = 0.1

    def __init__(self, jobs_queue, no_more_jobs_queue, reporting_queue):
        super(JobDispatcher, self).__init__()

        # the queue from which jobs will be read
        self.jobs_queue = jobs_queue
        # this queue becomes non-empty once the splitter has finished
        self.no_more_jobs_queue = no_more_jobs_queue
        # the queue passed to the worker processes, to which they report results
        self.reporting_queue = reporting_queue

        self.pool = mp.Pool(processes=JobDispatcher.PARALLEL_PROCESSES_NUMBER)

    def run(self):
        while self.no_more_jobs_queue.empty():
            time.sleep(self.CHECKING_INTERVAL)

            while not self.jobs_queue.empty():
                job = self.jobs_queue.get()
                # apply_async has to pickle its args (including the queue),
                # and .get() blocks until this job finishes
                handler = self.pool.apply_async(execute_job, args=(job, self.reporting_queue))
                handler.get()


class MainVerificationProcess(Thread):

    CHECKING_INTERVAL = 0.1

    def __init__(self, output_queue):
        super(MainVerificationProcess, self).__init__()

        self.output_queue = output_queue
        self.reporting_queue = mp.Queue()
        self.total_n_splits_queue = mp.Queue()
        self.total_number_of_splits = -1

        jobs_queue = mp.Queue()
        no_more_jobs_queue = mp.Queue()
        self.job_dispatcher = JobDispatcher(jobs_queue, no_more_jobs_queue, self.reporting_queue)
        self.splitting_process = SplittingProcess(jobs_queue, no_more_jobs_queue, self.total_n_splits_queue)

    def run(self):

        self.splitting_process.start()
        self.job_dispatcher.start()

        timeout_results = set()
        finished_subprocesses_ids = set()

        while True:
            time.sleep(self.CHECKING_INTERVAL)

            while not self.reporting_queue.empty():
                p_id, res, runtime = self.reporting_queue.get()
                if res == True:
                    self.job_dispatcher.terminate()
                    self.splitting_process.terminate()
                    self.output_queue.put(True)
                    return
                elif res == False:
                    finished_subprocesses_ids.add(p_id)
                elif res == "Timeout":
                    timeout_results.add((p_id, res, runtime))
                    finished_subprocesses_ids.add(p_id)

            if not self.total_n_splits_queue.empty():
                self.total_number_of_splits = self.total_n_splits_queue.get()

            if self.total_number_of_splits != -1 and len(finished_subprocesses_ids) >= self.total_number_of_splits:
                if len(timeout_results) == 0:
                    self.output_queue.put(False)
                else:
                    self.output_queue.put("Unknown")
                return


if __name__ == '__main__':
    print('\n################ {} #################'.format(datetime.datetime.now()))
    output = mp.Queue()
    verProcess = MainVerificationProcess(output)

    start = timer()
    verProcess.start()
    verProcess.join()
    end = timer()

    result = output.get()
    print("Overal result and time", result, end - start)

But I get the following error: RuntimeError: Queue objects should only be shared between processes through inheritance. I understand that I can only pass queues in the constructor, which is why I need the handler.get() call to retrieve a subproblem's result. However, the behavior I am looking for is exactly what the code above expresses (assuming I could pass the queue): I don't want to wait for all subproblems to finish before checking the results; I want to handle them as soon as they arrive. Note also that the number of subproblems can be large (several hundred) and is not known in advance, so a pool is convenient for me. Moreover, some subproblems may terminate quickly (within a second), while others may take a hundred seconds.
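One workaround I am aware of is a Manager-backed queue, which is a picklable proxy and therefore can be passed in a pool task's arguments. A minimal sketch with a hypothetical worker, not my full architecture:

```python
import multiprocessing as mp


def worker(job_id, reporting_queue):
    # hypothetical job: report (id, result) back through the proxy queue
    reporting_queue.put((job_id, job_id % 2 == 0))


def run_jobs(n_jobs, processes=2):
    manager = mp.Manager()
    # a Manager queue is a proxy object, so it survives pickling and can
    # be passed in apply_async's args (a plain mp.Queue cannot)
    reporting_queue = manager.Queue()
    with mp.Pool(processes) as pool:
        for i in range(n_jobs):
            pool.apply_async(worker, args=(i, reporting_queue))
        pool.close()
        pool.join()
    return sorted(reporting_queue.get() for _ in range(n_jobs))


if __name__ == '__main__':
    print(run_jobs(4))
```

The downside is that every put/get goes through the manager's server process, so it is slower than a plain queue.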

Is there a simple workaround for my problem, or do I need to implement my own pool?
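For reference, apply_async also accepts a callback that runs in the parent process, so results can be collected without sharing a queue with the workers at all. A minimal sketch (slow_check is a hypothetical stand-in for a subproblem):

```python
import multiprocessing as mp


def slow_check(n):
    # hypothetical subproblem: only n == 3 is feasible
    return (n, n == 3)


def check_all(n_jobs, processes=2):
    results = []
    with mp.Pool(processes) as pool:
        for i in range(n_jobs):
            # the callback runs in a thread of the parent process, so the
            # results list never has to cross a process boundary
            pool.apply_async(slow_check, args=(i,), callback=results.append)
        pool.close()
        pool.join()
    return sorted(results)


if __name__ == '__main__':
    print(check_all(5))
```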

0 Answers:

No answers yet