Running multiple functions in parallel

Date: 2014-02-17 12:06:19

Tags: python-3.x parallel-processing multiprocessing

I have a number of functions, and each of them should load urls from a list and do something with them. If possible, I need all of the functions to run in parallel, and each function should also load its urls in parallel. I have some code here, but I am not sure whether it does what I described, or whether it could be done more simply. Thanks.

import multiprocessing
from multiprocessing import Queue


class Multi(object):


    #----------------------------------------------------------------------
    def __init__(self, urls_func1, urls_func2):
        """ Initialize class with list of urls """
        self.urls_func1 = urls_func1
        self.urls_func2 = urls_func2 
    #----------------------------------------------------------------------
    def func1(self, url, que):
        "do something"
        que.put(result_func1)

    #----------------------------------------------------------------------
    def func2(self, url, que):
        "do something"
        que.put(result_func2)

    #----------------------------------------------------------------------      
    def run(self):

        "For func1"
        jobs = []
        queue1=[]        
        for url in urls_func1:
            queue1.append(Queue())
            process = multiprocessing.Process(target=self.func1, args=(url,queue1[len(jobs)],))
            jobs.append(process)
            process.start()
        for job in jobs:
            job.join()

        "For func2"
        jobs = []
        queue2=[]      
        for url in urls_func1:
            queue2.append(Queue())
            process = multiprocessing.Process(target=self.func2, args=(url,queue2[len(jobs)],))
            jobs.append(process)
            process.start()
        for job in jobs:
            job.join()



        results_func1 = [que.get() for que in queue1]  # collect one result per queue
        results_func2 = [que.get() for que in queue2]
        return results_func1, results_func2

    #----------------------------------------------------------------------


if __name__ == "__main__":

    urls_for_func1=['www.google.com', 'www.google.com', 'www.google.com', 'www.google.com']
    urls_for_func2=['www.google.com', 'www.google.com', 'www.google.com']
    a, b = Multi(urls_for_func1, urls_for_func2).run()

Edit: renamed the variables

2 Answers:

Answer 0 (score: 1)

You can download several urls concurrently while limiting the total number of parallel jobs by using a thread pool:

from multiprocessing.dummy import Pool # use threads

def func1(url):
    # download url here
    return url, "result", None

def func2(url):
    # do something with the url that fails
    return url, None, "describe the error"

pool = Pool(20) # no more than 20 concurrent connections
results_for_func1 = pool.map(func1, urls_func1) # it blocks until done
results_for_func2 = pool.map(func2, urls_func2)
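
Note that pool.map blocks until its whole list is done, so here the urls for func2 only start once func1 has finished all of its urls. If both lists should be in flight at the same time, a minimal variation is map_async on the same pool (a sketch, reusing func1, func2, and the url lists from above):

async1 = pool.map_async(func1, urls_func1)  # schedules the work, returns immediately
async2 = pool.map_async(func2, urls_func2)
results_for_func1 = async1.get()  # now block until both batches finish
results_for_func2 = async2.get()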

Answer 1 (score: 0)

You do not need multiprocessing, because your workload is network-bound.

threading is easier and scales nicely into the thousands of threads (roughly your RAM divided by the ~8 MB default stack size per thread).
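
For example, a minimal threading sketch, assuming a hypothetical fetch function that stands in for the real download:

import threading

def fetch(url):
    pass  # placeholder: download and process the url here

urls = ['www.google.com'] * 4  # sample list, as in the question
threads = [threading.Thread(target=fetch, args=(url,)) for url in urls]
for t in threads:
    t.start()  # all downloads run concurrently
for t in threads:
    t.join()   # wait for every thread to finish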

If you need to scale beyond that, use greenlet (easy) or Twisted (hardcore).
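
As a rough sketch of the greenlet route, assuming the third-party gevent library (which is built on greenlet) is installed:

import gevent
from gevent import monkey
monkey.patch_all()  # make the stdlib's blocking socket I/O cooperative

import urllib.request

def fetch(url):
    return urllib.request.urlopen(url).read()

urls = ['http://www.google.com'] * 1000  # thousands of jobs are cheap here
jobs = [gevent.spawn(fetch, url) for url in urls]  # one lightweight greenlet per url
gevent.joinall(jobs, timeout=30)
results = [job.value for job in jobs]  # None for any job that failed or timed out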