Single-process code runs faster than multiprocessing - MCVE

Date: 2014-08-10 23:00:25

Tags: python performance map multiprocessing

I tried to speed up one of my applications with multiprocessing and ended up with worse performance. I'm sure it's a design flaw, but that's exactly the point of this discussion: how should this problem be approached so that multiprocessing actually helps?

My current results on a 1.4 GHz Atom:

  1. SP version = 19 seconds
  2. MP version = 24 seconds

Both versions of the code can be copied and pasted for review. The data set is at the bottom and can be pasted in as well. (I decided against using xrange to illustrate the problem.)

    First, the SP version:

    *PASTE DATA HERE*    
    
    def calc():
        for i, valD1 in enumerate(D1):
            for i, valD2 in enumerate(D2):
                for i, valD3 in enumerate(D3):  
                    for i, valD4 in enumerate(D4):
                        for i, valD5 in enumerate(D5):
                            for i, valD6 in enumerate(D6):
                                for i, valD7 in enumerate(D7):
                                    sol1=float(valD1[1]+valD2[1]+valD3[1]+valD4[1]+valD5[1]+valD6[1]+valD7[1])
                                    sol2=float(valD1[2]+valD2[2]+valD3[2]+valD4[2]+valD5[2]+valD6[2]+valD7[2])
        return None
    
    print(calc())
    

    Now the MP version:

    import multiprocessing
    import itertools
    
    *PASTE DATA HERE*
    
    def calculate(vals):
        # vals is one tuple of seven [sol1, sol2] pairs produced by itertools.product below
        valD1, valD2, valD3, valD4, valD5, valD6, valD7 = vals
        sol1=float(valD1[0]+valD2[0]+valD3[0]+valD4[0]+valD5[0]+valD6[0]+valD7[0])
        sol2=float(valD1[1]+valD2[1]+valD3[1]+valD4[1]+valD5[1]+valD6[1]+valD7[1])
        return None
    
    def process():
        pool = multiprocessing.Pool(processes=4)
        prod = itertools.product(([x[1],x[2]] for x in D1), ([x[1],x[2]] for x in D2), ([x[1],x[2]] for x in D3), ([x[1],x[2]] for x in D4), ([x[1],x[2]] for x in D5), ([x[1],x[2]] for x in D6), ([x[1],x[2]] for x in D7))
        result = pool.imap(calculate, prod, chunksize=2500)
        pool.close()
        pool.join()
        return result
    
    if __name__ == "__main__":    
        print(process())
    

    Data for both:

    D1 = [['A',7,4],['B',3,7],['C',6,1],['D',12,6],['E',4,8],['F',8,7],['G',11,3],['AX',11,7],['AX',11,2],['AX',11,4],['AX',11,4]]
    D2 = [['A',7,4],['B',3,7],['C',6,1],['D',12,6],['E',4,8],['F',8,7],['G',11,3],['AX',11,7],['AX',11,2],['AX',11,4],['AX',11,4]]
    D3 = [['A',7,4],['B',3,7],['C',6,1],['D',12,6],['E',4,8],['F',8,7],['G',11,3],['AX',11,7],['AX',11,2],['AX',11,4],['AX',11,4]]
    D4 = [['A',7,4],['B',3,7],['C',6,1],['D',12,6],['E',4,8],['F',8,7],['G',11,3],['AX',11,7],['AX',11,2],['AX',11,4],['AX',11,4]]
    D5 = [['A',7,4],['B',3,7],['C',6,1],['D',12,6],['E',4,8],['F',8,7],['G',11,3],['AX',11,7],['AX',11,2],['AX',11,4],['AX',11,4]]
    D6 = [['A',7,4],['B',3,7],['C',6,1],['D',12,6],['E',4,8],['F',8,7],['G',11,3],['AX',11,7],['AX',11,2],['AX',11,4],['AX',11,4]]
    D7 = [['A',7,4],['B',3,7],['C',6,1],['D',12,6],['E',4,8],['F',8,7],['G',11,3],['AX',11,7],['AX',11,2],['AX',11,4],['AX',11,4]]
    

    Now the theory:

    Since there is barely any actual work being done (just summing 7 integers), the tasks are too small relative to the CPU-bound data being shipped around, and the inter-process communication adds too much overhead for multiprocessing to be effective. This seems like a situation where I really need true multithreading capability. So, because of the GIL, I am looking for suggestions at this point before I try another language.
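
    To make comparisons like the 19 s vs. 24 s figures above repeatable, a simple wall-clock harness can wrap both entry points. This is only a sketch: timed is a helper introduced here for illustration, and it assumes calc() and process() are merged into one script along with the pasted data.

    import time

    # Hypothetical timing helper, not part of the original code: runs a
    # zero-argument callable and reports its wall-clock time.
    def timed(label, fn):
        start = time.time()
        fn()
        print("%s: %.2f s" % (label, time.time() - start))

    if __name__ == "__main__":
        timed("single process", calc)
        timed("multiprocessing", process)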

    ******** Debugging

    File "calc.py", line 309, in <module>
        smart_calc()
      File "calc.py", line 290, in smart_calc
        results = pool.map(func, chunk_list)
      File "/usr/local/lib/python2.7/multiprocessing/pool.py", line 250, in map
        return self.map_async(func, iterable, chunksize).get()
      File "/usr/local/lib/python2.7/multiprocessing/pool.py", line 554, in get
        raise self._value
    TypeError: sequence index must be integer, not 'slice'
    

    In this case, total_len = 108 and CHUNKS is set to 2. When CHUNKS is reduced to 1, it works.

1 Answer:

Answer 0 (score: 1)

Ok, I think I've figured out how to actually get a speedup from multiprocessing. Since your actual source lists aren't very long, it's reasonable to pass them in their entirety to the worker processes. So, if every worker process has a copy of the same source lists, ideally we'd want them all to iterate over different portions of the product in parallel and just sum up their own unique slice. Because we know the size of the input lists, we can determine exactly how long itertools.product(D1, D2, ...) will be, which means we can also determine exactly how big each chunk should be to distribute the work evenly. So, we can give each worker a specific range of the itertools.product iterator that it should iterate over and sum:

import math
import itertools
import multiprocessing
import functools

def smart_calc(valD1, valD2, valD3, valD4, valD5, valD6, valD7, slices):
    # Build an iterator over the entire data set
    prod = itertools.product(([x[1],x[2]] for x in valD1), 
                             ([x[1],x[2]] for x in valD2), 
                             ([x[1],x[2]] for x in valD3), 
                             ([x[1],x[2]] for x in valD4), 
                             ([x[1],x[2]] for x in valD5), 
                             ([x[1],x[2]] for x in valD6), 
                             ([x[1],x[2]] for x in valD7))

    # But only iterate over our unique slice
    for subD1, subD2, subD3, subD4, subD5, subD6, subD7 in itertools.islice(prod, slices[0], slices[1]):
        sol1=float(subD1[0]+subD2[0]+subD3[0]+subD4[0]+subD5[0]+subD6[0]+subD7[0])
        sol2=float(subD1[1]+subD2[1]+subD3[1]+subD4[1]+subD5[1]+subD6[1]+subD7[1])
    return None

def smart_process():
    CHUNKS = multiprocessing.cpu_count()  # Number of pieces to break the list into.
    total_len = len(D1) ** 7  # The total length of itertools.product()
    # Figure out how big each chunk should be. This mirrors the chunking
    # logic used internally by multiprocessing.Pool.map().
    chunksize, extra = divmod(total_len, CHUNKS)
    if extra:
        chunksize += 1

    # Build a list that has the low index and high index for each
    # slice of the list. Each process will iterate over a unique
    # slice
    low = 0 
    high = chunksize
    chunk_list = []
    for _ in range(CHUNKS):
        chunk_list.append((low, high))
        low += chunksize
        high += chunksize

    pool = multiprocessing.Pool(processes=CHUNKS)
    # Use partial so we can pass all the lists to each worker
    # while using map (which only allows one arg to be passed)
    func = functools.partial(smart_calc, D1, D2, D3, D4, D5, D6, D7) 
    result = pool.map(func, chunk_list)
    pool.close()
    pool.join()
    return result
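
As a quick sanity check of the chunk arithmetic above (an illustrative calculation, not part of the answer's code): each sample list has 11 rows, so the product contains 11**7 = 19,487,171 combinations, and with 4 processes each worker gets a range of roughly 4.87 million combinations:

# Illustrative check of the chunk math with the sample data (11 rows per list).
total_len = 11 ** 7                       # 19487171 combinations in the product
chunksize, extra = divmod(total_len, 4)   # 4 processes assumed here
if extra:
    chunksize += 1                        # 4871793
print([(i * chunksize, (i + 1) * chunksize) for i in range(4)])
# [(0, 4871793), (4871793, 9743586), (9743586, 14615379), (14615379, 19487172)]
# The last upper bound overshoots by one; itertools.islice simply stops at the end.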

Results:

sequential: 13.9547419548
mp: 4.0270690918

Success! Now, you do have to actually combine the results after you have them, which will add extra overhead to your real program. It might end up making this approach slower than the sequential version again, but it really depends on what you actually want to do with the data.
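
If the real program does need the per-combination values, one way to keep that combining overhead small is to have each worker return an aggregate (for example, partial totals) rather than every (sol1, sol2) pair, and then reduce those few values in the parent. Below is a minimal sketch under that assumption, reusing the same slicing scheme as smart_calc; the aggregate is only illustrative, not the answer's code.

def smart_calc_totals(valD1, valD2, valD3, valD4, valD5, valD6, valD7, slices):
    # Same slice-based iteration as smart_calc, but each worker returns
    # partial totals instead of discarding its results.
    prod = itertools.product(([x[1], x[2]] for x in valD1),
                             ([x[1], x[2]] for x in valD2),
                             ([x[1], x[2]] for x in valD3),
                             ([x[1], x[2]] for x in valD4),
                             ([x[1], x[2]] for x in valD5),
                             ([x[1], x[2]] for x in valD6),
                             ([x[1], x[2]] for x in valD7))
    total1 = total2 = 0.0
    for subD1, subD2, subD3, subD4, subD5, subD6, subD7 in itertools.islice(prod, slices[0], slices[1]):
        total1 += subD1[0] + subD2[0] + subD3[0] + subD4[0] + subD5[0] + subD6[0] + subD7[0]
        total2 += subD1[1] + subD2[1] + subD3[1] + subD4[1] + subD5[1] + subD6[1] + subD7[1]
    return total1, total2

# In smart_process(), combining then stays cheap: pool.map returns one small
# tuple per worker, and the parent just sums them.
#   partials = pool.map(functools.partial(smart_calc_totals, D1, D2, D3, D4, D5, D6, D7), chunk_list)
#   grand1 = sum(p[0] for p in partials)
#   grand2 = sum(p[1] for p in partials)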