
Question

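The code I'm using is posted below. I'm running Ubuntu 16.04 on a laptop with a quad-core i7 processor. "data" is a matrix with ~100,000 rows and 4 columns. "eemd" is a computationally expensive function. On my machine, processing all the columns takes 5 minutes, whether I process them one after another or in parallel with Pool.map(), as shown below.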

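I've seen other code blocks on this site that I was able to run, and they successfully demonstrated Pool.map() cutting the run time by a factor equal to the number of processes. That doesn't happen for me, and I can't figure out why.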

The result is the same whether I use Pool.map() or Pool.imap().

#!/usr/bin/python

import time

from pyeemd import eemd
import numpy as np
import linecache

data = np.loadtxt("test_data.txt")
idx = range(4)

def eemd_sans_multi():
    t = time.time()

    for i in idx:
        eemd(data[:,i])

    print("Without multiprocessing...")
    print(time.time()-t)

def eemd_wrapper(idx):
    imfs = eemd(data[:,idx])
    return imfs

def eemd_with_multi():
    import multiprocessing as mp

    pool = mp.Pool(processes=4)

    t = time.time()

    for x in pool.map(eemd_wrapper, idx):
        print(x)

    print("With multiprocessing...")
    print(time.time()-t)


if __name__ == "__main__":
    eemd_sans_multi()
    eemd_with_multi()

New code based on dunes' reply

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import ctypes
from time import time

from pyeemd import eemd
import numpy as np
import re
import linecache

data = np.loadtxt("test_data.txt",skiprows=8)
headers = re.split(r'\t+',linecache.getline("test_data.txt", 8))

idx = [i for i, x in enumerate(headers) if x.endswith("Z")]
idx = idx[0:2]
print(idx)

def eemd_wrapper(idx):
    imfs = eemd(data[:,idx])
    return imfs

def main():
    print("serial")
    start = time()
    for i in idx:
        eemd_wrapper(i)
    end = time()
    print("took {} seconds\n".format(end-start))

    for executor_class in (ThreadPoolExecutor, ProcessPoolExecutor):
        print(executor_class.__name__)
        start = time()
        # we'll only be using two workers so as to make time comparisons simple
        with executor_class(max_workers=2) as executor:
            executor.map(eemd_wrapper, idx)
        end = time()
        print("took {} seconds\n".format(end-start))

if __name__ == '__main__':
    main()

Answer 1

In Python 3 you can try ProcessPoolExecutor from the concurrent.futures module. Here is an example:

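from time import time
from concurrent.futures import ProcessPoolExecutor


def gcd(pair):
    # brute-force greatest common divisor, deliberately slow
    a, b = pair
    low = min(a, b)
    for i in range(low, 0, -1):
        if a % i == 0 and b % i == 0:
            return i


numbers = [(1963309, 2265973), (2030677, 3814172),
           (1551645, 2229620), (2039045, 2020802), (6532541, 9865412)]

# The guard is required on platforms that start workers with "spawn"
# (Windows, macOS), because each worker re-imports this module.
if __name__ == '__main__':
    # serial baseline
    start = time()
    results = list(map(gcd, numbers))
    end = time()
    print('1st Took %.3f seconds' % (end - start))

    # the same work spread over two worker processes; "with" shuts the pool down
    start = time()
    with ProcessPoolExecutor(max_workers=2) as pool:
        results = list(pool.map(gcd, numbers))
    end = time()
    print('2nd Took %.3f seconds' % (end - start))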

Answer 2

Major edit

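It looks like libeemd is already multithreaded, so parallel execution in Python won't give you a significant performance gain. You've said you're using Ubuntu 16.04, which means you'll be compiling libeemd with gcc 5.4 (which supports OpenMP). The Makefile of libeemd shows that it is compiled with -fopenmp. So yes, it is already multithreaded.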
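One rough way to check this on your own machine is to compare the CPU time the process consumes with the wall-clock time a call takes. The sketch below is an illustration, not part of pyeemd; the helper name `cpu_to_wall_ratio` is hypothetical. It relies on `time.process_time()`, which counts CPU time across all threads of the process, so a ratio well above 1.0 suggests the called code is keeping several cores busy on its own.

```python
import time


def cpu_to_wall_ratio(func, *args):
    """Call func(*args) and return (result, process CPU seconds / wall seconds).

    A ratio near 1.0 means roughly one core was busy; a ratio near k means
    roughly k cores were busy, e.g. because a C library runs its own OpenMP
    threads. A ratio near 0 means the call mostly waited (sleep, I/O).
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # sums CPU time over every thread
    result = func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, (cpu / wall if wall > 0 else float("nan"))


if __name__ == "__main__":
    # Single-threaded pure-Python work: expect a ratio close to 1.0.
    _, ratio = cpu_to_wall_ratio(sum, range(5_000_000))
    print("pure Python sum: %.2f" % ratio)
```

With the question's code, `cpu_to_wall_ratio(eemd, data[:, 0])` would be the call to try (an assumption about how you'd apply it); on a quad-core machine a ratio close to 4 would be consistent with libeemd already using all cores.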

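The fact that the library is already multithreaded also explains why ProcessPoolExecutor runs into trouble in the example code. The library has already been used before the process pool is created, and the default way for Unix systems to create a new process (forking) is to make a pseudo-copy of the current process. The child workers are therefore left with a library that refers to threads that exist only in the parent process. If you run only the ProcessPoolExecutor on its own, you'll find it works fine.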
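If you do want worker processes after the library has already been used in the parent, one commonly used workaround is to request the "spawn" start method, so that each worker is a fresh interpreter that loads the library itself instead of inheriting a forked copy. This is a minimal sketch assuming Python 3.7+ (for the `mp_context` argument); `work` is a hypothetical stand-in for the question's `eemd_wrapper`.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


def work(x):
    """Hypothetical stand-in for eemd_wrapper(); replace with the real job."""
    return x * x


def run_spawned(items, workers=2):
    # "spawn" starts each worker as a brand-new interpreter instead of fork()-ing
    # the parent, so workers do not inherit the parent's library thread state.
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(max_workers=workers, mp_context=ctx) as pool:
        return list(pool.map(work, items))


if __name__ == "__main__":
    print(run_spawned(range(4)))
```

The trade-off: a spawned worker re-imports the main module, so module-level work such as `np.loadtxt("test_data.txt")` in the question runs again in every worker, and the main guard becomes mandatory.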

Original answer

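Given that pyeemd is a wrapper around libeemd that uses ctypes as the glue, you shouldn't need multiprocessing; a multithreaded solution should be enough to get a speedup (and probably the best speedup).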
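Applied to the question's code, the threaded version could look like the sketch below. The helper name `map_in_threads` and the `fake_eemd` stand-in are hypothetical so the example runs without pyeemd installed; with the real library you would pass `eemd_wrapper` and `idx` from the question instead.

```python
from concurrent.futures import ThreadPoolExecutor


def map_in_threads(func, items, workers=4):
    """Apply func to every item on a pool of threads; results keep input order.

    This only beats a plain loop if func spends its time in C code that
    releases the GIL, as ctypes calls into libeemd do.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, items))


def fake_eemd(column):
    """Hypothetical stand-in for pyeemd's eemd(); just sums the column."""
    return sum(column)


if __name__ == "__main__":
    # Small stand-in for the question's 4-column matrix, stored column by column.
    columns = [[float(c)] * 1000 for c in range(4)]
    print(map_in_threads(fake_eemd, columns))
```

With pyeemd installed, `map_in_threads(eemd_wrapper, idx)` would replace `pool.map(eemd_wrapper, idx)`; no data has to be pickled because the threads share `data` directly.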

Why threads?

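In Python, multiprocessing is usually used instead of multithreading when a task is CPU bound. This is because of the Global Interpreter Lock (GIL), which is essential for the performance of single-threaded Python. However, the GIL makes multithreaded pure Python code run as if it were single-threaded.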

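However, when a thread enters a C function through the ctypes module, it releases the GIL, because the function doesn't need to execute any Python code. Python types are converted to C types for the call, and numpy arrays are wrappers around C buffers (which are guaranteed to exist for the duration of the call). So neither the Python interpreter nor its GIL is needed.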
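The effect of releasing the GIL can be seen without compiling anything, by calling the C library's `usleep()` from several threads. This is a sketch assuming a POSIX system whose libc exports `usleep`; a sleeping call stands in for lengthy C work. It loads the same function twice: through `ctypes.CDLL`, which releases the GIL around each call, and through `ctypes.PyDLL`, which keeps the GIL held.

```python
import ctypes
import ctypes.util
import time
from concurrent.futures import ThreadPoolExecutor

# Assumption: POSIX libc with usleep(). If find_library returns None,
# loading None means dlopen(NULL), which still exposes libc's symbols.
LIBC_NAME = ctypes.util.find_library("c")


def load_usleep(loader):
    """Return libc's usleep() loaded through the given ctypes loader class."""
    lib = loader(LIBC_NAME)
    usleep = lib.usleep
    usleep.argtypes = (ctypes.c_uint,)
    usleep.restype = ctypes.c_int
    return usleep


def time_threads(func, n_threads=4, micros=300_000):
    """Run func(micros) once per thread and return the wall-clock seconds taken."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(func, [micros] * n_threads))
    return time.perf_counter() - start


if __name__ == "__main__":
    # CDLL releases the GIL during the call, so the four calls overlap (~0.3 s).
    print("CDLL : %.2f s" % time_threads(load_usleep(ctypes.CDLL)))
    # PyDLL holds the GIL during the call, so the calls run one after another (~1.2 s).
    print("PyDLL: %.2f s" % time_threads(load_usleep(ctypes.PyDLL)))
```

pyeemd loads libeemd as an ordinary shared library, so (assuming it uses `CDLL`, as ctypes code normally does) its calls behave like the first case.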

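Multiprocessing is a good way to get a speedup with pure Python, but one of its pitfalls is that data has to be sent to the child workers and the results returned to the parent. If either of these takes up a lot of memory, pushing the data back and forth adds a lot of overhead. So why use multiprocessing if you don't need to?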
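To get a feel for that overhead: multiprocessing pickles every argument sent to a worker and every result sent back. The sketch below measures the pickled size of one 100,000-row column of doubles versus a column index. It uses a standard-library `array` as a hypothetical stand-in for a float64 numpy column; `payload_size` is an illustrative helper, not a multiprocessing API.

```python
import pickle
from array import array

# Hypothetical stand-in for one column of the question's ~100,000 x 4 matrix.
N_ROWS = 100_000
column = array("d", [0.0]) * N_ROWS  # 8-byte doubles, like a float64 column


def payload_size(obj):
    """Bytes that would be pickled to ship obj to (or from) a worker process."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


if __name__ == "__main__":
    print("send the column itself:", payload_size(column), "bytes")
    print("send only its index:   ", payload_size(2), "bytes")
```

Sending an index keeps the request small, but the result (the IMFs for a whole column) is at least as large as the column and still has to be pickled on the way back; threads avoid both copies.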

Example

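Here we test how long it takes to complete a long-running C function. We run it once serially, once with two worker threads, and once with two worker processes. This shows that when most of the work is done in a C library, multithreading is as good as (if not better than) multiprocessing. lengthy.c is just an example; any deterministic but expensive function called with the same arguments will do.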

lengthy.c

#include <stdint.h>

double lengthy(uint64_t n) {
    double total = 0;
    for (uint64_t i = 0; i < n; ++i) {
        total += i;
    }
    return total;
}

Compile the code into a shared library that ctypes can load:

dunes@dunes-VM:~/src$ gcc -c -Wall -Werror -fpic lengthy.c
dunes@dunes-VM:~/src$ gcc -shared -Wl,-soname,liblengthy.so -o liblengthy.so lengthy.o -lc

time_lengthy.py

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import ctypes
from time import time

# create a handle to the C function lengthy
liblengthy = ctypes.cdll.LoadLibrary('./liblengthy.so')
lengthy = liblengthy.lengthy
lengthy.argtypes = ctypes.c_uint64,
lengthy.restype = ctypes.c_double

def job(arg):
    """This function is only necessary as lengthy itself cannot be pickled, and
    therefore cannot be directly used with a ProcessPoolExecutor.
    """
    return lengthy(arg)

def main():
    n = 1 << 28
    # 1 << 28 was chosen because it takes approximately 1 second on my machine
    # Feel free to choose any value where 0 <= n < (1 << 64)
    items = [n] * 4  # 4 jobs to do
    print("serial")
    start = time()
    for i in items:
        job(i)
    end = time()
    print("took {} seconds\n".format(end-start))

    for executor_class in (ThreadPoolExecutor, ProcessPoolExecutor):
        print(executor_class.__name__)
        start = time()
        # we'll only be using two workers so as to make time comparisons simple
        with executor_class(max_workers=2) as executor:
            executor.map(job, items)
        end = time()
        print("took {} seconds\n".format(end-start))

if __name__ == '__main__':
    main()

Which, when run, gives:

dunes@dunes-VM:~/src$ python3 multi.py 
serial
took 4.936346530914307 seconds

ThreadPoolExecutor
took 2.59773850440979 seconds

ProcessPoolExecutor
took 2.611887216567993 seconds

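We can see that two threads or processes running in parallel are almost twice as fast as a single thread running serially. However, threads don't incur the overhead of sending data back and forth between the parent and the child workers. So you may as well use threads, since the pyeemd source shows that it does no significant work in pure Python.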

How to run data processing in parallel with Pool.map() in Python?
