Inconsistent multiprocessing.Pool performance between WSL and WinPython

Time: 2017-06-10 15:54:02

Tags: python numpy multiprocessing windows-subsystem-for-linux

I am running Python in WSL (Bash on Ubuntu on Windows) and comparing it against WinPython. All tests were run on the same machine. The code:

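import timeit
from multiprocessing import Pool

import numpy as np

# `data` is a dictionary of big arrays, each of shape (124750, 4);
# its construction is not shown here.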

def compute_histogram(data, lmin=0, lmax=1, num_bins=200):
    # Bin columns 0 and 1 on a shared grid; column 2 holds the values to average
    bins = np.linspace(lmin, lmax, num_bins)
    indices = np.digitize(data[:, :2], bins, right=True)
    X, Y = np.meshgrid(bins, bins, indexing='ij')
    Q = np.zeros_like(X)  # per-cell sum, later the per-cell mean
    C = np.zeros_like(X)  # per-cell sample count
    d = data[:, 2]
    for idx in range(0, len(indices)):
        (i, j) = indices[idx]
        Q[i, j] += d[idx]
        C[i, j] += 1
    # Turn sums into means for the non-empty cells
    idx = np.nonzero(C)
    Q[idx] /= C[idx]
    return (X, Y, Q, C)

The computation:

total_time = 0.0

# Sequential baseline
start_time = timeit.default_timer()
results = {}
for (k, v) in data.items():
    results[k] = compute_histogram(v, -0.05, 1.05, 1000)
elapsed = timeit.default_timer() - start_time
total_time += elapsed
print('Sequential: {:0.3f}s'.format(elapsed))

# Parallel version: one task per array, four worker processes
pool = Pool(4)
results = {}
start_time = timeit.default_timer()
for (k, v) in data.items():
    # Bind k as a default argument so each callback stores under its own key
    def log(result, key=k):
        results[key] = result
    pool.apply_async(compute_histogram, (v, -0.05, 1.05, 1000), callback=log)
pool.close()
pool.join()
elapsed = timeit.default_timer() - start_time
total_time += elapsed
print('Parallel: {:0.3f}s'.format(elapsed))

WinPython 3.6.1:

Sequential: 6.810s
Parallel: 4.058s

WSL Python 2.7.12:

Sequential: 6.958s
Parallel: 6.490s

WSL Python 3.5.2:

Sequential: 6.823s
Parallel: 35.733s

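The sequential times are roughly the same for all versions, which suggests that numpy is working properly (linked against openblas, etc.). However, with Python 3 on WSL the parallel code is much slower. Any ideas what might be causing this?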
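One way to narrow this down is to measure how much data each task sends back to the parent process. `apply_async` pickles the worker's return value and ships it over a pipe, and here every result is four `num_bins` x `num_bins` float64 arrays. The sketch below only measures that payload. `result_payload` is a hypothetical helper, not part of the original code, and a large payload does not by itself prove that the transfer is the bottleneck.

```python
import pickle
import timeit

import numpy as np


def result_payload(num_bins=1000):
    """Build arrays shaped like compute_histogram's return value and
    report their pickled size and the pickle round-trip time.

    Hypothetical helper, for diagnosis only.
    """
    bins = np.linspace(-0.05, 1.05, num_bins)
    X, Y = np.meshgrid(bins, bins, indexing='ij')
    Q = np.zeros_like(X)
    C = np.zeros_like(X)

    start = timeit.default_timer()
    blob = pickle.dumps((X, Y, Q, C), protocol=pickle.HIGHEST_PROTOCOL)
    pickle.loads(blob)
    elapsed = timeit.default_timer() - start
    return len(blob), elapsed


if __name__ == '__main__':
    size, seconds = result_payload(1000)
    print('Payload per task: {:.1f} MB'.format(size / 1e6))
    print('Pickle round trip: {:0.3f}s'.format(seconds))
```

Running this under each interpreter and comparing the round-trip time with the gap between the sequential and parallel timings shows whether serialization alone could account for the difference. It does not measure the pipe transfer itself.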

Edit: I removed Ubuntu from WSL and installed Arch Linux. The problem persists:

WSL Arch Linux Python 3.6.1:

Sequential: 6.326s
Parallel: 35.847s

So at least the problem is not distribution-specific; it is specific to Python 3 (when running under WSL).
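When comparing interpreters like this, it can help to record what each environment actually reports alongside the timings. The sketch below collects the Python version, the platform string, the number of CPUs, and the multiprocessing start method, which is typically `spawn` on Windows and `fork` on Linux for the Python 3 versions discussed here. `describe_environment` is a hypothetical helper, and `multiprocessing.get_start_method` requires Python 3.4 or later, so this will not run on the Python 2.7 install.

```python
import multiprocessing
import platform
import sys


def describe_environment():
    """Collect settings that may differ between the interpreters compared.

    Hypothetical helper; Python 3.4+ only because of get_start_method().
    """
    return {
        'python': sys.version.split()[0],
        'platform': platform.platform(),
        'start_method': multiprocessing.get_start_method(),
        'cpu_count': multiprocessing.cpu_count(),
    }


if __name__ == '__main__':
    for name, value in describe_environment().items():
        print('{}: {}'.format(name, value))
```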

1 answer:

Answer 0: (score: 0)

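The problem may be limited to WSL 1, which translates Linux system calls into Windows ones and is known to have performance problems. WSL 2 performs much better, probably because it includes a real Linux kernel. See Microsoft's comparison of the two versions.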
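To check which WSL generation a given shell is running on, the kernel release string can be inspected from Python. The sketch below is a heuristic, not an official API: it assumes that WSL 1 kernels report a release ending in `-Microsoft` and that WSL 2 kernels contain `microsoft-standard` or `WSL2`, which matches commonly observed strings but is not guaranteed. `classify_wsl` is a hypothetical helper name.

```python
import platform


def classify_wsl(release):
    """Guess the WSL generation from a kernel release string.

    Heuristic based on commonly observed release strings (an assumption):
    WSL 2 kernels mention 'microsoft-standard' or 'WSL2', while WSL 1
    kernels mention 'Microsoft' without either marker.
    """
    lowered = release.lower()
    if 'microsoft' not in lowered:
        return 'not WSL'
    if 'wsl2' in lowered or 'microsoft-standard' in lowered:
        return 'WSL 2'
    return 'WSL 1'


if __name__ == '__main__':
    print(classify_wsl(platform.release()))
```

From the Windows side, `wsl --list --verbose` reports the version of each installed distribution and is the more reliable check.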

# WSL 1 (Ubuntu 18.04), Python 3.6.9, numpy 1.19.4
Sequential: 7.877s
Parallel: 40.677s
# WSL 2 (Ubuntu 20.04), Python 3.8.5, numpy 1.20.3
Sequential: 6.943s
Parallel: 2.618s

The program I used for testing is below. The array size is scaled up by a factor of 10.

import numpy as np
import timeit
from multiprocessing import Pool

data = np.random.rand(1247500, 4)
data = {'a': data, 'b': 1-data, 'c': data**2}

def compute_histogram(data, lmin=0, lmax=1, num_bins=200):
    bins = np.linspace(lmin, lmax, num_bins)
    indices = np.digitize(data[:, :2], bins, right=True)
    X, Y = np.meshgrid(bins, bins, indexing='ij')
    Q = np.zeros_like(X)
    C = np.zeros_like(X)
    d = data[:, 2]
    for idx in range(0, len(indices)):
        (i,j) = indices[idx]
        Q[i, j] += d[idx]
        C[i, j] += 1
    idx = np.nonzero(C)
    Q[idx] /= C[idx]
    return (X, Y, Q, C)

start_time = timeit.default_timer() 
results = {}
for (k,v) in data.items():
    results[k] = compute_histogram(v, -0.05, 1.05, 1000,)
elapsed = timeit.default_timer() - start_time
print('Sequential: {:0.3f}s'.format(elapsed))

pool = Pool(4)
results = {}
start_time = timeit.default_timer() 
for (k,v) in data.items():
    def log(result, key=k):
        results[key] = result
    pool.apply_async(compute_histogram, (v, -0.05, 1.05, 1000,), callback=log)
pool.close()
pool.join()
elapsed = timeit.default_timer() - start_time
print('Parallel: {:0.3f}s'.format(elapsed))