multiprocessing.Pool() hangs under qsub when used with numpy

Time: 2018-07-16 18:06:46

Tags: python-3.x numpy

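I am running into a problem when using multiprocessing together with numpy in a job submitted through qsub.

Specifically, I have the following Python code: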

#full_comparisons.py
import numpy as np
import multiprocessing

output = np.ndarray(
        shape=(x, y, z, a),
        dtype=[('site', '>i4'), ('html', '>f4'), ('js', '>f4'), ('png', '>f4')])
##NOTE: output size is only .002 GB, so RAM shouldn't be an issue.
print("Before pool")
pool = multiprocessing.Pool()
print("After pool")

I have run qsub in each of the following ways (I tried every one of them), where ./comparisons simply calls python3 full_comparisons.py:

qsub -V comparisons # -V keep environment variables
qsub -l vlong -V comparisons #-l vlong lets it run infinitely
qsub -V -pe smp 32 comparisons #parallelizes with more processors
qsub -l vlong -V -pe smp 32
qsub -V -pe smp 16 comparisons
qsub -V -pe smp 8 comparisons

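among other variations.

In every case, "Before pool" is printed and then the job hangs.

I think this is related to the cluster, because running ./comparisons locally handles the multiprocessing just fine. The problem only appears when I use qsub. Perhaps there is a bug affecting the combined use of numpy and multiprocessing that I am not aware of.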

All of the relevant code:

import subprocess
import os
import csv
import itertools
import multiprocessing
import numpy as np
import jaccard
import file_names

def compare_lambda(x, y, dict_1, dict_2):
    ...

def compare_all():
    pairs = itertools.combinations(range(GLOBAL_VAR1), 2)
    ids_to_sites, sites_to_ids = init_sites()
    output = np.ndarray(
        shape=(GLOBAL_VAR1, GLOBAL_VAR1, GLOBAL_VAR2, GLOBAL_VAR3),
        dtype=[('x', '>i4'), ('y', '>f4'), ('z', '>f4'), ('a', '>f4')])
    print("Before pool")
    pool = multiprocessing.Pool()
    print("After pool")
    compared_vals = pool.starmap(compare_lambda, list(map(lambda x: (x[0], x[1], dict_1, dict_2), pairs)))

    for (a, b, compared) in compared_vals:
        ...
print(multiprocessing.cpu_count()) #works fine
compare_all()

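Edit: At @sehafoc's suggestion, I enabled logging for multiprocessing. Interestingly, when I run the multiprocessing program on the compute cluster, I get the following: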

Before pool
[DEBUG/MainProcess] created semlock with handle 47690244722688
[DEBUG/MainProcess] created semlock with handle 47690244726784
[DEBUG/MainProcess] created semlock with handle 47690244730880
[DEBUG/MainProcess] created semlock with handle 47690244734976
[DEBUG/MainProcess] added worker
[DEBUG/MainProcess] added worker
[DEBUG/MainProcess] added worker
[INFO/ForkPoolWorker-1] child process calling self.run()
[INFO/ForkPoolWorker-2] child process calling self.run()
[DEBUG/MainProcess] added worker
[INFO/ForkPoolWorker-4] child process calling self.run()
[INFO/ForkPoolWorker-3] child process calling self.run()

When I run it locally, the output is as follows:

Before pool
[DEBUG/MainProcess] created semlock with handle 140313792987136
[DEBUG/MainProcess] created semlock with handle 140313792983040
[DEBUG/MainProcess] created semlock with handle 140313792978944
[DEBUG/MainProcess] created semlock with handle 140313792974848
[DEBUG/MainProcess] added worker
[INFO/ForkPoolWorker-1] child process calling self.run()
[DEBUG/MainProcess] added worker
[INFO/ForkPoolWorker-2] child process calling self.run()
[DEBUG/MainProcess] added worker
[INFO/ForkPoolWorker-3] child process calling self.run()
[DEBUG/MainProcess] added worker
[INFO/ForkPoolWorker-4] child process calling self.run()
After pool
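
For reference, debug output of this kind can be produced with the standard library's multiprocessing.log_to_stderr(). The question does not show the exact calls that were used, so the following is only a minimal sketch under that assumption. The helper names enable_mp_logging and square are hypothetical, and the pool size of 2 is arbitrary.

```python
import logging
import multiprocessing


def enable_mp_logging(level=logging.DEBUG):
    # Hypothetical helper: route multiprocessing's internal log messages
    # to stderr. log_to_stderr() returns the module's logger.
    logger = multiprocessing.log_to_stderr()
    logger.setLevel(level)
    return logger


def square(n):
    # Trivial stand-in for the real work function (an assumption).
    return n * n


if __name__ == "__main__":
    enable_mp_logging()
    # flush=True makes the marker lines appear immediately,
    # even when stdout is redirected to a file.
    print("Before pool", flush=True)
    with multiprocessing.Pool(processes=2) as pool:
        print("After pool", flush=True)
        print(pool.map(square, range(5)), flush=True)
```

The log messages go to stderr, while the print() output goes to stdout, so the two streams may be buffered differently when a job's output is captured in files.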

1 Answer:

Answer 0: (score: 0)

Update: forcing a flush with sys.stdout.flush() made the output appear. It seems that qsub flushes output only rarely.
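
A minimal sketch of what forcing a flush can look like, assuming the marker lines are written with print(). The helper name report is hypothetical. When standard output is not a terminal (for example, when it is redirected to a job's output file), Python normally buffers it in blocks, so a line such as "After pool" may not show up until the buffer fills or the program exits.

```python
import sys


def report(message, stream=None):
    # Hypothetical helper: write one line and flush it immediately so it
    # is not held back in the stream's buffer.
    stream = sys.stdout if stream is None else stream
    print(message, file=stream, flush=True)


if __name__ == "__main__":
    report("Before pool")
    # ... create the pool and do the work here ...
    report("After pool")
    # Equivalent explicit form for code that writes with plain print():
    sys.stdout.flush()
```

Running the script with python3 -u, or setting the PYTHONUNBUFFERED environment variable, should have a similar effect without changing the code.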