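Why does importing a local module cause concurrent.futures.ProcessPoolExecutor to raise a BrokenProcessPool exception?
When I import a local module, running the file raises a BrokenProcessPool exception. I tried commenting out everything inside that module and got the same result, and I see the same behavior with other local files/modules. However, if I comment out the import statement, or move it inside the main() function, the script runs without the worker processes being terminated and without the exception. Why does this happen, and what can I do to avoid the exception?
I am trying to use concurrent.futures with ProcessPoolExecutor. I based my code example on the top answer to this question: Parallelize apply after pandas groupby
Here is my version: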
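The workaround described above (moving the import into main()) can be sketched as follows. This is a minimal illustration, not the original script: `json` stands in for the local `analysis_helper` module, and `square` is a hypothetical task function. On Windows, worker processes are created with the spawn start method, which generally re-imports the main script in each worker without calling main(), so an import placed inside main() runs only in the parent process.

```python
from concurrent.futures import ProcessPoolExecutor


def square(x):
    # Hypothetical task executed in the workers; it does not need the helper module.
    return x * x


def main():
    # Deferred import: executed only when main() is called, which happens
    # in the parent process. "json" is a stand-in; the script in the
    # question would use `import analysis_helper` here instead.
    import json as helper

    with ProcessPoolExecutor(max_workers=2) as pool:
        results = list(pool.map(square, range(5)))
    return helper.__name__, results


if __name__ == "__main__":
    # The guard keeps workers from starting their own pools when they
    # re-import this file under the spawn start method.
    print(main())
```

The trade-off is that the module is then visible only inside main(); any worker function that needs it would have to import it itself.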
import pandas as pd
import numpy as np
import time
from concurrent.futures import ProcessPoolExecutor, as_completed
import analysis_helper  # a local module

print(__name__)
nrows = 15000
np.random.seed(1980)
df = pd.DataFrame({'a': np.random.permutation(np.arange(nrows))})

def f1(group):
    time.sleep(0.0001)
    return group

def main():
    with ProcessPoolExecutor(12) as ppe:
        futures = []
        results = []
        # Submit one task per group.
        for name, group in df.groupby('a'):
            p = ppe.submit(f1, group)
            futures.append(p)

        # Collect results as the tasks finish.
        for future in as_completed(futures):
            r = future.result()
            results.append(r)

    df_output = pd.concat(results)
    print(df_output)

if __name__ == '__main__':
    main()
Result with the analysis_helper import removed:
runfile('C:/dev/.../test_parallelizer_pandas.py', wdir='C:/dev/...')
__main__
a
1255 1733
3372 11015
5318 4571
7076 14510
10545 10749
3340 483
11844 3736
3681 14509
2222 1041
3640 11014
4288 7852
12257 1040
2101 11034
14938 3065
8449 1842
7231 10746
7509 4353
4898 3797
2941 866
7497 14520
8302 11013
13882 9924
12007 1042
1567 10747
13135 7856
7742 485
13709 12571
1946 11012
5634 7848
7044 4354
...
3441 14213
179 14361
6723 12134
7528 5905
9273 12420
9916 3614
134 10166
11654 5854
11848 12133
14055 4278
6100 14360
726 14981
13139 14982
12552 14983
5393 14984
6927 14986
8108 14985
12665 14987
8587 14988
11437 14989
4191 14990
6877 14991
4997 14994
13527 14995
9477 14993
2930 14996
5456 14992
781 14997
3287 14998
13386 14999
[15000 rows x 1 columns]
Result with the analysis_helper import in place:
runfile('C:/dev/.../test_parallelizer_pandas.py', wdir='C:/dev/...')
__main__
Traceback (most recent call last):
  File "<ipython-input-7-7d6a88ec5a87>", line 1, in <module>
    runfile('C:/dev/.../test_parallelizer_pandas.py', wdir='C:/dev/...')
  File "C:\Users\david\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)
  File "C:\Users\david\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/dev/.../test_parallelizer_pandas.py", line 42, in <module>
    main()
  File "C:/dev/.../test_parallelizer_pandas.py", line 35, in main
    r = future.result()
  File "C:\Users\david\Anaconda3\lib\concurrent\futures\_base.py", line 425, in result
    return self.__get_result()
  File "C:\Users\david\Anaconda3\lib\concurrent\futures\_base.py", line 384, in __get_result
    raise self._exception
BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
Note: this happens only with ProcessPoolExecutor, not with ThreadPoolExecutor.
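One way to narrow this down might be to check whether a worker process can import the module at all. The sketch below is a diagnostic under an assumption the question does not confirm: that the worker dies because the module-level import fails when the main script is re-imported in the worker. The helper names `probe_import` and `probe_in_worker` are hypothetical. The note above fits this assumption, since threads share the parent interpreter and never re-import the main script.

```python
import importlib
import os
import sys
from concurrent.futures import ProcessPoolExecutor


def probe_import(module_name):
    """Try to import module_name in the current process and report what happened."""
    report = {
        "pid": os.getpid(),
        "cwd": os.getcwd(),
        "sys_path_head": list(sys.path[:3]),
        "ok": True,
        "error": None,
    }
    try:
        importlib.import_module(module_name)
    except Exception as exc:
        # Return the failure instead of letting the worker process die.
        report["ok"] = False
        report["error"] = repr(exc)
    return report


def probe_in_worker(module_name):
    """Run probe_import in a single worker process and return its report."""
    with ProcessPoolExecutor(max_workers=1) as pool:
        return pool.submit(probe_import, module_name).result()


if __name__ == "__main__":
    # "analysis_helper" is the local module named in the question.
    print("parent:", probe_import("analysis_helper"))
    print("worker:", probe_in_worker("analysis_helper"))
```

The probe script should not import analysis_helper at module level itself. If the parent report shows `ok: True` while the worker report shows `ok: False`, comparing the `cwd` and `sys_path_head` fields of the two reports would be a reasonable next step.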