I have some Python code that needs to process about 1MM (one million) rows. We are using Python's multiprocessing module:
    import math
    import os
    from multiprocessing import Pool

    import numpy as np
    import pandas

    rows_per_workload = 250000
    num_loads = math.ceil(data_df.shape[0] / float(rows_per_workload))
    agents = num_loads  # one worker process per chunk
    print('Num Loads ', num_loads, ' DF Shape ', data_df.shape[0])
    split_df = np.array_split(data_df, num_loads)  # list of DataFrames
    with Pool(processes=agents) as pool:
        result = pool.starmap(func=work, iterable=zip(split_df, [skey_data, skey_data]))
    result = pandas.concat(result)

    def work(argdata, argmaster):
        print('parent process:', os.getppid())
        print('process id:', os.getpid())
Even though we expect this line to spawn 4 processes

    with Pool(processes=agents) as pool:

we only see 2. When I run this code, only two PIDs are printed:
parent process: 19652
process id: 12936
parent process: 19652
process id: 14836
What am I missing?
Regards,
Bala
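
One thing that may be worth checking, offered as a sketch rather than a confirmed diagnosis: the iterable passed to starmap is built with zip, and zip stops at its shortest input. If split_df holds four chunks but the second list has only two entries, starmap would receive only two tasks, whatever value is passed as processes. The pool may well start four workers, but only a worker that receives a task runs work and prints a PID. In the example below, chunks and shared are hypothetical stand-ins for split_df and skey_data, so it runs without pandas.

```python
from itertools import repeat

# Hypothetical stand-ins for the four DataFrame chunks and the shared lookup data.
chunks = ["chunk0", "chunk1", "chunk2", "chunk3"]
shared = "skey_data"

# zip() stops at the shortest iterable, so a two-element list yields only two tasks.
truncated_tasks = list(zip(chunks, [shared, shared]))

# repeat() supplies the shared argument once for every chunk, however many there are.
full_tasks = list(zip(chunks, repeat(shared)))

print(len(truncated_tasks), len(full_tasks))  # prints: 2 4
```

If this is indeed the cause, passing zip(split_df, repeat(skey_data)) to starmap would produce one task per chunk.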