I am trying to use multiprocessing to filter some rows out of each df in a list of dataframes:
# Accepts and returns a list of dataframes
dataframes = myClass.filter_calibration(dataframes)
In myClass:
self.n_cores = -2
self.backend = 'loky'
def filter_calibration(self, dataframes, verbose=True):
    results = Parallel(n_jobs=self.n_cores, backend=self.backend, verbose=verbose)(
        delayed(helperFunctions.filter_calibration_helper)(df) for df in dataframes)
    return results
In a separate .py file with helper functions (not in a class):
def filter_calibration_helper(df):
    if True in df['Calibrating'].unique():
        calib = np.array(df['Calibrating'])
        valids = np.array(df['Valid'])
        calib_indices = np.argwhere(calib == True)
        valids[calib_indices] = False
        df['Valid'] = valids
    else:
        calib = np.array(df['Calibrating'])
        valids = np.array(df['Valid'])
        calib_indices = np.argwhere(calib == 1)
        valids[calib_indices] = False
        df['Valid'] = valids
    return df
However, I keep getting this error:
  File "/Users/ima/example_script.py", line 45, in <module>
    dataframes = myClass.filter_calibration(dataframes)
  File "/Users/ima/myClass.py", line 59, in filter_calibration
    delayed(helperFunctions.filter_calibration_helper)(df) for df in zip(dataframes))
  File "/Users/ima/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 934, in __call__
    self.retrieve()
  File "/Users/ima/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 833, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/Users/ima/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 521, in wrap_future_result
    return future.result(timeout=timeout)
  File "/Users/ima/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/Users/ima/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
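For the life of me, I can't figure out what is going wrong. The only argument is a dataframe, and dataframes work fine with Parallel in several other functions in my code.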
I also tried joblib's multiprocessing backend, but that just hangs.
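Since the error message points at picklability, here is a minimal sketch of a check that could be run on the arguments before handing them to Parallel. The function name `find_unpicklable` is hypothetical (it is not part of joblib), and it only tests a pickle round trip in the current process, so a failure that happens while un-serializing inside a worker (for example, a module the worker cannot import) may not show up here.

```python
import pickle


def find_unpicklable(objects):
    """Return (index, exception) pairs for objects that fail a pickle round trip.

    Hypothetical diagnostic helper, not part of joblib.
    """
    failures = []
    for index, obj in enumerate(objects):
        try:
            # Serialize and immediately deserialize in this process.
            pickle.loads(pickle.dumps(obj))
        except Exception as exc:
            failures.append((index, exc))
    return failures
```

Calling `find_unpicklable(dataframes)` would return an empty list if every dataframe survives the round trip in the parent process, which would suggest the problem lies somewhere other than the dataframes themselves.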
Any help is appreciated!