I have a function:
def Recommendation(df1, df2, array1, array2):
    for i in range(len(df1)):
        # do something
        ...
        result = {}  # the result is a dictionary and is inserted into MongoDB
        db.collectionname.insert_one(result)
df1 is very large, so it takes a long time for the loop to finish. How can I parallelize this process in Python when the function takes multiple arguments? This is what I tried:
import numpy as np
import pandas as pd
from multiprocessing import Pool

num_partitions = 5
num_cores = 5

def parallelize_dataframe(df, func):
    df_split = np.array_split(df, num_partitions)
    pool = Pool(num_cores)
    pool.starmap(func, df_split)
    pool.close()
    pool.join()
if __name__ == '__main__':
    df1 = pd.read_csv("filename.csv")
    df2 = pd.read_csv("filename2.csv")
    array1 = ...  # loading a NumPy array
    array2 = ...
    parallelize_dataframe(df1, Recommendation)
This process is very slow, and I am not sure the work is actually being distributed across the cores. Please help.
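Would something along these lines be the right way to pass the extra arguments? Below is an untested sketch of what I am considering: functools.partial binds the arguments that are identical for every chunk (df2, array1, array2), so Pool.map only has to iterate over the df1 splits.

from functools import partial
from multiprocessing import Pool

import numpy as np

num_partitions = 5
num_cores = 5

def parallelize_dataframe(df, func, df2, array1, array2):
    df_split = np.array_split(df, num_partitions)
    # df2, array1 and array2 are the same for every chunk, so bind them
    # once; Pool.map then passes each df1 chunk as the first argument.
    worker = partial(func, df2=df2, array1=array1, array2=array2)
    with Pool(num_cores) as pool:
        # map blocks until every chunk has been processed
        pool.map(worker, df_split)

# called as: parallelize_dataframe(df1, Recommendation, df2, array1, array2)

I also assume each worker would need to open its own MongoClient inside Recommendation rather than reusing a connection from the parent process, since pymongo clients are not fork-safe. Is that right?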