Parallelizing a function that takes a DataFrame as input

Date: 2018-12-12 20:46:19

Tags: python-3.x pandas parallel-processing

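How can I parallelize func1 in the code below? It takes a DataFrame as input. func1 is a prototype of the function I am trying to parallelize; the real function takes many DataFrames and Series as inputs.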

import pandas as pd

def func1(d,a):
    product = d.prod(axis=1)*a
    print(product)

#input 1
a1=2    
input1 = pd.DataFrame({'F':[2,3,4,5,1,2], 'E':[12,4,5,6,7,2], 'N':[2,7,6,5,4,3]}) 

#input 2
a2 = 0
input2 = pd.DataFrame({'F':[0,32,4,12,1,2], 'E':[1,4,5,0,7,2], 'N':[21,7,61,5,4,3]})

#input3
a3=100
input3 = pd.DataFrame({'F':[0,1,1,1,1,1], 'E':[1,12,5,110,7,2], 'N':[3,7,61,5,1,1]})

#call function
func1(input1,a1)
func1(input2,a2)
func1(input3,a3)
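Because the three calls share no state, they are independent tasks. As a minimal sketch (the names tasks and results are illustrative, not from the original), the same calls can be collected into a list of (DataFrame, scalar) tuples and driven by itertools.starmap from the standard library. This is the serial counterpart of the pattern a process pool parallelizes, and having func1 return the product instead of printing it makes the results easy to check.

```python
import itertools

import pandas as pd


def func1(d, a):
    # Row-wise product of all columns, scaled by a; returned instead of printed
    return d.prod(axis=1) * a


input1 = pd.DataFrame({'F': [2, 3, 4, 5, 1, 2], 'E': [12, 4, 5, 6, 7, 2], 'N': [2, 7, 6, 5, 4, 3]})
input2 = pd.DataFrame({'F': [0, 32, 4, 12, 1, 2], 'E': [1, 4, 5, 0, 7, 2], 'N': [21, 7, 61, 5, 4, 3]})
input3 = pd.DataFrame({'F': [0, 1, 1, 1, 1, 1], 'E': [1, 12, 5, 110, 7, 2], 'N': [3, 7, 61, 5, 1, 1]})

# One (DataFrame, scalar) argument tuple per independent call
tasks = [(input1, 2), (input2, 0), (input3, 100)]

# Serial baseline: itertools.starmap unpacks each tuple into func1(d, a)
results = list(itertools.starmap(func1, tasks))
for r in results:
    print(r.tolist())
```

For input1 the first line printed is [96, 168, 240, 300, 56, 24], i.e. each row's F*E*N multiplied by 2.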

1 answer:

Answer 0 (score: 2)

Just use the built-in multiprocessing library. I have edited your code so that the calls run in parallel on multiple cores.

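import pandas as pd
from multiprocessing import Pool, cpu_count

CORE_NUMBER = cpu_count()

def func1(d, a):
    # Row-wise product of all columns, scaled by a
    product = d.prod(axis=1) * a
    print(product)

# input 1
a1 = 2
input1 = pd.DataFrame({'F':[2,3,4,5,1,2], 'E':[12,4,5,6,7,2], 'N':[2,7,6,5,4,3]})

# input 2
a2 = 0
input2 = pd.DataFrame({'F':[0,32,4,12,1,2], 'E':[1,4,5,0,7,2], 'N':[21,7,61,5,4,3]})

# input 3
a3 = 100
input3 = pd.DataFrame({'F':[0,1,1,1,1,1], 'E':[1,12,5,110,7,2], 'N':[3,7,61,5,1,1]})

# One (DataFrame, scalar) argument tuple per call
data = [(input1, a1), (input2, a2), (input3, a3)]

# The main guard is required where workers are started with "spawn"
# (Windows, and macOS by default since Python 3.8): each worker re-imports
# this module, and without the guard it would try to start a pool of its own.
if __name__ == "__main__":
    # No point starting more workers than there are tasks
    pool = Pool(min(CORE_NUMBER, len(data)))

    # call function: starmap unpacks each tuple into func1(d, a).
    # Each worker prints its own result, so the output order may vary.
    pool.starmap(func1, data)
    pool.close()
    pool.join()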
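The question notes that the real function takes many DataFrames and Series. The same Pool.starmap pattern handles that: each task tuple can carry any number of pandas objects and scalars, and starmap returns the results in input order. The sketch below assumes a hypothetical func2(d, s, a) standing in for the real function, and a hypothetical helper run_parallel; neither comes from the original post.

```python
from multiprocessing import Pool, cpu_count

import pandas as pd


def func2(d, s, a):
    # Hypothetical stand-in for a function with several pandas inputs:
    # row-wise product of DataFrame d, multiplied element-wise by Series s
    # (aligned on the index), then scaled by the scalar a.
    return d.prod(axis=1) * s * a


def run_parallel(tasks):
    # Hypothetical helper: never start more workers than there are tasks
    n_workers = min(cpu_count(), len(tasks))
    with Pool(n_workers) as pool:
        # starmap unpacks each tuple into func2(d, s, a); results keep input order
        return pool.starmap(func2, tasks)


if __name__ == "__main__":
    df1 = pd.DataFrame({'F': [2, 3, 4], 'E': [12, 4, 5]})
    df2 = pd.DataFrame({'F': [0, 32, 4], 'E': [1, 4, 5]})
    s1 = pd.Series([1, 0, 1])
    s2 = pd.Series([2, 2, 2])
    # Each tuple may hold any number of DataFrames, Series and scalars
    tasks = [(df1, s1, 2), (df2, s2, 10)]
    for result in run_parallel(tasks):
        print(result.tolist())
```

Returning results to the parent, rather than printing inside the workers, keeps the output ordered and usable. Keep in mind that every argument is pickled and copied to a worker process, so for large DataFrames and a cheap function the transfer cost can outweigh the gain from using several cores.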