Python multiprocessing with multiple arguments

Asked: 2016-11-10 08:39:30

Tags: python, multiprocessing

I am trying to multiprocess a function that performs several operations on a large file, but I keep getting the known pickling error when using partial.

The function looks like this:

def process(r,intermediate_file,record_dict,record_id):

    res=0

    record_str = str(record_dict[record_id]).upper()
    start = record_str[0:100]
    end = record_str[len(record_str)-100:len(record_str)]

    print sample, record_id
    if r=="1":

        if something:
            res = something...
            intermediate_file.write("...")

        if something:
            res = something
            intermediate_file.write("...")



    if r == "2":
        if something:
            res = something...
            intermediate_file.write("...")

        if something:
            res = something
            intermediate_file.write("...")

    return res

I call it from another function like this:

from functools import partial
from multiprocessing import Pool

def call_func():
    intermediate_file = open("inter.txt","w")
    record_dict = get_record_dict()                 ### get info about each record as a dict keyed by record_id
    results_dict = {}  
    pool = Pool(10)
    for a in ["a","b","c",...]:

        if not results_dict.has_key(a):
            results_dict[a] = {}

        for b in ["1","2","3",...]:

            if not results_dict[a].has_key(b):
                results_dict[a][b] = {}


            results_dict[a][b]['res'] = []

            infile = open(a+b+".txt","r")
            ...parse the file and return values in a list called "record_ids"...

            ### now call the function based on for each record_id in record_ids
            if b=="1":
                func = partial(process,"1",intermediate_file,record_dict)
                res=pool.map(func, record_ids)
                ## append the results for each pair (a,b) for EACH RECORD in the results_dict 
                results_dict[a][b]['res'].append(res)

            if b=="2":
                func = partial(process,"2",intermediate_file,record_dict)
                res = pool.map(func, record_ids)
                ## append the results for each pair (a,b) for EACH RECORD in the results_dict
                results_dict[a][b]['res'].append(res) 

    ... do something with results_dict...

The idea is that, for each record in record_ids, I want to save the results for every pair (a, b).

I am not sure what is giving me this error:

  File "/code/Python/Python-2.7.9/Lib/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/code/Python/Python-2.7.9/Lib/multiprocessing/pool.py", line 558, in get
    raise self._value
cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

1 Answer:

Answer 0 (score: 0)

func is not defined at the top level of the code, so it cannot be pickled. You can use pathos.multiprocessing, which is not a standard module, but it works because it serializes with dill instead of cPickle.
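A minimal sketch of that variant, assuming pathos is installed (the import path used here is pathos.multiprocessing.ProcessingPool; whether the bound arguments, in particular the open file handle, survive serialization still depends on dill):

from functools import partial
from pathos.multiprocessing import ProcessingPool as Pool

pool = Pool(10)
# pathos serializes the partial with dill instead of cPickle
func = partial(process, "1", intermediate_file, record_dict)
res = pool.map(func, record_ids)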

Alternatively, you could use something other than Pool.map, maybe a queue of workers? https://docs.python.org/2/library/queue.html

There is an example at the end of that page that you can use; it is written for threading, but it is very similar to multiprocessing, which has queues too...

https://docs.python.org/2/library/multiprocessing.html#pipes-and-queues
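A rough sketch of that worker-queue idea with multiprocessing; the names worker, do_work, task_queue and result_queue are made up for illustration, and do_work stands in for the real per-record processing:

from multiprocessing import Process, Queue

def do_work(record_id):
    # stand-in for the real per-record processing
    return len(str(record_id))

def worker(task_queue, result_queue):
    # pull record ids off the queue until a None sentinel arrives
    for record_id in iter(task_queue.get, None):
        result_queue.put((record_id, do_work(record_id)))

def run(record_ids, n_workers=10):
    task_queue, result_queue = Queue(), Queue()
    workers = [Process(target=worker, args=(task_queue, result_queue))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for record_id in record_ids:
        task_queue.put(record_id)
    for _ in workers:
        task_queue.put(None)      # one sentinel per worker so each one stops
    results = dict(result_queue.get() for _ in record_ids)
    for w in workers:
        w.join()
    return results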