Multiprocessing in Python uses only one process

Time: 2016-04-26 17:16:42

Tags: python multiprocessing python-multiprocessing

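I am trying to learn multiprocessing with Python. I wrote a simple piece of code that should feed each process 1000 lines from a txt input file. My main function reads a line, splits it, and then performs some very simple operations on the elements of the string. The final result should be written to an output file.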

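When I run it, 4 processes are spawned correctly, but only one of them actually runs, and with minimal CPU usage. As a result, the code is very slow and defeats the purpose of using multiprocessing in the first place. I don't think I have a global list problem like in this question (python multiprocessing apply_async only uses one process), and I don't think my function is as trivial as in this case (Python multiprocessing.Pool() doesn't use 100% of each CPU).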

I can't understand what I'm doing wrong; any help or suggestion is appreciated. Here is the basic code:

import multiprocessing
import itertools

def myfunction(line):
    returnlist=[]
    list_of_elem=line.split(",")
    elem_id=list_of_elem[1]
    elem_to_check=list_of_elem[5]

    ids=list_of_elem[2].split("|")

    for x in itertools.permutations(ids,2):
        if x[1] == elem_to_check:
            returnlist.append(",".join([elem_id,x,"1\n"]))
        else:
            returnlist.append(",".join([elem_id,x,"0\n"]))

    return returnlist

def grouper(n, iterable, padvalue=None):
    return itertools.izip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

if __name__ == '__main__':
    my_data = open(r"my_input_file_to_be_processed.txt","r")
    my_data = my_data.read().split("\n")   

    p = multiprocessing.Pool(4)

    for chunk in grouper(1000, my_data):
        results = p.map(myfunction, chunk)
        for r in results:
            with open (r"my_output_file","ab") as outfile:
                outfile.write(r)

Edit: I modified my code as suggested (removing the redundant data preprocessing), but the problem still seems to persist.

import multiprocessing
import itertools

def myfunction(line):
    returnlist=[]
    list_of_elem=line.split(",")
    elem_id=list_of_elem[1]
    elem_to_check=list_of_elem[5]

    ids=list_of_elem[2].split("|")

    for x in itertools.permutations(ids,2):
        if x[1] == elem_to_check:
            returnlist.append(",".join([elem_id,x,"1\n"]))
        else:
            returnlist.append(",".join([elem_id,x,"0\n"]))

    return returnlist

if __name__ == '__main__':
    my_data = open(r"my_input_file_to_be_processed.txt","r")

    p = multiprocessing.Pool(4)

    results = p.map(myfunction, my_data, chunksize=1000)
    for r in results:
        with open (r"my_output_file","ab") as outfile:
            outfile.write(r)

1 Answer:

Answer 0: (score: 0)

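Based on your code snippet, I think I would do something like this: split the file into 8 chunks and let 4 workers do the computation (why 8 chunks and 4 workers? It is just an arbitrary choice I made for this example):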

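from multiprocessing import Pool
import itertools

def myfunction(lines):
    # Process a whole chunk of lines in a single worker call.
    returnlist = []
    for line in lines:
        list_of_elem = line.split(",")
        elem_id = list_of_elem[1]
        elem_to_check = list_of_elem[5]
        ids = list_of_elem[2].split("|")

        for x in itertools.permutations(ids, 2):
            # x is a tuple, so str.join needs its two ids as separate
            # items (assumed output format: id,first,second,flag).
            returnlist.append(",".join(
                [elem_id, x[0], x[1], "1\n" if x[1] == elem_to_check else "0\n"]))

    return returnlist

def chunk(it, size):
    # Return an iterator of tuples holding at most `size` items each.
    it = iter(it)
    return iter(lambda: tuple(itertools.islice(it, size)), ())

if __name__ == "__main__":
    with open(r"my_input_file_to_be_processed.txt", "r") as infile:
        # Skip blank lines (such as the one after a trailing newline).
        my_data = [line for line in infile.read().split("\n") if line]

    # Ceiling division: at most 8 chunks, and a chunk size of at least 1.
    prep = list(chunk(my_data, max(1, -(-len(my_data) // 8))))
    with Pool(4) as p:
        res = p.map(myfunction, prep)

    # Flatten the per-chunk lists into a single list.
    result = list(itertools.chain.from_iterable(res))
    print(result)  # ... or do something with the result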
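
One detail worth checking in the chunking step is how the chunk size is derived from the number of lines. The sketch below reuses the chunk helper from this answer and adds a hypothetical split_into function (not part of the original code) that uses ceiling division, so the size is never zero and the number of chunks never exceeds the requested count. The sample list is made up and only stands in for the lines of the input file.

```python
import itertools

def chunk(it, size):
    # Same helper as in the answer: tuples of at most `size` items.
    it = iter(it)
    return iter(lambda: tuple(itertools.islice(it, size)), ())

def split_into(items, parts):
    # Hypothetical helper: ceiling division keeps the chunk size >= 1
    # and the number of chunks <= parts.
    size = max(1, -(-len(items) // parts))
    return list(chunk(items, size))

sample = ["line{}".format(i) for i in range(10)]  # stand-in for file lines
pieces = split_into(sample, 4)
print([len(p) for p in pieces])  # prints [3, 3, 3, 1]
```

With fewer lines than requested parts, this yields one line per chunk instead of failing on a chunk size of zero.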

Edit: This answer assumes you are certain that every line is formatted the same way and will not raise an error.
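
If you are not sure about that assumption, a quick pre-check can find the offending lines before any worker is started. This is only a minimal sketch: is_well_formed and find_bad_lines are hypothetical names, the sample lines are made up, and the check covers only the field count that myfunction relies on (it indexes fields 1, 2 and 5).

```python
def is_well_formed(line, min_fields=6):
    # Hypothetical check: myfunction reads fields 1, 2 and 5, so a line
    # needs at least six comma-separated fields.
    return len(line.split(",")) >= min_fields

def find_bad_lines(lines):
    # Return (line_number, line) pairs that would raise IndexError.
    return [(n, line) for n, line in enumerate(lines, start=1)
            if not is_well_formed(line)]

sample = [
    "a,id1,x|y|z,b,c,y",  # six fields: fine
    "",                   # blank line, e.g. after a trailing newline
    "a,id2,x|y",          # too few fields
]
print(find_bad_lines(sample))  # prints [(2, ''), (3, 'a,id2,x|y')]
```

Note that splitting the file contents on "\n" leaves an empty string at the end when the file ends with a newline, so a blank last line is a likely candidate.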

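Based on your comment, it may be useful to look for the problem in the function or the file contents, either by testing it without multiprocessing or by using try/except in a very broad (and ugly) way, to make sure that some output is always produced (an exception or a result):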

def myfunction(lines):
    returnlist = []
    for line in lines:
        try:
            list_of_elem = line.split(",")
            elem_id = list_of_elem[1]
            elem_to_check = list_of_elem[5]
            ids = list_of_elem[2].split("|")

            for x in itertools.permutations(ids, 2):
                # x is a tuple, so str.join needs its two ids as separate
                # items (assumed output format: id,first,second,flag).
                returnlist.append(",".join(
                    [elem_id, x[0], x[1], "1\n" if x[1] == elem_to_check else "0\n"]))
        except Exception as err:
            # Deliberately broad: record the failure instead of losing the line.
            returnlist.append('I encountered error {} on line {}'.format(err, line))

    return returnlist
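
For the first option, testing without multiprocessing, something like the following could be a starting point. It is a sketch under assumptions: process_line is a per-line version of the logic above, the output layout (id,first,second,flag) is my guess at what you want, first_failure is a hypothetical helper, and the sample input is made up.

```python
import itertools

def process_line(line):
    # Per-line version of the logic in the answer; the output layout
    # (id,first,second,flag) is an assumption.
    fields = line.split(",")
    elem_id, elem_to_check = fields[1], fields[5]
    ids = fields[2].split("|")
    return [",".join([elem_id, a, b, "1" if b == elem_to_check else "0"])
            for a, b in itertools.permutations(ids, 2)]

def first_failure(lines):
    # Plain loop, no Pool: report the first line that raises, or None.
    for number, line in enumerate(lines, start=1):
        try:
            process_line(line)
        except Exception as err:
            return number, line, err
    return None

sample = ["a,id1,x|y,b,c,y", "broken line"]  # made-up input
print(process_line(sample[0]))  # prints ['id1,x,y,1', 'id1,y,x,0']
print(first_failure(sample))
```

Once the serial run passes on the whole file, the same function can be handed to the pool again.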