Multiprocessing is slower than sequential processing when handling text word by word

Asked: 2018-07-04 07:50:28

Tags: python multiprocessing python-multiprocessing python-multithreading

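I need to process text word by word. Because the sequential program I wrote is very slow, I tried rewriting it with the multiprocessing library. The multiprocessing version turned out to be much slower than the sequential one. Am I missing something in how I use Pool? The do_something function runs many for loops and if statements.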

Sequential code:

class Text():
    def do_something(self, word):
        ....
        # Computationally heavy code
        ....
        return new_word
....
new_text = []
for sentence in text:
    new_sentence = []
    for word in sentence:
        ....
        new_word = Text().do_something(word)
        new_sentence += new_word
    new_text.append(new_sentence)
print(new_text)

Multiprocessing code:

from multiprocessing import Pool, cpu_count

class Text():
    def do_something(self, word):
        ....
        # Computationally heavy code
        ....
        return new_word

    def do_word(self, word):
        ....
        if len(word) > 2:
            return self.do_something(word).split('$')
        else:
            return ['NONE']

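    def do_text(self, text):
        new_text = []
        # The context manager shuts the pool down once all sentences are processed.
        with Pool(processes=cpu_count()) as pool:
            for sentence in text:
                # Flatten the per-word results and drop the 'NONE' placeholders.
                words = pool.map(self.do_word, sentence.split())
                new_text.append([item for sublist in words for item in sublist if item != 'NONE'])
        return new_text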

if __name__ == "__main__":
    ....
    print(Text().do_text(file))
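
One possible cause, offered as a guess since do_something is elided: pool.map is called once per sentence, so words are pickled and shipped to the workers in very small batches, and that overhead can outweigh the computation itself. A minimal sketch of a coarser split is shown here, where each task is a whole sentence and the pool is used once for the entire text. The body of `do_something`, the names `process_sentence` and `process_text`, and the chunk-size heuristic are illustrative assumptions, not the original code.

```python
from multiprocessing import Pool, cpu_count


def do_something(word):
    # Hypothetical stand-in for the elided CPU-heavy routine.
    return word.upper() + "$" + word[::-1]


def process_sentence(sentence):
    """Process one whole sentence inside a single worker call."""
    result = []
    for word in sentence.split():
        if len(word) > 2:  # same length filter as do_word
            result.extend(do_something(word).split("$"))
    return result


def process_text(sentences, processes=None):
    """Send all sentences to the pool with a single map call."""
    processes = processes or cpu_count()
    # Larger chunks mean fewer round trips between parent and workers
    # (assumed heuristic; tune it against real data).
    chunksize = max(1, len(sentences) // (processes * 4))
    with Pool(processes=processes) as pool:
        return pool.map(process_sentence, sentences, chunksize=chunksize)
```

As in the code above, `process_text` should be called under an `if __name__ == "__main__":` guard. Whether this helps depends on how expensive do_something really is per word, so it is worth timing both versions on the same input.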

Edit

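Following Panagiotis Kanavos's suggestion, I tried multithreading instead of multiprocessing. However, when I run the code below, the machine seems to use only one core (CPU usage is about 25%, and I have a 4-core CPU). The speed appears to be the same as with the sequential code (which also shows 25% CPU usage).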

from multiprocessing import cpu_count
from multiprocessing.dummy import Pool as ThreadPool

class Text():
    def do_something(self, word):
        ....
        # Computationally heavy code
        ....
        return new_word

    def do_word(self, word):
        ....
        if len(word) > 2:
            return self.do_something(word).split('$')
        else:
            return ['NONE']

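    def do_text(self, text):
        new_text = []
        # The context manager shuts the thread pool down once all sentences are processed.
        with ThreadPool(processes=cpu_count()) as pool:
            for sentence in text:
                # Flatten the per-word results and drop the 'NONE' placeholders.
                words = pool.map(self.do_word, sentence.split())
                new_text.append([item for sublist in words for item in sublist if item != 'NONE'])
        return new_text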

if __name__ == "__main__":
    ....
    print(Text().do_text(file))
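
The 25% figure is consistent with the standard CPython interpreter, where the global interpreter lock lets only one thread execute Python bytecode at a time, so a thread pool cannot spread pure-Python loops across cores. A minimal sketch for checking this on a given machine follows. It assumes do_something is pure Python, and `cpu_heavy`, `time_map`, and `compare` are hypothetical names introduced only for the measurement.

```python
import time
from multiprocessing.dummy import Pool as ThreadPool


def cpu_heavy(word):
    # Hypothetical stand-in for do_something: a pure-Python loop.
    total = 0
    for i in range(20_000):
        total += (i * len(word)) % 7
    return total


def time_map(map_fn, words):
    """Return (results, seconds) for applying cpu_heavy via map_fn."""
    start = time.perf_counter()
    results = list(map_fn(cpu_heavy, words))
    return results, time.perf_counter() - start


def compare(words, workers=4):
    """Time the same workload sequentially and on a thread pool."""
    seq_results, seq_time = time_map(map, words)
    with ThreadPool(processes=workers) as pool:
        thr_results, thr_time = time_map(pool.map, words)
    assert seq_results == thr_results  # same output either way
    return seq_time, thr_time


if __name__ == "__main__":
    seq_time, thr_time = compare(["example"] * 100)
    print(f"sequential: {seq_time:.3f}s, threads: {thr_time:.3f}s")
```

If the two timings come out close, the workload is bound by the interpreter lock, and threads are unlikely to help no matter how the pool is configured.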

0 answers:

No answers