Python multiprocessing imap makes execution slower

Time: 2014-05-11 10:35:38

Tags: python python-2.7 multiprocessing

I am preprocessing a set of text files. When I wrote an imap version of the code, it turned out to be even slower than plain sequential execution.

import json
import multiprocessing
import os
import time
from multiprocessing import Pool

def process_text(document):
    #doing_some_more_preprocessing_on_text
    # extract named entities
    return named_entities

def input_files(dirname, fname):
    # read each document as one big string
    with open(os.path.join(dirname, fname)) as handle:
        document = handle.read()
    # tokenise here
    yield tokenize(document)

def corpus_preprocessing(top_dir):
    # save to another directory, if not exist, create one.
    if not os.path.exists(top_dir+'_pre'):
        os.makedirs(top_dir+'_pre')

    # real multiprocessing starts here
    for fname in os.listdir(top_dir):
        for named_entities in pool.imap(process_text, input_files(top_dir, fname)):
            with open(os.path.join(top_dir+'_pre', fname),'w') as handle:
                json.dump(named_entities, handle)


    pool.terminate()

# initialize pool
pool = Pool(multiprocessing.cpu_count())

# let's calculate time
now = time.time()

#provide the path to dataset here
top_dir = '/home/dataset/sample'

corpus_preprocessing(top_dir)

# print total time taken
print "Finished in", time.time()-now , "sec"

I also enabled logging:

with imap:
[WARNING/MainProcess] doomed
[WARNING/MainProcess] doomed
[WARNING/MainProcess] doomed
[WARNING/MainProcess] doomed
[WARNING/MainProcess] doomed
[WARNING/MainProcess] doomed
Finished in 29.0439419746 sec

with single file at a time execution:
Finished in 18.4209680557 sec

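Does the "doomed" message mean that all the processes die quickly, leaving only one process to do all the work? Any suggestions about what I am doing wrong with multiprocessing?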
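
One possible reading of the code in the question: each call to pool.imap receives a generator that yields a single tokenized document, and the loop waits for that result before moving on to the next file. Only one task is ever in flight, and the parent still does the reading and tokenizing and has to send each document to a worker. The following is a minimal sketch of one way to restructure it, not a confirmed fix. It assumes the work can be split per file, and extract_entities, process_file, and iter_paths are hypothetical names; extract_entities is only a stand-in for the tokenizing and named-entity steps that the question does not show.

```python
import json
import multiprocessing
import os


def extract_entities(text):
    # Hypothetical stand-in for the tokenising and named-entity step,
    # which the question does not show: keeps capitalised words only.
    return [word for word in text.split() if word[:1].isupper()]


def process_file(path):
    # Runs in a worker: read the file there, so only the short path
    # string and the result travel between processes.
    with open(path) as handle:
        text = handle.read()
    return os.path.basename(path), extract_entities(text)


def iter_paths(top_dir):
    # Yield every regular file in top_dir in a stable order.
    for fname in sorted(os.listdir(top_dir)):
        path = os.path.join(top_dir, fname)
        if os.path.isfile(path):
            yield path


def corpus_preprocessing(top_dir, processes=None):
    # Save to another directory; create it if it does not exist.
    out_dir = top_dir + '_pre'
    if not os.path.exists(out_dir):
        os.makedirs(out_dir)

    pool = multiprocessing.Pool(processes)  # None means cpu_count()
    try:
        # A single imap call is given every path, so several files
        # can be processed by different workers at the same time.
        for fname, entities in pool.imap(process_file, iter_paths(top_dir)):
            with open(os.path.join(out_dir, fname), 'w') as handle:
                json.dump(entities, handle)
    finally:
        pool.close()
        pool.join()
```

With this layout, corpus_preprocessing('/home/dataset/sample') would be called from under an `if __name__ == '__main__':` guard. Whether it beats the sequential run still depends on how expensive the per-file work is compared with the cost of starting the workers and passing results back.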

0 answers:

No answers yet