Python multiprocessing: Pool.map() does not seem to call the function at all

Time: 2016-12-15 16:39:12

Tags: python windows multithreading

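I am very new to multithreading, so I apologize if this is basic. I have a function that OCRs image files, and I would like to multithread the task. The function does not return anything; it only saves the OCR text to a file. The code is below: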

start_time = time.time()
path = 'C:\\Users\\RNCZF01\\Documents\\Cameron-Fen\\Economics-Projects\\Patent-project\\similarity\\Patents\\OCR-test'
listfiles = os.listdir(path)

filterfiles = [p for p in listfiles if p[-4:] == '.tif']

pool = Pool(processes=2)

result = pool.map(OCRimage,filterfiles)

pool.close()
pool.join()

print("--- %s seconds ---" % (time.time() - start_time))

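When I run the code, it seems to get stuck on pool.map(). I let it run for 30 minutes, which is longer than the trial run without the pool took, and not a single output was produced. I tested my function OCRimage, and it does not appear to enter the function even once (I used print(1) as the first line of OCRimage). I was wondering if someone could help me. Thanks,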

Cameron

Edit (adding the OCRimage function):

The OCRimage function looks like this:

def OCRimage(f):
    #This runs the magick bash script which splits a multi-image tif into multiple single image tiffs
    process = subprocess.Popen(["magick", path + "\\" + f, path + "\\temp\\%d.tif"], shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    print(process.communicate()[0])

    # Finds the number of pages for each tiff file (this might not be necessary, but os.listdir could return the files in arbitrary order)
    max1 = -1
    for filename in os.listdir(path+'\\temp'):    
        if (max1 < int(filename[0:-4])):
            max1 = int(filename[0:-4])
    max1 = max1 + 1

    text = ""
    for each in range(0,max1):
        im = Image.open(path + "\\temp\\"+ str(each) + ".tif")
        text = text + pytesseract.image_to_string(im)
    with open(path + "\\result\\OCR-"+f[0:-4]+".txt", 'w') as file:
        file.write(text)    

    for f in os.listdir(path+'\\temp'):
        os.remove(path + '\\temp\\' + f)

Edit2: Here are all the imports:

import time
import subprocess
import os
import pytesseract
from PIL import Image

from multiprocessing import Pool
import multiprocessing
countcpus = multiprocessing.cpu_count()

EDIT3:

Running OCRimage(f) on its own works fine. I simply use this in place of the multithreading code:

path = 'C:\\Users\\RNCZF01\\Documents\\Cameron-Fen\\Economics-Projects\\Patent-project\\similarity\\Patents\\OCR-test'
for p in os.listdir(path):
    OCRimage(p)

1 Answer:

Answer 0 (score: 0)

Here is a Minimal, Complete, and Verifiable Example, which seems to indicate that the problem must be in the OCRimage function (see the Windows section below for the real problem):

from multiprocessing import Pool

def OCRimage(file_name):
    print("file_name = %s" % file_name)

filterfiles = ["image%03d.tif" % n for n in range(5)]

pool = Pool(processes=2)
result = pool.map(OCRimage, filterfiles)

pool.close()
pool.join()

Output

file_name = image000.tif
file_name = image001.tif
file_name = image002.tif
file_name = image003.tif
file_name = image004.tif

I suggest adding these changes to the beginning of OCRimage:

def OCRimage(file_name):
    print("file_name = %s" % file_name)
    # os.path.join takes the path components as separate arguments, not a list
    src = os.path.join(path, file_name)
    dst = os.path.join(path, 'temp', '%d.tif')
    command_list = ['magick', src, dst]
    # This runs the magick command, which splits a multi-image tif into
    # multiple single image tiffs
    process = subprocess.Popen(command_list,
                               shell=True,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    output, errors = process.communicate()
    if process.returncode != 0:
        print("Image processing failed for %s: %s" % (file_name, errors))
        return
    # The rest of your code goes here

It is important to verify that the subprocess's return code is zero. If it is non-zero, you really want to look at the errors string.
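
The return-code check can be tried without ImageMagick installed. This is only a minimal sketch: run_command is a hypothetical helper, and a child Python process that deliberately exits with status 3 stands in for a failing magick call.

```python
import subprocess
import sys


def run_command(command_list):
    """Run a command and return (returncode, output, errors) as text."""
    process = subprocess.Popen(command_list,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE,
                               universal_newlines=True)
    output, errors = process.communicate()
    return process.returncode, output, errors


# Stand-in for a failing external tool (an assumption for illustration):
# it writes a message to stderr and exits with a non-zero status.
failing = [sys.executable, '-c',
           'import sys; sys.stderr.write("cannot open image"); sys.exit(3)']
returncode, output, errors = run_command(failing)
if returncode != 0:
    print("Command failed with code %d: %s" % (returncode, errors))
```

Keeping stderr on its own pipe, as in the suggested OCRimage changes, means the error text is not mixed into the normal output.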

Windows

When I ran the MCVE on Windows, I got this exception:

RuntimeError: 
            Attempt to start a new process before the current process
            has finished its bootstrapping phase.

            This probably means that you are on Windows and you have
            forgotten to use the proper idiom in the main module:

                if __name__ == '__main__':
                    freeze_support()
                    ...

            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce a Windows executable.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Python27\lib\multiprocessing\forking.py", line 380, in main

When I changed the MCVE to this, it worked:

from multiprocessing import Pool

def OCRimage(file_name):
    print("file_name = %s" % file_name)

def main():
    filterfiles = ["image%03d.tif" % n for n in range(5)]
    pool = Pool(processes=2)
    result = pool.map(OCRimage, filterfiles)
    pool.close()
    pool.join()

if __name__ == '__main__':
    main()
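
The guard matters because Windows has no fork: multiprocessing starts each worker by importing the main module again, so pool-creating code at module level would be executed again in every child. The same pattern should carry over to the script in the question. The following is a rough sketch rather than the asker's actual code: ocr_image is a stand-in that only computes the output file name, and select_tif_files and the sample file names are hypothetical.

```python
import os
from multiprocessing import Pool, freeze_support


def select_tif_files(names):
    """Keep only names ending in .tif, like the filter in the question."""
    return [n for n in names if n.endswith('.tif')]


def ocr_image(file_name):
    """Stand-in for OCRimage: return the name of the text file it would write."""
    base, _ext = os.path.splitext(file_name)
    return 'OCR-' + base + '.txt'


def main():
    # Hypothetical directory listing; the real script would use os.listdir(path).
    names = ['a.tif', 'notes.txt', 'b.tif']
    pool = Pool(processes=2)
    try:
        results = pool.map(ocr_image, select_tif_files(names))
    finally:
        pool.close()
        pool.join()
    print(results)


if __name__ == '__main__':
    freeze_support()  # only has an effect in frozen Windows executables
    main()
```

The worker function stays at module level so that child processes can import it, while everything that creates the pool lives inside main().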