Suppressing stack traces when using Pool in Python

Date: 2015-09-27 15:59:54

Tags: python stack-trace pool

I use a Pool to run several commands simultaneously. I would like no stack trace to be printed when the user interrupts the script.

Here is the structure of my script:

def worker(some_element):
    try:
        cmd_res = Popen(SOME_COMMAND, stdout=PIPE, stderr=PIPE).communicate()
    except (KeyboardInterrupt, SystemExit):
        pass
    except Exception, e:
        print str(e)
        return

    #deal with cmd_res...

pool = Pool()
try:
    pool.map(worker, some_list, chunksize = 1)
except KeyboardInterrupt:
    pool.terminate()
    print 'bye!'

pool.terminate() is called when KeyboardInterrupt is raised. I expected this to keep the stack trace from being printed, but it doesn't work; I sometimes get:

^CProcess PoolWorker-6:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
    task = get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 374, in get
    racquire()
KeyboardInterrupt
Process PoolWorker-1:
Process PoolWorker-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Traceback (most recent call last):

...
bye!

Do you know how I can hide this?

Thanks.

4 Answers:

Answer 0: (score: 1)

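When you instantiate Pool, it creates cpu_count() Python processes (8 on my machine) that wait to run your worker(). Note that they are not running it yet; they are waiting for a command. While they are not executing your code, they are not handling KeyboardInterrupt either. You can see what they are doing if you specify Pool(processes=2) and send the interrupt. You can play with the number of processes to work around it, but I don't think you can handle it in all cases.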
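One commonly used workaround, which is not part of this answer and is shown only as a hedged Python 3 sketch, is to make the pool processes ignore SIGINT through the `initializer` argument of `Pool`, so that only the parent process sees KeyboardInterrupt. The names `init_worker`, `worker` and `run_pool` are hypothetical, and `worker` is a stand-in for the one in the question.

```python
import signal
from multiprocessing import Pool
from subprocess import Popen, PIPE

def init_worker():
    # Runs once in every pool process: ignore Ctrl-C there, so that only
    # the parent process raises KeyboardInterrupt.
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def worker(command):
    # Stand-in for the worker in the question; `command` is an argv list.
    return Popen(command, stdout=PIPE, stderr=PIPE).communicate()

def run_pool(commands, processes=None):
    pool = Pool(processes=processes, initializer=init_worker)
    try:
        results = pool.map(worker, commands, chunksize=1)
        pool.close()
        return results
    except KeyboardInterrupt:
        # Only the parent gets here; the idle workers stay silent.
        pool.terminate()
        print('bye!')
        return None
    finally:
        pool.join()
```

Call `run_pool(...)` from under an `if __name__ == '__main__':` guard. One caveat: commands started with Popen inherit the ignored SIGINT, so they may keep running after `pool.terminate()` and might need to be stopped separately.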

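Personally, I don't recommend using multiprocessing.Pool just to launch other processes. Starting several Python processes for that is overkill. A more efficient way is to use threads (see threading.Thread and Queue.Queue). In that case you need to implement the thread pool yourself, but that is not very hard.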
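A hand-rolled thread pool along those lines could look roughly like the sketch below. It is written for Python 3, where the module is called `queue` rather than `Queue`; the function name `run_commands` and its parameters are hypothetical, and each command is assumed to be an argv list.

```python
import queue
import threading
from subprocess import Popen, PIPE

def run_commands(commands, num_threads=4):
    """Run each argv list in `commands` on a small pool of threads."""
    tasks = queue.Queue()
    for item in enumerate(commands):
        tasks.put(item)
    results = [None] * len(commands)

    def work():
        while True:
            try:
                index, command = tasks.get_nowait()
            except queue.Empty:
                return  # no tasks left, let the thread finish
            # Each index is written by exactly one thread.
            results[index] = Popen(command, stdout=PIPE, stderr=PIPE).communicate()

    # Daemon threads do not keep the interpreter alive after an interrupt.
    threads = [threading.Thread(target=work, daemon=True) for _ in range(num_threads)]
    for t in threads:
        t.start()
    try:
        for t in threads:
            t.join()
    except KeyboardInterrupt:
        # KeyboardInterrupt is only raised in the main thread.
        print('bye!')
        return None
    return results
```

Because the interrupt is only ever raised in the main thread, there are no per-worker tracebacks to suppress. Commands that are still running when the interrupt arrives are not terminated by this sketch.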

Answer 1: (score: 1)

In your case you don't even need pool processes or threads. Without them, silencing KeyboardInterrupt with try/except also becomes easier.

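Pool processes are useful when your Python code performs CPU-intensive computations that can benefit from parallelisation. Threads are useful when your Python code performs complex blocking I/O that can run in parallel. You only want to run several programs in parallel and wait for their results. When you use Pool, you create processes that do nothing but start other processes and wait for them to terminate.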

The simplest solution is to start all of the processes in parallel and then call .communicate() on each of them:

try:
    processes = []
    # Start all processes at once
    for element in some_list:
        processes.append(Popen(SOME_COMMAND, stdout=PIPE, stderr=PIPE))
    # Fetch their results sequentially
    for process in processes:
        cmd_res = process.communicate()
        # Process your result here
except KeyboardInterrupt:
    for process in processes:
        try:
            process.terminate()
        except OSError:
            pass

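This approach works as long as the output on STDOUT and STDERR is not too large. Otherwise, when a process other than the one communicate() is currently waiting on produces more output than the PIPE buffer can hold (the size is platform-dependent: roughly 1-8 kB on some systems, 64 KiB by default on modern Linux), the OS suspends it until communicate() is called on that suspended process. In that case you need a more sophisticated solution: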
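One comparatively simple option, which is not from the original answer and is offered only as a sketch, is to send each command's output to a temporary file instead of a pipe, so that no pipe buffer can fill up. The function name `run_all` is hypothetical, only stdout is captured, and each command is assumed to be an argv list.

```python
import tempfile
from subprocess import Popen

def run_all(commands):
    """Start every command at once, collecting stdout in temporary files."""
    procs = []
    try:
        for command in commands:
            out = tempfile.TemporaryFile()
            procs.append((Popen(command, stdout=out), out))
        results = []
        for proc, out in procs:
            proc.wait()       # safe: the child writes to a file, not a pipe
            out.seek(0)
            results.append(out.read())
        return results
    except KeyboardInterrupt:
        for proc, _ in procs:
            try:
                proc.terminate()
            except OSError:
                pass
        return None
    finally:
        for _, out in procs:
            out.close()
```

The trade-off is that all output goes to disk first, which may not suit very large or streaming output.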

Asynchronous I/O

Starting with Python 3.4, you can use the asyncio module for single-threaded pseudo-multithreading:

import asyncio
from asyncio.subprocess import PIPE

loop = asyncio.get_event_loop()

@asyncio.coroutine
def worker(some_element):
    process = yield from asyncio.create_subprocess_exec(*SOME_COMMAND, stdout=PIPE)
    try:
        cmd_res = yield from process.communicate()
    except KeyboardInterrupt:
        process.terminate()
        return
    try:
        pass # Process your result here
    except KeyboardInterrupt:
        return

# Start all workers
workers = []
for element in some_list:
    w = worker(element)
    workers.append(w)
    asyncio.async(w)

# Run until everything complete
loop.run_until_complete(asyncio.wait(workers))

If you need to, you should be able to limit the number of concurrent processes with, for example, asyncio.Semaphore.
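The snippet above uses the Python 3.4 spelling; `asyncio.async` is a syntax error on Python 3.7 and later, and `@asyncio.coroutine` was removed in Python 3.11. The sketch below is a hedged rewrite for current Python that also shows the asyncio.Semaphore limit. The names `run_all`, `main` and `max_concurrent` are hypothetical, and each command is assumed to be an argv list.

```python
import asyncio
from asyncio.subprocess import PIPE

async def worker(command, limit):
    # The semaphore caps how many subprocesses run at the same time.
    async with limit:
        process = await asyncio.create_subprocess_exec(*command, stdout=PIPE)
        try:
            stdout, _ = await process.communicate()
        except asyncio.CancelledError:
            # Cancelled, for example after Ctrl-C: stop the child, then re-raise.
            try:
                process.terminate()
            except ProcessLookupError:
                pass  # the child has already exited
            raise
        return stdout

async def run_all(commands, max_concurrent=2):
    limit = asyncio.Semaphore(max_concurrent)
    # gather() returns the results in the same order as `commands`.
    return await asyncio.gather(*(worker(c, limit) for c in commands))

def main(commands):
    try:
        return asyncio.run(run_all(commands))
    except KeyboardInterrupt:
        print('bye!')
        return None
```

Here the interrupt is handled once, around `asyncio.run()`, rather than inside each worker.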

Answer 2: (score: 0)

Your child processes will receive both the KeyboardInterrupt exception and the exception from terminate().

Because the child processes receive the KeyboardInterrupt, a simple join() in the parent, rather than terminate(), should suffice.

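Answer 3: (score: 0)

As suggested, I used threading.Thread instead of Pool.

Here is a working example that rasterises a set of vector images with ImageMagick (I know I could use mogrify; this is just an example).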

#!/usr/bin/python

from os.path import abspath
from os import listdir
from threading import Thread
from subprocess import Popen, PIPE

RASTERISE_CALL = "magick %s %s"
INPUT_DIR = './tests_in/'

def get_vectors(dir):
    '''Return a list of svg files inside the `dir` directory'''
    return [abspath(dir+f).replace(' ', '\\ ') for f in listdir(dir) if f.endswith('.svg')]

class ImageMagickError(Exception):
    '''Custom error for ImageMagick fails calls'''
    def __init__(self, value): self.value = value
    def __str__(self): return repr(self.value)

class Rasterise(Thread):
    '''Rasterizes a given vector.'''
    def __init__(self, svg):
        self.stdout = None
        self.stderr = None
        Thread.__init__(self)
        self.svg = svg

    def run(self):
        p = Popen((RASTERISE_CALL % (self.svg, self.svg + '.png')).split(), shell=False, stdout=PIPE, stderr=PIPE)
        self.stdout, self.stderr = p.communicate()
        if self.stderr != '':
            raise ImageMagickError, 'can not rasterize ' + self.svg + ': ' + self.stderr

threads = []

def join_threads():
    '''Joins all the threads.'''
    for t in threads:
        try:
            t.join()
        except(KeyboardInterrupt, SystemExit):
            pass

#Rasterizes all the vectors in INPUT_DIR.
for f in get_vectors(INPUT_DIR):
    t = Rasterise(f)

    try:
        print 'rasterize ' + f
        t.start()
    except (KeyboardInterrupt, SystemExit):
        join_threads()
    except ImageMagickError:
        print 'Oops, IM can not rasterize ' + f + '.'
        continue

    threads.append(t)

# wait for all threads to end
join_threads()

print ('Finished!')

Please tell me if you think there is a more Pythonic way to do this, or if it can be optimised, and I will edit my answer.
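One possible refinement, sketched here for Python 3 and not part of the original answer: an exception raised inside run() is not propagated to the thread that called start(), so an `except` clause around `t.start()` will not see it. A common pattern is to store the error on the thread object and inspect it after join(). The class name `CommandThread`, the helper `run_threads` and the use of the exit code to detect failure are assumptions made for illustration.

```python
from subprocess import Popen, PIPE
from threading import Thread

class CommandThread(Thread):
    '''Runs one command and records its output and any failure.'''
    def __init__(self, command):
        Thread.__init__(self)
        self.command = command
        self.stdout = None
        self.stderr = None
        self.error = None   # set in run() if the command fails

    def run(self):
        try:
            p = Popen(self.command, stdout=PIPE, stderr=PIPE)
            self.stdout, self.stderr = p.communicate()
            if p.returncode != 0:
                raise RuntimeError('command failed: %r' % (self.command,))
        except Exception as exc:
            # Keep the error for the main thread instead of letting it
            # escape run(), where the caller could not catch it.
            self.error = exc

def run_threads(commands):
    threads = [CommandThread(c) for c in commands]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return threads
```

After `run_threads(...)` returns, the main thread can loop over the threads and report every one whose `error` is not None.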