Python multiprocessing: handling exceptions in the parent process and making all child processes die gracefully

Date: 2014-11-14 00:42:27

Tags: python python-decorators python-multiprocessing

I have the following code. It uses a Python module called decorator.

from multiprocessing import Pool
from random import randint
import traceback
import decorator
import time


def test_retry(number_of_retry_attempts=1, **kwargs):
    timeout = kwargs.get('timeout', 2.0) # seconds
    @decorator.decorator
    def tryIt(func, *fargs, **fkwargs):
        for _ in xrange(number_of_retry_attempts):
            try:
                return func(*fargs, **fkwargs)
            except:
                tb = traceback.format_exc()
                if timeout is not None:
                    time.sleep(timeout)
                print 'Caught exception %s. Attempting retry.' % (tb)

        # All retry attempts exhausted: re-raise the last exception
        raise
    return tryIt

The decorator module helps me decorate the functions that call the data warehouse. That way I don't have to handle connection loss and other connection-related problems in each function, and it lets me reset the connection and retry after a timeout. I decorate every function that reads from the data warehouse with this approach, so I get the retries for free.

I have the following methods.

def process_generator(data):
    #Process the generated data


def generator():
    data = data_warhouse_fetch_method()#This is the actual method which needs retry
    yield data

@test_retry(number_of_retry_attempts=2,timeout=1.0)
def data_warhouse_fetch_method():
    #Fetch the data from data-warehouse
    pass

I tried to multiprocess my code with the multiprocessing module like this.

try:
    pool = Pool(processes=2)
    result = pool.imap_unordered(process_generator,generator())
except Exception as exception:
    print 'Do some post processing stuff'
    tb = traceback.format_exc()
    print tb 

Things work fine when everything succeeds, and also when the failure recovers within the number of retries. But once the retries are exhausted, the exception raised inside the test_retry method is not caught in the main process. The process dies, and the processes forked by the main process are left as orphans. Maybe I'm doing something wrong here. I'm looking for help with two things: propagating the exception to the parent process so that I can handle it and let my children die gracefully, and knowing how to tell the child processes to die gracefully. Thanks in advance for any help.
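
For comparison, exceptions raised inside the worker function itself are pickled and re-raised in the parent when the imap_unordered results are iterated, and at that point the pool can be terminated. Here is a minimal sketch of that behaviour, using a hypothetical worker() purely for illustration:

from multiprocessing import Pool


def worker(n):
    # Hypothetical worker used only for this sketch
    if n == 3:
        raise ValueError('simulated failure for input %d' % n)
    return n * n


if __name__ == '__main__':
    pool = Pool(processes=2)
    try:
        for result in pool.imap_unordered(worker, range(6)):
            print(result)
        pool.close()
    except Exception as exc:
        # The worker's exception surfaces here, in the parent
        print('caught in parent: %r' % exc)
        pool.terminate()  # stop the remaining children
    finally:
        pool.join()       # wait for the children to exit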

Edit: Added more code to explain.

def test_retry(number_of_retry_attempts=1, **kwargs):
    timeout = kwargs.get('timeout', 2.0) # seconds
    @decorator.decorator
    def tryIt(func, *fargs, **fkwargs):
        for _ in xrange(number_of_retry_attempts):
            try:
                return func(*fargs, **fkwargs)
            except:
                tb = traceback.format_exc()
                if timeout is not None:
                    time.sleep(timeout)
                print 'Caught exception %s. Attempting retry.' % (tb)
        # All retry attempts exhausted: re-raise the last exception
        raise
    return tryIt

@test_retry(number_of_retry_attempts=2,timeout=1.0)
def bad_method():
    sample_list =[]
    return sample_list[0] #This will result in an exception


def process_generator(number):
    if isinstance(number,int):
        return number+1
    else:
        raise

def generator():
    for i in range(20):
        if i % 10 == 0:
            yield bad_method()
        else:
            yield i

try:
    pool = Pool(processes=2)
    result = pool.imap_unordered(process_generator,generator())
    pool.close()
    #pool.join()
    for r in result:
        print r
except Exception, e: # Hoping to catch the exception raised inside the generator here, but it never is
    print 'got exception: %r, terminating the pool' % (e,)
    pool.terminate()
    print 'pool is terminated'
finally:
    print 'joining pool processes'
    pool.join()
    print 'join complete'
print 'the end'

The actual problem boils down to this: if the generator throws an exception, I cannot catch it in the except clause that wraps the pool.imap_unordered() call. After the exception is thrown, the main process gets stuck and the child processes wait forever. I'm not sure what I'm doing wrong here.
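
The iterable passed to imap_unordered is consumed by the pool's internal task-handler thread, so an exception raised by the generator never reaches the try/except in the main thread. One possible workaround is to wrap the iterable so that failures become ordinary values the parent can inspect; the following is a minimal sketch with illustrative names (items, safe_generator, worker):

from multiprocessing import Pool
import traceback


def items():
    # Stand-in for generator(): the last step simulates bad_method() blowing up
    yield 1
    yield 2
    raise ValueError('simulated data-warehouse failure')


def safe_generator(gen):
    # Convert exceptions from the input iterable into ordinary values,
    # so the parent sees them instead of the pool's task-handler thread
    try:
        for item in gen:
            yield ('ok', item)
    except Exception:
        yield ('error', traceback.format_exc())


def worker(tagged):
    tag, value = tagged
    if tag == 'error':
        return tagged             # pass the failure straight through
    return ('ok', value + 1)      # stand-in for process_generator


if __name__ == '__main__':
    pool = Pool(processes=2)
    failed = False
    try:
        for tag, value in pool.imap_unordered(worker, safe_generator(items())):
            if tag == 'error':
                print('generator failed, traceback seen by the parent:')
                print(value)
                failed = True
                break
            print(value)
    finally:
        if failed:
            pool.terminate()  # stop the remaining children immediately
        else:
            pool.close()      # normal shutdown
        pool.join()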

1 answer:

Answer 0 (score: 0):

I don't fully understand the code shared here, since I'm not an expert, and the question is almost a year old. But I had the same requirement as explained in the question, and I managed to find a solution:

import multiprocessing
import time


def dummy(flag):
    try:
        if flag:
            print('Sleeping for 2 secs')
            time.sleep(2)  # So that it can be terminated
        else:
            raise Exception('Exception from ', flag) # To simulate termination
        return flag  # To check that the sleeping thread never returns this
    except Exception as e:
        print('Exception inside dummy', e)
        raise e
    finally:
        print('Entered finally', flag)


if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    args_list = [(1,), (0,)]
    # call dummy for each tuple inside args_list. 
    # Use error_callback to terminate the pool
    results = pool.starmap_async(dummy, args_list, 
                                error_callback=lambda e, mp_pool=pool: mp_pool.terminate())
    pool.close()
    pool.join()
    try:
        # Try to see the results.
        # If there was an exception in any process, results.get() throws exception
        for result in results.get():
            # Never executed cause of the exception
            print('Printing result ', result)  
    except Exception as e:
        print('Exception inside main', e)

    print('Reached the end')

This produces the following output:

Sleeping for 2 secs
Exception inside dummy ('Exception from ', 0)
Entered finally 0
Exception inside main ('Exception from ', 0)
Reached the end

This is the first time I'm answering a question, so I apologize in advance if I break any rules or make any mistakes.

I tried the following things without success:

  1. Using apply_async. But this just hung the main process after the exception was thrown.
  2. Trying to terminate the process and its children using the pid inside the error_callback.
  3. Using a multiprocessing.Event to keep track of the exception and checking it across all processes after each step before continuing. Not a nice approach, and it did not work either: "Condition objects should only be shared between processes through inheritance" (see the sketch after this list for the usual workaround).

Honestly, terminating all the processes in the same pool when one of them raises an exception is not that difficult.
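
The error quoted in item 3 comes from passing a plain multiprocessing.Event as an argument to pool workers; the usual workaround is a Manager().Event() proxy, which can be handed to workers so they can signal each other to stop. A minimal sketch under that assumption, with illustrative names (worker, shutdown_event):

import multiprocessing
import time


def worker(n, shutdown_event):
    # Bail out early if some other worker has already failed
    if shutdown_event.is_set():
        return None
    try:
        if n == 3:
            raise ValueError('simulated failure for input %d' % n)
        time.sleep(0.5)  # pretend to do some work
        return n * n
    except Exception:
        shutdown_event.set()  # tell the other workers to stop gracefully
        raise


if __name__ == '__main__':
    manager = multiprocessing.Manager()
    shutdown_event = manager.Event()  # manager proxies can be passed to pool workers
    pool = multiprocessing.Pool(processes=2)
    results = [pool.apply_async(worker, (n, shutdown_event)) for n in range(6)]
    pool.close()
    pool.join()
    for r in results:
        try:
            print(r.get())  # re-raises the worker's exception in the parent
        except Exception as e:
            print('worker failed: %r' % e)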