How to use a lock without causing a deadlock in concurrent.futures.ThreadPoolExecutor?

Posted: 2016-07-14 21:12:49

Tags: python multithreading concurrent.futures

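I'm processing Jira changelog history data. There is a lot of it, and most of the processing time is I/O-bound, so I figured a concurrent approach might work well.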

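I have a list of all the issue_ids, which I feed to a function that makes a request through the jira-python API, extracts the information into a dict, and then writes it out through a DictWriter that I pass in. To make this thread-safe, I imported Lock from the threading module and pass it in as well. When testing, it seems to deadlock at some point and just hangs. I noticed the documentation says that tasks can hang if they depend on each other, and I suspect mine do because of the lock I'm using. How can I prevent this?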
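A lock that is held only around writerow(), and never while a task waits on another task's Future, cannot deadlock by itself: each thread acquires it, writes one row and releases it. The deadlock warning in the concurrent.futures docs is about a callable that waits on the result of another Future. The minimal sketch below swaps the Jira API and the output file for made-up issue ids and an in-memory buffer (both assumptions for illustration) and shows the same lock-around-writerow pattern running to completion:

```python
import csv
import io
import threading
from concurrent.futures import ThreadPoolExecutor

FIELDNAMES = ["issue_id", "field"]  # hypothetical columns, not the asker's real fieldnames


def write_rows(issue_id: str, writer: csv.DictWriter, lock: threading.Lock) -> int:
    """Write three fake rows for one issue; only writerow() runs under the lock."""
    count = 0
    for n in range(3):
        row = {"issue_id": issue_id, "field": f"field-{n}"}  # built outside the lock
        with lock:  # held briefly, never while waiting on another task
            writer.writerow(row)
        count += 1
    return count


def write_all(issue_ids: list[str]) -> str:
    buf = io.StringIO()  # stands in for the real output file
    writer = csv.DictWriter(buf, fieldnames=FIELDNAMES, delimiter="|")
    writer.writeheader()
    lock = threading.Lock()
    with ThreadPoolExecutor(max_workers=5) as executor:
        futs = [executor.submit(write_rows, i, writer, lock) for i in issue_ids]
        total = sum(f.result() for f in futs)  # re-raises any worker exception
    assert total == 3 * len(issue_ids)
    return buf.getvalue()


if __name__ == "__main__":
    output = write_all([f"ISSUE-{k}" for k in range(10)])
    print(len(output.splitlines()))  # header + 30 rows -> 31
```

If a version like this finishes but the real script does not, the hang is more likely to come from the network calls (for example API throttling) or from an exception being swallowed than from the lock itself.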

Here is my code for reference:

(At this point in the code there is a list called keys that holds all the issue_ids.)

import logging
from concurrent.futures import ThreadPoolExecutor
from csv import DictWriter
from threading import Lock

from jira import JIRA


def write_issue_history(
        jira_instance: JIRA,
        issue_id: str,
        writer: DictWriter,
        lock: Lock):
    logging.debug('Now processing data for issue {}'.format(issue_id))
    # Network I/O: fetch the issue together with its full changelog
    changelog = jira_instance.issue(issue_id, expand='changelog').changelog

    for history in changelog.histories:
        created = history.created
        for item in history.items:
            # Build one output row per changed field
            to_write = dict(issue_id=issue_id)
            to_write['date'] = created
            to_write['field'] = item.field
            to_write['changed_from'] = item.fromString
            to_write['changed_to'] = item.toString
            clean_data(to_write)
            add_etl_fields(to_write)
            print(to_write)
            # Only one thread at a time may write to the shared file
            with lock:
                print('Lock obtained')
                writer.writerow(to_write)

if __name__ == '__main__':
    # keys, fieldnames and j (the JIRA client) are defined earlier in the script
    with open('outfile.txt', 'w', newline='') as outf:
        writer = DictWriter(
            f=outf,
            fieldnames=fieldnames,
            delimiter='|',
            extrasaction='ignore'
        )
        writer_lock = Lock()
        with ThreadPoolExecutor(max_workers=5) as exec:
            for key in keys[:5]:
                exec.submit(
                    write_issue_history,
                    j,
                    key,
                    writer,
                    writer_lock
                )
Edit: It's also quite possible that I'm being throttled by the Jira API.

1 answer:

Answer 0 (score: 1)

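You need to store the Future objects returned by exec.submit() in a list, conventionally named futs, and then loop over that list, calling result() on each one to get its result and handling any errors that occurred. An exception raised inside a task is stored on its Future and only re-raised when result() is called; otherwise it is silently discarded, so a failing task gives no sign of what went wrong.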
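A small self-contained sketch, with a hypothetical task() that fails on one input, shows that the failing task produces no output on its own and only surfaces once result() is called:

```python
from concurrent.futures import ThreadPoolExecutor


def task(n: int) -> int:
    # Hypothetical task that fails for one specific input
    if n == 3:
        raise ValueError(f"bad input {n}")
    return n * 10


def collect(inputs: list[int]) -> tuple[list[int], list[str]]:
    results, errors = [], []
    with ThreadPoolExecutor(max_workers=5) as executor:
        futs = [executor.submit(task, n) for n in inputs]
    # Leaving the with block waited for every task; nothing was printed for the failure.
    for fut in futs:
        try:
            results.append(fut.result())  # re-raises the worker's exception here
        except ValueError as exc:
            errors.append(str(exc))
    return results, errors


if __name__ == "__main__":
    print(collect(list(range(5))))  # ([0, 10, 20, 40], ['bad input 3'])
```

Without the loop over futs, the ValueError would simply vanish.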

(I also took the opportunity to rename exec to executor, since that is more conventional and avoids shadowing a built-in.)

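from concurrent.futures import ThreadPoolExecutor
from traceback import print_exc

...

with ThreadPoolExecutor(max_workers=5) as executor:
    # Keep every Future so its outcome can be inspected afterwards
    futs = []
    for key in keys[:5]:
        futs.append(executor.submit(
            write_issue_history,
            j,
            key,
            writer,
            writer_lock,
        ))

# Leaving the with block waits for all tasks; result() re-raises
# any exception that occurred inside a worker
for fut in futs:
    try:
        fut.result()
    except Exception:
        print_exc()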
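A common variant of the same idea, not part of the original answer, uses concurrent.futures.as_completed, which yields each Future as soon as it finishes. Errors are then reported while the remaining tasks are still running rather than only after all of them are done. A sketch with a hypothetical process() standing in for write_issue_history:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from traceback import print_exc


def process(key: str) -> str:
    # Hypothetical stand-in for write_issue_history; fails for one key
    if key.endswith("2"):
        raise RuntimeError(f"failed on {key}")
    return key


def run_all(keys: list[str]) -> list[str]:
    done = []
    with ThreadPoolExecutor(max_workers=5) as executor:
        # Map each Future back to its key so failures can be reported by key
        fut_to_key = {executor.submit(process, k): k for k in keys}
        for fut in as_completed(fut_to_key):
            key = fut_to_key[fut]
            try:
                done.append(fut.result())
            except Exception:
                print(f"{key} failed:")
                print_exc()  # traceback goes to stderr
    return sorted(done)  # completion order varies, so sort for a stable result
```

Calling run_all(["K-1", "K-2", "K-3"]) prints a traceback for K-2 and returns the keys that succeeded.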