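I'm processing Jira changelog history data. Because there is a lot of data and most of the processing time is I/O-bound, I figured an asynchronous approach would work well.

I have a list of all the issue_ids, which I feed to a function that makes a request through the jira-python API, extracts the information into a dict, and then writes it out through a DictWriter that is passed in. To make this thread-safe, I imported Lock from the threading module and pass that in as well. In testing, it seems to deadlock at some point and just hangs. I noticed the documentation says that tasks can hang if they depend on each other, and I suspect that is what is happening because of the locking I'm doing. How can I prevent this?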
Here is my code for reference. (At this point in the code there is a list named keys that contains all the issue_ids.)

    def write_issue_history(
            jira_instance: JIRA,
            issue_id: str,
            writer: DictWriter,
            lock: Lock):
        logging.debug('Now processing data for issue {}'.format(issue_id))
        changelog = jira_instance.issue(issue_id, expand='changelog').changelog

        for history in changelog.histories:
            created = history.created
            for item in history.items:
                to_write = dict(issue_id=issue_id)
                to_write['date'] = created
                to_write['field'] = item.field
                to_write['changed_from'] = item.fromString
                to_write['changed_to'] = item.toString
                clean_data(to_write)
                add_etl_fields(to_write)
                print(to_write)
                with lock:
                    print('Lock obtained')
                    writer.writerow(to_write)

    if __name__ == '__main__':
        with open('outfile.txt', 'w') as outf:
            writer = DictWriter(
                f=outf,
                fieldnames=fieldnames,
                delimiter='|',
                extrasaction='ignore'
            )
            writer_lock = Lock()
            with ThreadPoolExecutor(max_workers=5) as exec:
                for key in keys[:5]:
                    exec.submit(
                        write_issue_history,
                        j,
                        key,
                        writer,
                        writer_lock
                    )

Edit: It is also quite possible that I'm being rate-limited by the Jira API.
Answer 0 (score: 1)
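You need to store the futures returned by exec.submit() in a list, conventionally named futs, and then loop over that list calling result() on each one to get its result and handle any errors that occurred. Without that call, an exception raised inside a worker is stored on the future and never shown, so a failed task can look like a hang.

(I would also rename exec to executor, since that is more conventional and avoids shadowing the built-in.)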

    from traceback import print_exc

    ...

    with ThreadPoolExecutor(max_workers=5) as executor:
        futs = []
        for key in keys[:5]:
            futs.append(executor.submit(
                write_issue_history,
                j,
                key,
                writer,
                writer_lock)
            )

        for fut in futs:
            try:
                fut.result()
            except Exception as e:
                print_exc()
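
To see the effect in isolation, here is a minimal, self-contained sketch that does not touch Jira at all. The names write_row, run, and the key "BAD-1" are hypothetical stand-ins introduced only for illustration: write_row plays the role of write_issue_history and fails on purpose for one key. It uses as_completed so that each future is handled as soon as it finishes, which is a variation on the loop above rather than a requirement.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor, as_completed
from threading import Lock


def write_row(key, writer, lock):
    # Hypothetical stand-in for write_issue_history: one key fails on purpose.
    if key == "BAD-1":
        raise ValueError("simulated failure for {}".format(key))
    # Only the shared writer is protected; the lock is released on exit.
    with lock:
        writer.writerow({"issue_id": key})
    return key


def run(keys):
    out = io.StringIO()  # in-memory file so the sketch has no side effects
    writer = csv.DictWriter(out, fieldnames=["issue_id"], delimiter="|")
    lock = Lock()
    done, failed = [], {}
    with ThreadPoolExecutor(max_workers=5) as executor:
        # Map each future back to its key so failures can be reported by key.
        futs = {executor.submit(write_row, k, writer, lock): k for k in keys}
        for fut in as_completed(futs):
            key = futs[fut]
            try:
                # result() re-raises whatever the worker raised.
                done.append(fut.result())
            except Exception as exc:
                failed[key] = exc
    return sorted(done), failed, out.getvalue()


if __name__ == "__main__":
    done, failed, text = run(["A-1", "BAD-1", "A-2"])
    print("succeeded:", done)
    print("failed:", failed)
```

If the try/except around fut.result() is removed along with the result() call itself, the run finishes silently and the row for the failing key is simply missing from the output, which is the symptom that is easy to mistake for a deadlock.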