Question

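I'm struggling with concurrent.futures in Python. I'm trying to iterate over a large number of S3 objects. Because of the number of accounts, buckets, and objects, this can take a very long time: longer than my STS credentials stay valid, and long enough that I don't trust the script to run without being interrupted.

I expected the code below to work. When I test it with a small number of buckets it does produce the output I'm looking for, but it only writes to the completed and output files after every bucket has been fully processed, rather than after each future returns. If the script is interrupted, the files are not written at all, even though many buckets have already been processed successfully.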

from concurrent.futures import ThreadPoolExecutor, as_completed

if __name__ == '__main__':
    args_results = parser.parse_args()

    completed = open(args_results.completed, 'a+')
    out = open(args_results.out, 'a+')  

    done = getCompleted(args_results.completed) 
    todo = getBuckets(args_results.todo)

    with ThreadPoolExecutor(max_workers=10) as executor:
        futures = []
        for item in todo: 
            if item not in done:
                account, bucket = item.split('|')
                futures.append(executor.submit(getBucketInfo, account, bucket))

        for x in as_completed(futures):
            result = x.result()
            out.write(result + '\n')
            completed.write(result['Account'] + '|' + result['Bucket'] + '\n')

Am I misunderstanding how as_completed() is supposed to work?

Answer 1

I needed to add line buffering when opening the files, so that each line is flushed out to the file as soon as it is written.

completed = open(args_results.completed, 'a+', buffering=1)
out = open(args_results.out, 'a+', buffering=1)

Problem solved.
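
To see the effect in isolation, here is a minimal sketch of the same pattern. It assumes a hypothetical stand-in, fake_bucket_info, in place of getBucketInfo, with made-up account names, bucket names, and delays, and it writes to a temporary file instead of the real completed file. After each write it re-reads the file through a second handle to confirm that the line is already visible while the original handle is still open.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_bucket_info(account, bucket, delay):
    """Hypothetical stand-in for getBucketInfo: sleep, then return a dict."""
    time.sleep(delay)
    return {"Account": account, "Bucket": bucket}


def run(path, jobs):
    """Run jobs concurrently; record how many lines are readable after each write."""
    visible = []
    # buffering=1 selects line buffering (text mode only):
    # every write that contains a newline triggers a flush.
    with open(path, "a+", buffering=1) as completed:
        with ThreadPoolExecutor(max_workers=3) as executor:
            futures = [executor.submit(fake_bucket_info, *job) for job in jobs]
            # as_completed yields each future as soon as it finishes.
            for future in as_completed(futures):
                result = future.result()
                completed.write(result["Account"] + "|" + result["Bucket"] + "\n")
                # Re-read through a separate handle while the file is still open.
                with open(path) as check:
                    visible.append(len(check.readlines()))
    return visible


# Made-up jobs: (account, bucket, simulated processing time in seconds).
jobs = [("acct1", "slow", 0.3), ("acct1", "fast", 0.0), ("acct2", "medium", 0.15)]
path = os.path.join(tempfile.mkdtemp(), "completed.txt")
visible = run(path, jobs)
print(visible)
```

With the default buffering, small writes like these would normally stay in Python's in-memory buffer until it fills or the file is closed, which matches the behavior described in the question. Note that line buffering only hands the data to the operating system. If the output has to survive a machine crash and not just an interrupted script, calling completed.flush() followed by os.fsync(completed.fileno()) after each write is a stricter option.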

Unexpected behavior of as_completed in concurrent.futures with Python.
