Python futures and loading a list into ThreadPoolExecutor

Time: 2019-02-08 23:35:09

Tags: python multithreading redis concurrent.futures

I'm having a hard time working through this conceptually... if a job fails, I can't simply add it back into the futures. So I'm looking for a simpler Python multithreading approach.

Summary:

My concurrent.futures implementation requires you to load a list onto a specified number of threads by passing it to the executor via submit. The program loads a large number of URLs from Redis and writes their DOM out to files.

The entire Redis set is read in as a list by return_contents().

import time
import concurrent.futures
from concurrent.futures import ThreadPoolExecutor

# return_contents() reads the whole Redis set into a Python list
URLs = return_contents()
# print(URLs)
print('complete')

start = time.time()

# passing the urls to the threadpoolexecutor
with ThreadPoolExecutor(max_workers=20) as executor:
    future_to_url = {executor.submit(load_url, url): url for url in URLs}
    print('jobs are loaded')
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        completed = return_to_orig_names(url)
        name_check = check_ismemeber_crawled(completed)
        if name_check == 1:
            print('############################## we are skipping {0} because it is in crawled urls'.format(completed))
        else:
            try:
                data = future.result()
                html_data = str(data)
                add_url_es(url, html_data, es)
                add_completed_to_redis(completed)
            except Exception as exc:
                # if there is a max retry error then we need to add them to a different set in redis or to a file.
                # we need to add this to a different set in redis
                print('%r generated an exception: %s' % (url, exc))
                print('we are going to add this back into the queue')
        del future_to_url[future]

end = time.time()
print(end - start)
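
One way to act on the comment in the except block above would be to push the failed URL into a separate Redis set instead of only printing it. A minimal sketch, assuming a redis-py client and a hypothetical set name failed_urls:

import redis

# hypothetical connection details and set name; adjust to your setup
r = redis.Redis(host='localhost', port=6379, db=0)

def record_failure(url):
    # push a URL whose future raised an exception into a separate set
    # so it can be retried or inspected later
    r.sadd('failed_urls', url)

record_failure(url) could then be called inside the except block in place of the second print.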

I'm looking for something more like this...

Pseudocode:

if there are URLs in the set:
    assign a job to thread1
    assign a job to thread2
    assign a job to thread3
    ...
    assign a job to thread20
    if a job returns a valid result:
        remove the URL from Redis
    else:
        re-add the URL to Redis

Something like this must already exist...
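
A minimal sketch of that pattern, swapping concurrent.futures for a queue.Queue plus worker threads; fetch(url) is a hypothetical stand-in for load_url, URLs is the list returned by return_contents() above, and a retry limit keeps failed jobs from being requeued forever:

import queue
import threading

NUM_WORKERS = 20
MAX_RETRIES = 3
url_queue = queue.Queue()

def fetch(url):
    # hypothetical stand-in for load_url(); should raise on failure
    raise NotImplementedError

def worker():
    while True:
        item = url_queue.get()
        if item is None:                      # sentinel: no more work
            url_queue.task_done()
            break
        url, attempts = item
        try:
            data = fetch(url)
            # success: write the DOM out and remove the URL from Redis here
        except Exception:
            if attempts + 1 < MAX_RETRIES:
                url_queue.put((url, attempts + 1))   # re-add the failed job
            # else: give up, e.g. push the URL into a "failed" Redis set
        finally:
            url_queue.task_done()

# fill the queue from the list returned by return_contents()
for url in URLs:
    url_queue.put((url, 0))

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

url_queue.join()                              # wait for the queue to drain
for _ in range(NUM_WORKERS):                  # then stop the workers
    url_queue.put(None)
for t in threads:
    t.join()

The queue could equally be the Redis set itself (re-adding with SADD on failure), which matches the pseudocode more literally and lets the jobs survive a restart.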

0 Answers:

No answers