I wrote a program to scrape information from a given list of websites (100 links). At the moment my program does this sequentially, i.e., it checks one site at a time. The skeleton of my program is as follows.
for j in range(num_of_links):
    try:  # if an error occurs, skip to the next website in the list
        site_exist(j)         # a function to check whether the site exists
        get_url_with_info(j)  # a function to get the links inside the website
    except Exception as e:
        print(str(e))
filter_result_info(links_with_info)  # a function that filters the results
Needless to say, this process is very slow. Is it possible to use threading so that my program can get through the job faster, with 4 concurrent jobs each scraping 25 of the links? Can you point me to a reference on how to do this?
Answer 0 (score: 1)
What you want is a pool of threads.
from concurrent.futures import ThreadPoolExecutor

def get_url(url):
    try:
        if site_exists(url):
            return get_url_with_info(url)
        else:
            return None
    except Exception as error:
        print(error)

with ThreadPoolExecutor(max_workers=4) as pool:
    results = pool.map(get_url, list_of_urls)  # returns a lazy iterator
    list_of_results = list(results)            # waits until all URLs have been retrieved

filter_result_info(list_of_results)  # note that some results might be None
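As a self-contained illustration of the same pattern (site_exists and get_url_with_info below are stand-ins for the question's functions, not real network calls):

```python
from concurrent.futures import ThreadPoolExecutor

def site_exists(url):
    # stand-in: pretend URLs containing "dead" do not exist
    return "dead" not in url

def get_url_with_info(url):
    # stand-in for the real scraping call
    return url + "/info"

def get_url(url):
    try:
        if site_exists(url):
            return get_url_with_info(url)
        return None
    except Exception as error:
        print(error)
        return None

list_of_urls = ["http://a.example", "http://dead.example", "http://b.example"]

# up to 4 URLs are processed concurrently; map preserves input order
with ThreadPoolExecutor(max_workers=4) as pool:
    list_of_results = list(pool.map(get_url, list_of_urls))

print(list_of_results)
# ['http://a.example/info', None, 'http://b.example/info']
```

Because the workers spend most of their time waiting on the network, threads overlap that waiting even under the GIL.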
Answer 1 (score: 0)
Threading won't speed this up. Multiprocessing is probably what you want.