Speeding up scraping data from a given list of websites with threads

Date: 2015-10-18 04:16:24

Tags: multithreading python-3.x web-scraping

I have written a program that scrapes information from a given list of websites (100 links). Currently, my program does this sequentially; that is, it checks one site at a time. The skeleton of my program is as follows.

for j in range(num_of_links):
    try:  # if an error occurs, this jumps to the next website in the list
        site_exist(j)          # a function to check if the site exists
        get_url_with_info(j)   # a function to get the links inside the website
    except Exception as e:
        print(str(e))
filter_result_info(links_with_info)  # function that filters the results

Needless to say, this process is very slow. Is it therefore possible to implement threading so that my program finishes the job faster, with 4 concurrent jobs each scraping 25 of the links? Can you point me to a reference on how to do this?
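The 4-jobs-of-25-links split described above can be sketched with plain `threading.Thread` objects, one per chunk. This is a minimal sketch, not the question's actual code: `scrape_one` is a hypothetical stand-in for the `site_exist`/`get_url_with_info` pair.

```python
import threading


def scrape_one(link):
    # placeholder for the question's site_exist / get_url_with_info work
    return link.upper()


def scrape_chunk(chunk, results):
    # each worker thread processes its own slice of the link list
    for link in chunk:
        try:
            results.append(scrape_one(link))
        except Exception as e:
            print(e)


def scrape_all(links, n_workers=4):
    results = []  # list.append is thread-safe in CPython
    size = -(-len(links) // n_workers)  # ceiling division: links per worker
    threads = [threading.Thread(target=scrape_chunk,
                                args=(links[i:i + size], results))
               for i in range(0, len(links), size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # block until every worker finishes
    return results
```

With 100 links this starts 4 threads of 25 links each; note the results come back in whatever order the threads finish.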

2 answers:

Answer 0 (score: 1)

What you want is a pool of threads:

from concurrent.futures import ThreadPoolExecutor


def get_url(url):
    try:
        if site_exists(url):
            return get_url_with_info(url)
        else:
            return None
    except Exception as error: 
        print(error)


with ThreadPoolExecutor(max_workers=4) as pool:
    # map() returns a lazy iterator; converting it to a list waits
    # until all URLs have been retrieved
    list_of_results = list(pool.map(get_url, list_of_urls))

filter_result_info(list_of_results)  # note that some results might be None
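If you would rather handle each result as soon as it finishes instead of in input order, `submit` plus `as_completed` is an alternative to `map`. A minimal sketch; `fetch` here is a placeholder, not the question's actual function:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch(url):
    # placeholder for the real per-URL scraping work
    return "info:" + url


urls = ["a", "b", "c", "d"]
results = []
with ThreadPoolExecutor(max_workers=4) as pool:
    # submit() returns a Future per URL immediately
    futures = [pool.submit(fetch, u) for u in urls]
    for fut in as_completed(futures):  # yields futures as they complete
        results.append(fut.result())
```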

Answer 1 (score: 0)

Threading will not speed this up. Multiprocessing may be what you want.

Multiprocessing vs Threading Python