I need help with some functionality I'm trying to implement; unfortunately, I'm not very comfortable with multithreading.
My script downloads 4 different files from the internet and calls a dedicated function for each one, then saves all the files. The problem is that I do this sequentially, so I have to wait for each download to finish before starting the next one.
I have an idea of what I should do to fix this, but I haven't managed to write the code for it.
Actual behavior:
url_list = [Url1, Url2, Url3, Url4]
files_list = []
files_list.append(downloadFile(Url1))
handleFile(files_list[-1], type=0)
...
files_list.append(downloadFile(Url4))
handleFile(files_list[-1], type=3)
saveAll(files_list)
Desired behavior:
url_list = [Url1, Url2, Url3, Url4]
files_list = []
for url in url_list:
    callThread(files_list.append(downloadFile(url)),               # function
               handleFile(files_list[url.index], type=url.index))  # trigger
# use a thread for downloading
# once a file is downloaded, it triggers its associated function
# wait for all files to be treated
saveAll(files_list)
Thanks for your help!
Answer 0 (score: 0)
A typical approach is to put the IO-bound part (such as fetching the data over the internet) and the data processing in the same function:
import random
import threading
import time
from concurrent.futures import ThreadPoolExecutor

import requests

def fetch_and_process_file(url):
    thread_name = threading.current_thread().name
    print(thread_name, "fetch", url)
    data = requests.get(url).text

    # "process" result
    time.sleep(random.random() / 4)  # simulate work
    print(thread_name, "process data from", url)
    result = len(data) ** 2

    return result

threads = 2
urls = ["https://google.com", "https://python.org", "https://pypi.org"]

executor = ThreadPoolExecutor(max_workers=threads)
with executor:
    results = executor.map(fetch_and_process_file, urls)

print()
print("results:", list(results))
Output:
ThreadPoolExecutor-0_0 fetch https://google.com
ThreadPoolExecutor-0_1 fetch https://python.org
ThreadPoolExecutor-0_0 process data from https://google.com
ThreadPoolExecutor-0_0 fetch https://pypi.org
ThreadPoolExecutor-0_0 process data from https://pypi.org
ThreadPoolExecutor-0_1 process data from https://python.org
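Since max_workers=2, only two downloads run at a time, which is why the third fetch starts only once a worker frees up. Applied to the original question, a minimal sketch could look like the one below; it assumes downloadFile, handleFile, saveAll, and Url1..Url4 are the asker's own functions and values:

from concurrent.futures import ThreadPoolExecutor

def download_and_handle(args):
    index, url = args
    file = downloadFile(url)      # download runs inside a worker thread
    handleFile(file, type=index)  # triggered as soon as this download finishes
    return file

url_list = [Url1, Url2, Url3, Url4]

with ThreadPoolExecutor(max_workers=4) as executor:
    # executor.map preserves input order, so files_list lines up with url_list
    files_list = list(executor.map(download_and_handle, enumerate(url_list)))

saveAll(files_list)  # runs only after every file has been downloaded and handled

If the "trigger" has to be decoupled from the download, executor.submit combined with Future.add_done_callback is an alternative, but keeping download and processing in one function, as shown above, is usually simpler.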