How can I download multiple files simultaneously and trigger a specific action for each completed one?

Date: 2018-09-07 11:43:36

Tags: python multithreading python-2.7

I need help with a feature I'm trying to implement; unfortunately, I'm not very comfortable with multithreading.

My script downloads 4 different files from the internet, calls a dedicated function for each of them, and then saves them all. The problem is that I'm doing this sequentially, so I have to wait for each download to finish before moving on to the next one.

I can see what I should do to solve this, but I haven't succeeded in writing the code.

Actual behavior:

url_list = [Url1, Url2, Url3, Url4]
files_list = []

files_list.append(downloadFile(Url1))
handleFile(files_list[-1], type=0)
...
files_list.append(downloadFile(Url4))
handleFile(files_list[-1], type=3)
saveAll(files_list)

Desired behavior:

url_list = [Url1, Url2, Url3, Url4]
files_list = []

for url in url_list:
    callThread(files_list.append(downloadFile(url)),              # function
               handleFile(files_list[url.index], type=url.index)) # trigger
    #use a thread for downloading
    #once file is downloaded, it triggers his associated function
#wait for all files to be treated
saveAll(files_list)

Thanks for your help!

1 Answer:

Answer 0 (score: 0)

The typical approach is to put the IO-heavy part, such as fetching the data over the internet, and the data processing into the same function:

import random
import threading
import time
from concurrent.futures import ThreadPoolExecutor

import requests


def fetch_and_process_file(url):
    thread_name = threading.current_thread().name

    # IO-bound part: fetch the page over the network
    print(thread_name, "fetch", url)
    data = requests.get(url).text

    # "processing" part, done in the same worker thread
    time.sleep(random.random() / 4)  # simulate work
    print(thread_name, "process data from", url)

    result = len(data) ** 2
    return result


threads = 2
urls = ["https://google.com", "https://python.org", "https://pypi.org"]

with ThreadPoolExecutor(max_workers=threads) as executor:
    results = executor.map(fetch_and_process_file, urls)

print()
print("results:", list(results))

Output:

ThreadPoolExecutor-0_0 fetch https://google.com
ThreadPoolExecutor-0_1 fetch https://python.org
ThreadPoolExecutor-0_0 process data from https://google.com
ThreadPoolExecutor-0_0 fetch https://pypi.org
ThreadPoolExecutor-0_0 process data from https://pypi.org
ThreadPoolExecutor-0_1 process data from https://python.org
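
If each file needs its own post-processing step, as in the question's handleFile(..., type=...) calls, the same idea can be extended with executor.submit and as_completed. Below is a minimal sketch; the downloadFile, handleFile, and saveAll functions are stand-in stubs for the question's helpers (assumptions, not the asker's real code):

import requests
from concurrent.futures import ThreadPoolExecutor, as_completed


def downloadFile(url):
    # Stand-in for the question's downloadFile: fetch the raw body.
    return requests.get(url).text


def handleFile(data, type):
    # Stand-in for the question's per-type handlers.
    return (type, len(data))


def saveAll(files):
    # Stand-in for the question's saveAll.
    print("saving", files)


def fetch_and_handle(url, type_index):
    # Download, then immediately trigger the handler for this file,
    # all inside the same worker thread.
    return handleFile(downloadFile(url), type=type_index)


url_list = ["https://python.org", "https://pypi.org",
            "https://google.com", "https://example.com"]

files_list = []
with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(fetch_and_handle, url, i)
               for i, url in enumerate(url_list)]
    # as_completed yields futures in completion order, so each result
    # is collected as soon as its download + handling is done.
    for future in as_completed(futures):
        files_list.append(future.result())

saveAll(files_list)

Each worker runs the download and its handler back to back, so the per-file trigger fires the moment that file finishes, while saveAll only runs once every future has been collected.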