Question

The idea is simple: I need to send multiple HTTP requests in parallel.

I decided to use the requests-futures library, which basically spawns multiple threads.

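Right now I have about 200 requests, and it is still pretty slow (around 12 seconds on my laptop). I am also using a callback to parse the response JSON, as suggested in the library's documentation. Also, is there a rule of thumb for finding the optimal number of threads as a function of the number of requests?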

Basically, I want to know whether I can speed these requests up any further.
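For comparison, the "parse the JSON in a callback as each response arrives" pattern described above has a standard-library counterpart in Future.add_done_callback. The sketch below assumes a hypothetical fake_fetch function that returns a JSON string instead of making a real HTTP call, so it runs offline; it is not how requests-futures itself is implemented.

```python
import json
import threading
from concurrent.futures import ThreadPoolExecutor

def fake_fetch(n):
    # Hypothetical stand-in for an HTTP call: returns a JSON string
    # instead of touching the network, so the sketch runs offline.
    return json.dumps({"id": n, "square": n * n})

parsed = {}
lock = threading.Lock()

def on_done(future):
    # Called once the future finishes: in the worker thread, or immediately
    # in the submitting thread if the future had already completed.
    data = json.loads(future.result())
    with lock:
        parsed[data["id"]] = data["square"]

with ThreadPoolExecutor(max_workers=4) as executor:
    for n in range(10):
        executor.submit(fake_fetch, n).add_done_callback(on_done)
# Leaving the with block waits for every task (and its callback) to finish.

print(parsed[3])  # 9
```

Callbacks run on whichever thread completes the future, so shared state such as the parsed dict is guarded with a lock.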

Answer 1

Since you are using Python 3.3, I'd recommend a stdlib-only solution that isn't in the thread @njzk2 linked: concurrent.futures.

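It is a higher-level interface than dealing directly with threading or multiprocessing primitives. You get an Executor interface that handles pooling and asynchronous reporting for you.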
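To make "the Executor handles pooling" concrete, here is a minimal sketch that uses ThreadPoolExecutor.map with a hypothetical slow_io function; time.sleep stands in for waiting on the network (both release the GIL while blocked). Five 0.2-second waits overlap, so the whole batch should take roughly one wait rather than five.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_io(n):
    # Hypothetical stand-in for a network call: sleeping releases the GIL,
    # much like waiting on a socket does.
    time.sleep(0.2)
    return n * 2

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as executor:
    # map() hands the inputs to the pool and yields results in input order.
    results = list(executor.map(slow_io, range(5)))
elapsed = time.perf_counter() - start

print(results)  # [0, 2, 4, 6, 8]
print(f"elapsed: {elapsed:.2f}s")
```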

The docs have an example that applies almost directly to your situation, so I'll reproduce it here:

import concurrent.futures
import urllib.request

URLS = []  # some list of URLs

# Retrieve a single page and report the url and contents
def load_url(url, timeout):
    # HTTPResponse has no readall() in current Python 3; read() returns the whole body
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result() 
            # do json processing here
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))

Feel free to swap the urllib.request calls for requests calls if you like. I do tend to prefer requests, for obvious reasons.
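To fill in the "# do json processing here" step, here is a self-contained sketch that fetches 20 URLs concurrently and decodes each body as JSON. So that it runs without internet access, it starts a throwaway local server; JSONHandler, LocalServer, and load_json are hypothetical names used only for this illustration, so replace urls with your real endpoints.

```python
import json
import threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class JSONHandler(BaseHTTPRequestHandler):
    # Hypothetical test endpoint: answers every GET with {"path": <request path>}.
    def do_GET(self):
        body = json.dumps({"path": self.path}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging

class LocalServer(ThreadingHTTPServer):
    request_queue_size = 64  # accept more pending connections than the default 5

def load_json(url, timeout):
    # Fetch one URL and decode its body as JSON.
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return json.loads(conn.read().decode("utf-8"))

server = LocalServer(("127.0.0.1", 0), JSONHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
urls = [f"http://127.0.0.1:{server.server_port}/item/{i}" for i in range(20)]

results = {}
with ThreadPoolExecutor(max_workers=10) as executor:
    future_to_url = {executor.submit(load_json, url, 10): url for url in urls}
    for future in as_completed(future_to_url):
        url = future_to_url[future]
        try:
            results[url] = future.result()
        except Exception as exc:
            print(f"{url!r} generated an exception: {exc}")

server.shutdown()
server.server_close()
print(f"parsed {len(results)} JSON responses")
```

Parsing inside the worker (load_json) rather than in the main loop keeps the decode work spread across threads, although for CPU-heavy parsing the GIL limits how much that actually helps.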

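The API works roughly like this: you create a bunch of Future objects, each representing the asynchronous execution of your function. You then call concurrent.futures.as_completed to get an iterator over your Future instances, which yields each one as it completes.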
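The "yields each one as it completes" behavior is easy to check in isolation. In this sketch, a hypothetical wait_and_return function simulates requests with different latencies. They are submitted as 0.3, 0.1, 0.2, and as_completed should hand them back as 0.1, 0.2, 0.3, that is, in completion order rather than submission order.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def wait_and_return(delay):
    # Hypothetical simulated request whose latency is `delay` seconds.
    time.sleep(delay)
    return delay

delays = [0.3, 0.1, 0.2]  # submission order
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(wait_and_return, d) for d in delays]
    # as_completed yields each future as soon as it finishes.
    completion_order = [f.result() for f in as_completed(futures)]

print(completion_order)
```

If you need results in input order instead, executor.map is the simpler choice; as_completed is better when you want to start processing whichever response arrives first.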

As for your question:

Also, is there a rule of thumb for finding the optimal number of threads as a function of the number of requests?

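A rule of thumb? No. It depends on too many things, including the speed of your internet connection. I'd say it doesn't really depend on the number of requests you have, and more on the hardware you're running on.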

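Fortunately, it is very easy to tweak the max_workers kwarg and test for yourself. Start at 5 or 10 threads and go up in increments of 5. You will probably notice performance plateauing at some point, and then starting to decline as the overhead of adding more threads outweighs the marginal gain from increased parallelization (which is a word).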
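The tuning loop described above can be scripted. This is a minimal sketch, assuming a hypothetical simulated_request that just sleeps for 50 ms. It times 40 requests at max_workers of 5, 10, 15 and 20.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def simulated_request(_):
    # Hypothetical stand-in for one HTTP round trip (~50 ms of waiting).
    time.sleep(0.05)

def time_pool(n_requests, max_workers):
    # Wall-clock time to push n_requests through a pool of max_workers threads.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        list(executor.map(simulated_request, range(n_requests)))
    return time.perf_counter() - start

timings = {}
for workers in range(5, 25, 5):  # 5, 10, 15, 20
    timings[workers] = time_pool(40, workers)
    print(f"max_workers={workers:2d}: {timings[workers]:.2f}s")
```

With a pure sleep, the time simply keeps dropping as workers are added, because nothing else is contended. The plateau and decline only show up against real endpoints, where bandwidth, server-side limits, and parsing cost come into play. To get meaningful numbers, swap simulated_request for a real fetch of your URLs.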

What is the best way to send multiple HTTP requests in Python 3?