Question

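I'm working with Python (IPython & Canopy) and a RESTful content API on my local machine (Mac).

I have a set of 3000 unique IDs to pull data for from the API, and I can only call the API with one ID at a time.

I was hoping to somehow make 3 sets of 1000 calls in parallel to speed things up.

What is the best way of doing this?

Thanks in advance for any help!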

Answer 1

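Without more information about what exactly you are doing, it is hard to say for sure, but a simple threaded approach may make sense.

Assuming you have a simple function that processes a single ID: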

import requests

url_t = "http://localhost:8000/records/%i"

def process_id(id):
    """process a single ID"""
    # fetch the data
    r = requests.get(url_t % id)
    # parse the JSON reply
    data = r.json()
    # and update some data with PUT
    requests.put(url_t % id, data=data)
    return data

You can expand that into a simple function that processes a range of IDs:

def process_range(id_range, store=None):
    """process a number of ids, storing the results in a dict"""
    if store is None:
        store = {}
    for id in id_range:
        store[id] = process_id(id)
    return store

Finally, you can fairly easily map sub-ranges onto threads so that several requests run concurrently:

from threading import Thread

def threaded_process_range(nthreads, id_range):
    """process the id range in a specified number of threads"""
    store = {}
    threads = []
    # create the threads
    for i in range(nthreads):
        ids = id_range[i::nthreads]
        t = Thread(target=process_range, args=(ids,store))
        threads.append(t)

    # start the threads
    for t in threads:
        t.start()
    # wait for the threads to finish
    for t in threads:
        t.join()
    return store
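
To see how the `id_range[i::nthreads]` slice spreads the IDs across threads, here is a small sketch. The ID list and thread count are made up for illustration; the slicing expression is the same one used in `threaded_process_range`.

```python
# Hypothetical IDs, for illustration only
ids = list(range(10))
nthreads = 3

# Each thread takes every nthreads-th ID, starting at its own offset i
chunks = [ids[i::nthreads] for i in range(nthreads)]

for i, chunk in enumerate(chunks):
    print(i, chunk)
# 0 [0, 3, 6, 9]
# 1 [1, 4, 7]
# 2 [2, 5, 8]
```

Every ID lands in exactly one chunk, and the chunk sizes differ by at most one, so the split is even as long as each request takes roughly the same time.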

A full example in an IPython Notebook: http://nbviewer.ipython.org/5732094

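If your individual tasks vary widely in how long they take, you may want to use a ThreadPool, which hands out jobs one at a time (often slower if the individual tasks are very small, but it guarantees better balance in heterogeneous cases).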
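
A minimal sketch of the pool-based variant, assuming the standard library's `multiprocessing.pool.ThreadPool`. The name `pooled_process_range` is hypothetical, and `process_id` here is a stand-in that sleeps instead of calling the API, so the example runs without a server; in practice you would use the real `process_id` from above.

```python
import time
from multiprocessing.pool import ThreadPool

def process_id(id):
    """Stand-in for the real per-ID API call (hypothetical)."""
    time.sleep(0.01 * (id % 3))  # simulate requests of varying duration
    return {"id": id}

def pooled_process_range(nthreads, id_range):
    """process the ids with a pool that hands out one id at a time"""
    ids = list(id_range)
    pool = ThreadPool(nthreads)
    try:
        # chunksize=1 gives each idle thread a single id at a time;
        # map returns results in the same order as the input ids
        results = pool.map(process_id, ids, chunksize=1)
    finally:
        pool.close()
        pool.join()
    return dict(zip(ids, results))

store = pooled_process_range(3, range(12))
```

Because a thread picks up a new ID as soon as it finishes the previous one, a few slow requests do not leave the other threads idle, which is the balancing benefit described above.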
