Python 3: Multiprocessing API calls with an exit condition

Time: 2015-04-14 17:14:18

Tags: python python-3.x multiprocessing python-multithreading python-multiprocessing

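I'm trying to write an application that runs through a list of database entries, makes an API call for each one, and returns the value. As soon as the API's JSON response has a certain value set to True for 5 of the calls, I want a list of those 5 calls. Since there are a couple of thousand database entries, I want to do this with multiprocessing. But I'm a beginner with parallelization, and I can't seem to grasp how it works or how to set the exit condition. This is what I have: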

from multiprocessing.dummy import Pool
import requests

def get_api_response(apikey, result, subscription_id):
    r = requests.get("https://api.example.com/" + subscription_id)
    if r.json()['subscribed'] == True:
        result.append(r.json())
        return result

def pass_args(args):
    foo = get_api_response(*args)
    if foo:
        return foo

def check_response_amount(result):
    if len(result) >= 5:
        pool.terminate() 

# One entry looks like that: {"id": 1, "name": "smith", "subscription_id": 123}
db_entries = get_db_entries()
apikey = 'abcd1234'
result = []
request_tuples = [(apikey, result, entry['subscription_id']) for entry in db_entries]
pool = Pool(5)
pool_result = pool.map_async(pass_args, request_tuples, callback=check_response_amount)
pool_result.wait()
pool.close()
pool.join()

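The application checks every single database entry and returns every API response that has subscribed == True, without ever running the callback. I tried applying the answer to another question (Python Multiprocessing help exit on condition), but couldn't get it to work. Can someone help me?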

1 Answer:

Answer 0: (score: 1)

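When you use map_async, the callback is not executed until every work item in the iterable has completed. If you want the callback to run for each item in request_tuples, rather than only once after all items are done, you need to use apply_async inside a for loop: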

results = []
for item in request_tuples:
    results.append(pool.apply_async(get_api_response, args=item, callback=check_response_amount))

for result in results:
    result.wait()
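
To make the difference in callback timing concrete, here is a minimal sketch. The square function and the two lists are hypothetical stand-ins for illustration (not part of the original code); square takes the place of the API call so that the snippet runs without a network.

```python
from multiprocessing.dummy import Pool  # thread-backed pool, as in the question


def square(x):
    # Hypothetical stand-in for the API call.
    return x * x


map_calls = []    # what the map_async callback receives
apply_calls = []  # what the apply_async callbacks receive

pool = Pool(3)

# map_async: the callback fires once, with the complete list of results.
pool.map_async(square, [1, 2, 3], callback=map_calls.append).wait()

# apply_async: the callback fires once per task, with that task's result.
pending = [pool.apply_async(square, (n,), callback=apply_calls.append)
           for n in [1, 2, 3]]
for p in pending:
    p.wait()

pool.close()
pool.join()

print(len(map_calls))    # 1 callback invocation for the whole batch
print(len(apply_calls))  # 3 callback invocations, one per task
```

With map_async there is therefore no point at which the callback could see "5 results so far"; it only ever sees the finished batch.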

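Additionally, calling pool.terminate won't work the way you want it to; items you have already submitted to the pool will hang forever once you call it, which makes your script hang, because you wait for them to finish before exiting. You can work around this by waiting for the pool to join, rather than waiting for any individual task to finish.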

import time
from multiprocessing.dummy import Pool

def get_api_response(apikey, result, subscription_id):
    url  = ("https://api.example.com/" + str(subscription_id))
    time.sleep(2)
    result.append(url)
    return result

def pass_args(args):
    foo = get_api_response(*args)
    if foo:
        return foo

def check_response_amount(result):
    if result and len(result) >= 5:
        print("DONE %s" % result)
        pool.terminate()


def get_db_entries():
    return [{'subscription_id' : i} for i in range(100)]

# One entry looks like that: {"id": 1, "name": "smith", "subscription_id": 123}
db_entries = get_db_entries()
apikey = 'abcd1234'
result = []
request_tuples = [(apikey, result, entry['subscription_id']) for entry in db_entries]
pool = Pool(2)
results = []
for item in request_tuples:
    results.append(pool.apply_async(get_api_response, item, callback=check_response_amount))
pool.close()
pool.join()
print("done")

Output:

IN HERE
IN HERE
IN HERE
IN HERE
IN HERE
... (a bunch more of this)...
IN HERE
IN HERE
DONE ['https://api.example.com/1', 'https://api.example.com/0', 'https://api.example.com/2', 'https://api.example.com/3', 'https://api.example.com/4', 'https://api.example.com/5']
done

Note that the result list may end up a little larger than you want, because the terminate call doesn't actually stop tasks that are already in progress.
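
If exactly five matches are needed, one possible alternative (a different technique from the apply_async plus callback approach above, sketched here under assumptions) is to collect results in the main thread with imap_unordered and stop iterating once enough have arrived. The fetch and first_matches functions are hypothetical names; fetch fakes the API response so that the snippet runs offline.

```python
from multiprocessing.dummy import Pool  # thread-backed pool, as in the question


def fetch(subscription_id):
    # Hypothetical stand-in for the real request; pretends that
    # even-numbered subscriptions are subscribed.
    return {"subscription_id": subscription_id,
            "subscribed": subscription_id % 2 == 0}


def first_matches(ids, wanted=5, workers=5):
    """Collect responses with subscribed == True until `wanted` are found."""
    matches = []
    pool = Pool(workers)
    try:
        # Results are yielded as they complete, in no particular order.
        for response in pool.imap_unordered(fetch, ids):
            if response["subscribed"]:
                matches.append(response)
                if len(matches) >= wanted:
                    break  # stop consuming; remaining work is discarded
    finally:
        pool.terminate()
        pool.join()
    return matches


hits = first_matches(range(100))
print(len(hits))
```

Because only the main thread appends to the list, its length never exceeds the requested count. Workers may still finish a few extra calls before the pool shuts down, but those results are simply never collected.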