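I'm trying to write an application that runs through a list of database entries, makes an API call for each one, and returns the value. As soon as the JSON response for 5 of those calls contains a value of True, I want to have a list of those 5 calls. Since the database holds several thousand entries, I want to do this with multiprocessing. But I'm a beginner at parallelization, and I can't seem to grasp how it works or how to set the exit condition. This is what I have so far: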
from multiprocessing.dummy import Pool
import requests

def get_api_response(apikey, result, subscription_id):
    r = requests.get("https://api.example.com/" + subscription_id)
    if r.json()['subscribed'] == True:
        result.append(r.json())
    return result

def pass_args(args):
    foo = get_api_response(*args)
    if foo:
        return foo

def check_response_amount(result):
    if len(result) >= 5:
        pool.terminate()

# One entry looks like that: {"id": 1, "name": "smith", "subscription_id": 123}
db_entries = get_db_entries()
apikey = 'abcd1234'
result = []
request_tuples = [(apikey, result, entry['subscription_id']) for entry in db_entries]
pool = Pool(5)
pool_result = pool.map_async(pass_args, request_tuples, callback=check_response_amount)
pool_result.wait()
pool.close()
pool.join()
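The application checks every database entry and returns every API response that has subscribed == True, without ever running the callback. I tried applying the answer from another question (Python Multiprocessing help exit on condition), but couldn't get it to work. Can someone help me?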
Answer 0 (score: 1)
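When you use map_async, the callback is not executed until every work item in the iterable has completed. If you want the callback to run for each item in request_tuples, rather than only once after all of them are done, you need to use apply_async inside a for loop: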
results = []
for item in request_tuples:
    results.append(pool.apply_async(get_api_response, args=item, callback=check_response_amount))
for result in results:
    result.wait()
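To make the difference concrete, here is a minimal sketch that counts how often the callback fires in each case. The helper names (square, run_map_async, run_apply_async) are made up for illustration and are not part of the question's code; the thread-based Pool from multiprocessing.dummy is used, as in the question.

```python
from multiprocessing.dummy import Pool  # thread-based Pool, as in the question


def square(x):
    """Trivial stand-in for the real API call (hypothetical)."""
    return x * x


def run_map_async(items):
    """Collect every value the map_async callback receives."""
    calls = []
    with Pool(3) as pool:
        # The callback fires once, with the complete list of results.
        pool.map_async(square, items, callback=calls.append).wait()
    return calls


def run_apply_async(items):
    """Collect every value the apply_async callbacks receive."""
    calls = []
    with Pool(3) as pool:
        pending = [pool.apply_async(square, (x,), callback=calls.append)
                   for x in items]
        # The callback fires once per task, as each task completes.
        for p in pending:
            p.wait()
    return calls


if __name__ == "__main__":
    print(len(run_map_async([1, 2, 3])))    # number of callback invocations
    print(len(run_apply_async([1, 2, 3])))  # number of callback invocations
```

With map_async the callback receives a single list holding all results, so a check like len(result) >= 5 can only run after everything has finished. With apply_async it receives one result per task, in completion order.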
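Also, calling pool.terminate isn't going to work the way you want it to. Items you've already submitted to the pool will hang forever once you call it, which will make your script hang, since you're waiting on them to finish before exiting. You can work around this by just waiting for the pool to join, rather than waiting on any individual task to finish.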
import time
from multiprocessing.dummy import Pool

def get_api_response(apikey, result, subscription_id):
    # Simulate a slow API call instead of hitting the network.
    url = ("https://api.example.com/" + str(subscription_id))
    time.sleep(2)
    result.append(url)
    return result

def pass_args(args):
    foo = get_api_response(*args)
    if foo:
        return foo

def check_response_amount(result):
    # Runs after each completed task; stop the pool once we have enough.
    if result and len(result) >= 5:
        print("DONE %s" % result)
        pool.terminate()

def get_db_entries():
    # Dummy data standing in for the real database query.
    return [{'subscription_id' : i} for i in range(100)]

# One entry looks like that: {"id": 1, "name": "smith", "subscription_id": 123}
db_entries = get_db_entries()
apikey = 'abcd1234'
result = []
request_tuples = [(apikey, result, entry['subscription_id']) for entry in db_entries]
pool = Pool(2)
results = []
for item in request_tuples:
    results.append(pool.apply_async(get_api_response, item, callback=check_response_amount))
# Wait for the pool itself rather than for individual tasks.
pool.close()
pool.join()
print("done")
Output:
IN HERE
IN HERE
IN HERE
IN HERE
IN HERE
... (a bunch more of this)...
IN HERE
IN HERE
DONE ['https://api.example.com/1', 'https://api.example.com/0', 'https://api.example.com/2', 'https://api.example.com/3', 'https://api.example.com/4', 'https://api.example.com/5']
done
Note that the result list may end up a little bigger than you wanted, because the terminate call won't actually stop tasks that are already in progress.
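If you need exactly 5 results, one possible alternative (a different technique from the callback approach above, sketched here under assumptions) is to consume results in the main thread with imap_unordered and break out of the loop once enough matches have been collected. The names fake_api_call and first_subscribed are hypothetical, and the rule that even ids count as subscribed is invented purely to keep the example self-contained.

```python
import time
from multiprocessing.dummy import Pool  # thread-based Pool, as in the question


def fake_api_call(subscription_id):
    """Hypothetical stand-in for requests.get(...).json().

    Assumption for the demo only: even ids count as subscribed.
    """
    time.sleep(0.01)  # simulate network latency
    return {"subscription_id": subscription_id,
            "subscribed": subscription_id % 2 == 0}


def first_subscribed(subscription_ids, wanted=5, workers=5):
    """Return at most `wanted` responses that have subscribed == True."""
    hits = []
    with Pool(workers) as pool:
        # Results arrive one at a time, in completion order.
        for response in pool.imap_unordered(fake_api_call, subscription_ids):
            if response["subscribed"]:
                hits.append(response)
                if len(hits) >= wanted:
                    break  # leaving the with-block terminates the pool
    return hits


if __name__ == "__main__":
    for hit in first_subscribed(range(100)):
        print(hit)
```

Because the list is only appended to in the main thread, it never grows past the requested size. Calls that are already in flight when the loop breaks still run to completion, but their results are discarded.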