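I'm new to Python, and I wrote some code to download data from a Web API. However, I have to obey some restrictions when using the API:
- 1 request per second per API key
- If a timeout occurs, wait 30 seconds before trying again
- Limit of 100k requests per day per API key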
The code of the method that makes the requests to the Web API is:
def getMatchDetails(self,match_id):
    '''Calls the WEB Api and requests the data for the match with
    a specific id (in match_id). Then returns the data already decoded
    from json.'''
    import urllib2
    import json
    import time
    url = self.__makeUrl__(api_key= self.api_key, parameters = ['match_id='+str(match_id)])
    # Sometimes a time out occurs, we keep trying
    while True:
        try:
            start = time.time()
            json_obj = urllib2.urlopen(url)
            end = time.time()
            if end - start < 1:
                time.sleep(1 - (end - start))
        except:
            print('Timed Out, Trying again in 30 seconds')
            time.sleep(30)
            continue
        else:
            break
    detailed_data = json.load(json_obj)
    return detailed_data
The method __makeUrl__ simply concatenates some strings and returns the result. To change the API key on every call of the method above, I use:
def getMatchDetailsForMap(self,match_id):
    self.counter += 1
    self.api_key = self.api_keys[self.counter%len(self.api_keys)]
    return self.getMatchDetails(match_id)
where self.api_keys is a list containing all the API keys. I then use the method getMatchDetailsForMap with the map function in the following code:
from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(14)
ids_to_get = self.__idsToGetChunks__(14)
for chunk in ids_to_get:
    results = pool.map(self.getMatchDetailsForMap,chunk)
The method __idsToGetChunks__ returns a list of lists (chunks) of the arguments (match_id) that are fed to the getMatchDetailsForMap method.
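For illustration only, here is a minimal sketch of what a chunking helper like __idsToGetChunks__ might do. It assumes the argument 14 is the chunk size (the post does not say whether it is the size or the number of chunks). The name chunk_ids is hypothetical and not part of the original class.

```python
def chunk_ids(match_ids, chunk_size):
    """Split match_ids into consecutive lists of at most chunk_size items."""
    return [match_ids[i:i + chunk_size]
            for i in range(0, len(match_ids), chunk_size)]


# 30 ids split into chunks of 14: the last chunk holds the remainder.
chunks = chunk_ids(list(range(1, 31)), 14)
print([len(c) for c in chunks])  # prints [14, 14, 2]
```

Each inner list can then be handed to pool.map as one batch.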
Question:
Thanks for reading and helping! Sorry for the long post.
Answer 0 (score: 0)
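To comply with these three requirements, I suggest writing a simple for loop that performs one request per iteration. Normally, wait one second after each request. If a timeout occurs, wait 30 seconds. Do not loop more than 100k times. (I assume this script runs once a day and takes less than 24 hours ;))
The main program starts one Process per API key.
Simple!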
# 1 request per second per API key
# If a timeout occurs, wait 30 seconds before trying again
# Limit of 100k requests per day per API key
import logging, time, urllib2
import multiprocessing as mp
def do_fetch(key, timeout):
    # key is unused in this demo; a real URL would include it
    return urllib2.urlopen(
        'http://example.com', timeout=timeout
    ).read()
def get_data(api_key):
    logger = mp.get_logger()
    data = None
    # Limit of 100k requests per day per API key
    for num in range(100*1000):
        t = 1 if num!=1 else 0 # test timeout exception
        try:
            data = do_fetch(api_key, timeout=t)
            logger.info('%d bytes', len(data))
        except urllib2.URLError as exc:
            logger.error('exc: %s', repr(exc))
            # If a timeout occurs, wait 30 seconds before trying again
            time.sleep(30)
        else:
            # "1 request per second per API key"
            time.sleep(1)
mp.log_to_stderr(level=logging.INFO)
keys = [123, 234]
# One worker process per API key
pool = mp.Pool(len(keys))
pool.map( get_data, keys )
Output:
[INFO/PoolWorker-1] child process calling self.run()
[INFO/PoolWorker-2] child process calling self.run()
[INFO/PoolWorker-2] 1270 bytes
[INFO/PoolWorker-1] 1270 bytes
[ERROR/PoolWorker-2] exc: URLError(error(115, 'Operation now in progress'),)
[ERROR/PoolWorker-1] exc: URLError(error(115, 'Operation now in progress'),)
[INFO/PoolWorker-2] 1270 bytes
[INFO/PoolWorker-1] 1270 bytes
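The code above targets Python 2 (urllib2). As a rough sketch only, the same idea could look like this on Python 3, where urllib.request replaces urllib2. This version swaps processes for threads, since the work is I/O-bound, and it retries the same URL after a timeout instead of moving on. The names fetch, run_worker and fetch_all are hypothetical, and counting failed attempts against the daily quota is an assumption, not something the API rules in the question state.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

MIN_INTERVAL = 1.0        # "1 request per second per API key"
RETRY_WAIT = 30.0         # wait after a timeout before trying again
DAILY_LIMIT = 100 * 1000  # "100k requests per day per API key"


def fetch(url, timeout=10):
    """Download url and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def run_worker(urls, fetch=fetch, sleep=time.sleep, clock=time.monotonic):
    """Fetch urls one by one for a single API key, obeying the three limits.

    Returns a dict mapping each successfully fetched URL to its body.
    """
    results = {}
    pending = list(urls)
    requests_made = 0
    while pending and requests_made < DAILY_LIMIT:
        url = pending[0]
        started = clock()
        requests_made += 1  # assumption: failed attempts use up quota too
        try:
            results[url] = fetch(url)
        except OSError:  # URLError and socket timeouts are OSError subclasses
            sleep(RETRY_WAIT)
            continue  # retry the same URL
        pending.pop(0)
        elapsed = clock() - started
        if elapsed < MIN_INTERVAL:
            sleep(MIN_INTERVAL - elapsed)
    return results


def fetch_all(jobs_by_key, worker=run_worker):
    """Run one worker thread per API key; jobs_by_key maps key -> list of URLs."""
    with ThreadPoolExecutor(max_workers=max(1, len(jobs_by_key))) as pool:
        futures = {key: pool.submit(worker, urls)
                   for key, urls in jobs_by_key.items()}
        return {key: future.result() for key, future in futures.items()}
```

Because each key has exactly one worker, the per-key limits never depend on what the other workers are doing. The fetch, sleep and clock parameters exist only so the loop can be exercised without real network calls or real waiting.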