Question

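I'm new to Python and I've written some code to download data from a web API. However, I have to respect a few limits when using the API:

1 request per second per API key
If a timeout occurs, wait 30 seconds before trying again
Limit of 100k requests per day per API key

The code of the method that makes the request to the web API is: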

def getMatchDetails(self,match_id):
    '''Calls the WEB Api and requests the data for the match with
    a specific id (in match_id). Then returns the data already decoded 
    from json.'''
    import urllib2
    import json
    import time
    url = self.__makeUrl__(api_key= self.api_key, parameters = ['match_id='+str(match_id)])
    # Sometimes a time out occurs, we keep trying
    while True:
        try:
            start = time.time()
            json_obj = urllib2.urlopen(url)
            end = time.time()
            if end - start < 1:
                time.sleep(1 - (end - start))
        except:
            print('Timed Out, Trying again in 30 seconds')
            time.sleep(30)
            continue
        else:
            break
    detailed_data = json.load(json_obj)
    return detailed_data

The method __makeUrl__ simply concatenates some strings and returns the result. To change the API key on every call to the method above, I use:

def getMatchDetailsForMap(self,match_id):
    self.counter += 1
    self.api_key = self.api_keys[self.counter%len(self.api_keys)]
    return self.getMatchDetails(match_id)

where self.api_keys is a list containing all the API keys. I then use the method getMatchDetailsForMap with the map function in the code below:

from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(14)
ids_to_get = self.__idsToGetChunks__(14)
for chunk in ids_to_get:
    results = pool.map(self.getMatchDetailsForMap, chunk)

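The method __idsToGetChunks__ returns a list of lists (chunks) of arguments (match_id) that are passed to the getMatchDetailsForMap method.

Questions:

When I run the code, I see that the limit of 1 request per second per key is not being respected. Why is that?
When a timeout occurs, it really slows down fetching the data. Is there a better way to handle this kind of exception when using map? (A hint would be appreciated.)

Thanks for reading and for your help! Sorry for the long post.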

Answer 1

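To meet all three requirements, I suggest writing a simple for loop that makes one request per iteration. Normally, wait one second after each request. If a timeout occurs, wait 30 seconds. Don't loop more than 100k times. (I'm assuming this script runs once a day and takes less than 24 hours ;))

The main program starts one Process per API key.

Simple!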
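One likely reason the per-key limit is broken in the question's code is that 14 threads share self.counter and self.api_key, so two threads can end up sending requests with the same key at the same moment. Giving each key its own worker and its own slice of the IDs avoids that. Here is a minimal sketch of the split, written for Python 3; partition_ids and the key names are made up for illustration and are not part of the original code:

```python
def partition_ids(match_ids, api_keys):
    """Give every API key its own slice of the match IDs (round-robin)."""
    n = len(api_keys)
    # Key number i gets items i, i + n, i + 2n, ... so no ID is shared.
    return {key: match_ids[i::n] for i, key in enumerate(api_keys)}


if __name__ == "__main__":
    # Hypothetical keys and IDs, only to show the shape of the result.
    work = partition_ids(list(range(10)), ["key_a", "key_b", "key_c"])
    for key, ids in work.items():
        print(key, ids)
```

Each (key, ids) pair can then be handed to one worker, so a key is only ever used by a single process and its one-second pause is enough to keep the rate.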

Source

# 1 request per second per API key
# If a timeout occurs, wait 30 seconds before trying again
# Limit of 100k requests per day per API key

import logging, time, urllib2
import multiprocessing as mp

def do_fetch(key, timeout):
    return urllib2.urlopen(
        'http://example.com', timeout=timeout
    ).read()

def get_data(api_key):
    logger = mp.get_logger()
    data = None
    # Limit of 100k requests per day per API key
    for num in range(100*1000): 
        t = 1 if num!=1 else 0 # test timeout exception
        try:
            data = do_fetch(api_key, timeout=t)
            logger.info('%d bytes', len(data))
        except urllib2.URLError as exc:
            logger.error('exc: %s', repr(exc))
            # If a timeout occurs, wait 30 seconds before trying again
            time.sleep(30)
        else:
            # "1 request per second per API key"
            time.sleep(1)


mp.log_to_stderr(level=logging.INFO)
keys = [123, 234]
pool = mp.Pool(len(keys))
pool.map( get_data, keys )

Output

[INFO/PoolWorker-1] child process calling self.run()
[INFO/PoolWorker-2] child process calling self.run()
[INFO/PoolWorker-2] 1270 bytes
[INFO/PoolWorker-1] 1270 bytes
[ERROR/PoolWorker-2] exc: URLError(error(115, 'Operation now in progress'),)
[ERROR/PoolWorker-1] exc: URLError(error(115, 'Operation now in progress'),)
[INFO/PoolWorker-2] 1270 bytes
[INFO/PoolWorker-1] 1270 bytes
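
The code above is Python 2 (urllib2). For Python 3, the same loop might look roughly like the sketch below. It is only an illustration under a few assumptions: fetch_url, fetch_all and the constants are names I made up, the fetch and sleep functions are passed in as parameters so the loop can be tried without touching the network, and a failed request is retried and counted against the daily budget.

```python
import socket
import time
import urllib.error
import urllib.request

DAILY_LIMIT = 100 * 1000   # requests per day per API key
REQUEST_INTERVAL = 1       # seconds to wait after a successful request
TIMEOUT_BACKOFF = 30       # seconds to wait after a timeout


def fetch_url(url, timeout=10):
    """Download one URL and return the raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetch=fetch_url, sleep=time.sleep):
    """Fetch each URL in order with a single API key, retrying on errors.

    Stops once DAILY_LIMIT requests have been sent and returns what was
    downloaded up to that point.
    """
    results = []
    requests_made = 0
    for url in urls:
        while requests_made < DAILY_LIMIT:
            requests_made += 1
            try:
                data = fetch(url)
            except (urllib.error.URLError, socket.timeout):
                # Timeout or connection problem: back off, then retry this URL.
                sleep(TIMEOUT_BACKOFF)
                continue
            results.append(data)
            sleep(REQUEST_INTERVAL)
            break
        else:
            # The while condition failed: the daily budget is used up.
            break
    return results
```

Run one fetch_all call per API key (for example through mp.Pool, as in the code above), each with its own list of URLs.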
