Question

I am trying to fetch profiles for a list of about 1 million user IDs (followersIDL). I am using tweepy, and the relevant code snippet is:

import time

import tweepy

# consumerkey, consumersecret and followersIDL are defined elsewhere.
auth = tweepy.AppAuthHandler(consumerkey, consumersecret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
followersL = []

# users/lookup accepts at most 100 IDs per request, so step in batches of 100.
for i in range(0, len(followersIDL), 100):
    while True:  # retry the same batch until it succeeds
        try:
            followersL.extend(api.lookup_users(user_ids=followersIDL[i:i+100]))
            time.sleep(3)
        except tweepy.TweepError as error:
            print("...Exception : api_code {} len(followersL) = {} : {}".format(
                error.api_code, len(followersL),
                time.strftime("%a, %d %b %Y %H:%M:%S ", time.localtime())))
            time.sleep(300)
            continue
        break
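
The loop above retries the same batch forever with a fixed sleep. A minimal sketch of a bounded retry with exponential backoff is shown here for comparison. The names `lookup_in_batches`, `fetch_batch`, `max_retries`, `base_delay` and `sleep` are hypothetical, and `fetch_batch` stands in for a call such as `api.lookup_users(user_ids=batch)`. It only illustrates the retry pattern and does not by itself explain the failure described below.

```python
import time


def lookup_in_batches(ids, fetch_batch, batch_size=100, max_retries=5,
                      base_delay=1.0, sleep=time.sleep):
    """Fetch results batch by batch, retrying each batch a bounded number of times.

    fetch_batch is a hypothetical callable (for example a thin wrapper around
    api.lookup_users) that takes a list of IDs and returns a list of results.
    """
    results = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        for attempt in range(max_retries):
            try:
                results.extend(fetch_batch(batch))
                break  # this batch succeeded; move on to the next one
            except Exception as error:  # narrow this to the client's error type
                if attempt == max_retries - 1:
                    raise  # give up instead of looping forever
                delay = base_delay * 2 ** attempt  # doubles on each failure
                print("batch at {} failed ({}); retrying in {}s".format(
                    start, error, delay))
                sleep(delay)
    return results
```

Raising after the last attempt makes a persistent failure visible immediately, instead of leaving the script asleep in the handler for hours.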

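After collecting roughly 390,000 profiles, I get stuck in the exception-handling part of the loop. I tried lengthening time.sleep(300) to time.sleep(3600 * 2), but that did not help. The relevant exception is: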

tweepy.error.TweepError: Failed to send request: HTTPSConnectionPool(host='api.twitter.com', port=443): Max retries exceeded with url: /1.1/users/lookup.json (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x1c5976240>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known',))

This puzzles me, because I sleep for 3 seconds between requests.

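If I interrupt the run from the keyboard (after it has been stuck in the exception handler for several hours), I can rerun the code and it happily collects the same 390k users, then gets stuck in the exception handler again. This makes no sense to me: with the wait in the exception handler set to 2 hours, it never gets out of the handler, yet if I kill the code and rerun it, it works.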
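
Because every rerun fetches the same 390k profiles again, saving progress between runs would at least avoid repeating that work. The following is a minimal sketch, assuming the profiles are stored as plain JSON-serializable dicts. The names `load_checkpoint`, `save_checkpoint`, and the `next_index` and `profiles` keys are hypothetical.

```python
import json
import os


def load_checkpoint(path):
    """Return (next_index, profiles) from a previous run, or (0, []) if none."""
    if not os.path.exists(path):
        return 0, []
    with open(path) as handle:
        state = json.load(handle)
    return state["next_index"], state["profiles"]


def save_checkpoint(path, next_index, profiles):
    """Write the state to a temp file, then rename it over the real file,
    so an interrupted write cannot leave a truncated checkpoint behind."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as handle:
        json.dump({"next_index": next_index, "profiles": profiles}, handle)
    os.replace(tmp_path, path)
```

A rerun would then start its `range(...)` at the loaded `next_index` instead of 0. For a list this large, rewriting the whole file after every batch is expensive, so saving every few hundred batches (or appending to a line-delimited file) may be more practical.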

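It does not look like I am being blocked by IP address. Instead, the problem seems to be tied to something in the "connection" itself. If I try to re-authenticate, I get a different error about too many re-authentication attempts. I checked vars() but did not find anything wrong.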
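
The text `[Errno 8] nodename nor servname provided, or not known` in the exception is the message of a name-resolution error (`socket.gaierror`): at that moment the process could not resolve `api.twitter.com`, which is different from an HTTP-level rejection by Twitter. A minimal sketch for checking this from inside the exception handler follows. `can_resolve` and its `resolver` parameter are hypothetical names, and the cause of a resolution failure in a long-running process is not established here.

```python
import socket


def can_resolve(host, port=443, resolver=socket.getaddrinfo):
    """Return True if the host name resolves, False on a name-resolution error."""
    try:
        resolver(host, port)
        return True
    except socket.gaierror:
        return False
```

Printing `can_resolve("api.twitter.com")` in the `except` branch would show whether the stuck process can still resolve the host, while a freshly started process can.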

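Questions:

How can I get past this apparent hard limit of about 390,000 user profile lookups?
How am I being blocked? It does not appear to be by IP address, since I can kill my code, rerun it, and successfully fetch the same 390k profiles again.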

Why am I limited to requesting ~390,000 Twitter user lookups?

0 answers: