Python requests stopped working

Time: 2017-03-16 21:16:37

Tags: python python-2.7 python-requests

I have a simple Python (2.7) script, shown below:

from requests import get

game_date = '03/16/2017'
headers = {'Referer': 'http://stats.nba.com/standings/',
           'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}

response = get('http://stats.nba.com/stats/scoreboard', 
    params = {'DayOffset': 0, 'LeagueID': '00', 'gameDate': game_date}, 
    headers = headers, 
    timeout = 10)

response.raise_for_status() # raise exception if invalid response

len_resultsets = len(response.json()['resultSets'])

# etc. etc.

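This used to work on my Linux machine (until 2 days ago), but now it no longer does. Without the timeout option in the get call, it just sits there and never returns. It still works perfectly on my Mac. I haven't changed anything. I tried different user agent strings with no luck. Any ideas?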

1 answer:

Answer 0: (score: 1)

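I was able to solve this by using public proxies (via this project). Not every public proxy succeeds, but you can set up a trial-and-error loop that keeps going until one does. Something like this: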

import random
import time

from http.requests.proxy.requestProxy import RequestProxy
from requests import get

# Fetch the list of public proxies
req_proxy = RequestProxy()
proxy_list = req_proxy.get_proxy_list()

game_date = '03/16/2017'
results_dict = {}
headers = {'Referer': 'http://stats.nba.com/standings/',
           'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}

response = ''
len_resultsets = 0
trying = True

while trying:
    try:
        proxies = {'http': random.choice(proxy_list)}
        response = get('http://stats.nba.com/stats/scoreboard',
                       params = {'DayOffset': 0,
                                 'LeagueID': '00',
                                 'gameDate': game_date}, 
                       headers = headers, 
                       timeout = 30, 
                       proxies = proxies
                      )

        response.raise_for_status() # raise exception if invalid response
        len_resultsets = len(response.json()['resultSets'])

        trying = False

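    except Exception:  # bad proxy, timeout, HTTP error, or invalid JSON
        time.sleep(5)  # wait, then retry with another randomly chosen proxy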
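
One caveat with the loop above: if no proxy in the list ever works, it retries forever. A minimal sketch of a bounded variant follows. The helper name fetch_via_first_working_proxy and its parameters (fetch, max_attempts, delay) are made up for illustration and are not part of requests or the proxy project; fetch is assumed to be any callable that takes a proxies dict and raises an exception on failure.

```python
import random
import time


def fetch_via_first_working_proxy(proxy_list, fetch, max_attempts=10, delay=5):
    """Try randomly chosen proxies until fetch succeeds or attempts run out.

    fetch is a hypothetical callable: it receives a proxies dict such as
    {'http': <proxy>} and returns a result, raising an exception on failure.
    """
    last_error = None
    for _ in range(max_attempts):
        proxies = {'http': random.choice(proxy_list)}
        try:
            return fetch(proxies)
        except Exception as exc:  # narrow this to the errors you expect
            last_error = exc
            time.sleep(delay)  # back off before trying another proxy
    raise RuntimeError('no proxy succeeded after %d attempts: %r'
                       % (max_attempts, last_error))
```

With the script above, fetch could wrap the existing get(...) call together with raise_for_status() and return the response, so that a bad proxy surfaces as an exception and a persistent failure ends with a RuntimeError instead of an endless loop.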