I have a simple Python (2.7) script, shown below:
from requests import get
game_date = '03/16/2017'
headers = {'Referer': 'http://stats.nba.com/standings/',
           'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
response = get('http://stats.nba.com/stats/scoreboard',
               params={'DayOffset': 0, 'LeagueID': '00', 'gameDate': game_date},
               headers=headers,
               timeout=10)
response.raise_for_status()  # raise exception if invalid response
len_resultsets = len(response.json()['resultSets'])
# etc. etc.
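This used to work on my Linux machine (until 2 days ago), but now it no longer does. Without the timeout option in get, it just sits there and never returns. It still works perfectly on my Mac. I haven't changed anything. I tried different user-agent strings, with no luck. Any ideas?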
Answer 0 (score: 1)
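I was able to work around this by using public proxies (via this project). Not every public proxy succeeds, but you can set up a trial-and-error loop that keeps going until one does. Something like this: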
import random
import time

from http.requests.proxy.requestProxy import RequestProxy
from requests import get

# build the list of public proxies
req_proxy = RequestProxy()
proxy_list = req_proxy.get_proxy_list()

game_date = '03/16/2017'
results_dict = {}
headers = {'Referer': 'http://stats.nba.com/standings/',
           'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
response = ''
len_resultsets = 0
trying = True
while trying:
    try:
        # pick a random proxy for this attempt
        proxies = {'http': random.choice(proxy_list)}
        response = get('http://stats.nba.com/stats/scoreboard',
                       params={'DayOffset': 0,
                               'LeagueID': '00',
                               'gameDate': game_date},
                       headers=headers,
                       timeout=30,
                       proxies=proxies)
        response.raise_for_status()  # raise exception if invalid response
        len_resultsets = len(response.json()['resultSets'])
        trying = False  # success: stop retrying
    except Exception:
        # this proxy failed; wait, then try another one
        time.sleep(5)
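One caveat with the loop above: if no proxy ever works, it retries forever. Here is a minimal sketch of a bounded variant of the same trial-and-error idea. The names fetch_with_proxies, fetch, max_attempts, delay and sleep are made up for illustration and are not part of requests or of the proxy project. fetch is assumed to be any function that takes one proxy address and either returns a result or raises.

```python
import random
import time


def fetch_with_proxies(fetch, proxy_list, max_attempts=10, delay=5,
                       sleep=time.sleep):
    """Call fetch(proxy) with randomly chosen proxies until one succeeds.

    Hypothetical helper: `fetch` is any callable that takes a proxy
    address and raises an exception on failure. Gives up after
    `max_attempts` tries instead of looping forever.
    """
    last_error = None
    for _ in range(max_attempts):
        proxy = random.choice(proxy_list)
        try:
            return fetch(proxy)  # first success wins
        except Exception as exc:
            last_error = exc     # remember why this proxy failed
            sleep(delay)         # back off before trying another proxy
    raise RuntimeError('no proxy succeeded after %d attempts (last error: %r)'
                       % (max_attempts, last_error))
```

With the request from the answer, fetch would be a small function that calls get(...) with proxies={'http': proxy}, calls raise_for_status(), and returns response.json(). The sleep parameter is only there so the pause can be swapped out when trying the helper without real waiting.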