I currently have a script that downloads the top headline from Reddit's front page, and it works almost every time. Occasionally, though, I get the exception below. I know I should wrap my code in a try/except statement to protect it, but where should I put it?
Crawler:
import praw

def crawlReddit():
    r = praw.Reddit(user_agent='challenge')  # PRAW object
    topHeadlines = []                        # List of headlines
    for item in r.get_front_page():
        topHeadlines.append(item)            # Add headlines to list
    return topHeadlines[0].title             # Return top headline

def main():
    headline = crawlReddit()  # Pull top headline

if __name__ == "__main__":
    main()
Error:
Traceback (most recent call last):
  File "makecall.py", line 57, in <module>
    main() # Run
  File "makecall.py", line 53, in main
    headline = crawlReddit() # Pull top headline
  File "makecall.py", line 34, in crawlReddit
    for item in r.get_front_page():
  File "/Users/myusername/Documents/dir/lib/python2.7/site-packages/praw/__init__.py", line 480, in get_content
    page_data = self.request_json(url, params=params)
  File "/Users/myusername/Documents/dir/lib/python2.7/site-packages/praw/decorators.py", line 161, in wrapped
    return_value = function(reddit_session, *args, **kwargs)
  File "/Users/myusername/Documents/dir/lib/python2.7/site-packages/praw/__init__.py", line 519, in request_json
    response = self._request(url, params, data)
  File "/Users/myusername/Documents/dir/lib/python2.7/site-packages/praw/__init__.py", line 383, in _request
    _raise_response_exceptions(response)
  File "/Users/myusername/Documents/dir/lib/python2.7/site-packages/praw/internal.py", line 172, in _raise_response_exceptions
    response.raise_for_status()
  File "/Users/myusername/Documents/dir/lib/python2.7/site-packages/requests/models.py", line 831, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 503 Server Error: Service Unavailable
Answer (score: 1):
It looks like r.get_front_page() returns a lazily evaluated object, and you only need its first element. If so, try something like the following:
import time

import praw
from requests.exceptions import HTTPError

def crawlReddit():
    r = praw.Reddit(user_agent='challenge')  # PRAW object
    front_page = r.get_front_page()
    try:
        first_headline = front_page.next()   # Get the first item from front_page
    except HTTPError:
        return None
    else:
        return first_headline.title

def main():
    max_attempts = 3
    attempts = 1
    headline = crawlReddit()
    while not headline and attempts < max_attempts:
        time.sleep(1)  # Wait a bit before resending the request
        headline = crawlReddit()
        attempts += 1
    if not headline:
        print "Request failed after {} attempts".format(max_attempts)

if __name__ == "__main__":
    main()
Edit: the code now tries to fetch the data up to 3 times, waiting one second between failed attempts. After the third attempt it gives up, since the server may simply be offline, etc.
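As a possible refinement (my suggestion, not part of the original answer), the fixed one-second pause could be replaced with an exponential backoff, which is gentler on a server that keeps answering with 503s. A minimal sketch, reusing the crawlReddit above; the function name fetchWithBackoff and the delay values are hypothetical choices:

import time

def fetchWithBackoff(max_attempts=3, base_delay=1):
    # Retry crawlReddit with an exponentially growing pause: 1s, 2s, 4s, ...
    for attempt in range(max_attempts):
        headline = crawlReddit()
        if headline:
            return headline
        time.sleep(base_delay * (2 ** attempt))  # Wait longer after each failure
    return None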