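I am trying to fetch data from different APIs. The responses arrive as JSON, are stored in SQLite, and are parsed afterwards.
The problem I'm having is that when I send many requests I eventually get an error, even though I use time.sleep between requests.
My code looks like the following. It runs inside a loop, and the URL to open changes on every iteration: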
import time
import urllib2

base_url = 'https://www.whateversite.com/api/index.php?'
custom_url = 'variable_text1' + '&' + 'variable_text2'  # query string, differs per iteration
url = base_url + custom_url  # url will be changing
time.sleep(1)  # pause between requests
data = urllib2.urlopen(url).read()
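This runs thousands of times in a loop. The problem appears after the script has been running for a while (up to a few hours), when I get the following error or a similar one: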
data = urllib2.urlopen(url).read()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 404, in open
response = self._open(req, data)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 422, in _open
'_open', req)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1222, in https_open
return self.do_open(httplib.HTTPSConnection, req)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1184, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error [Errno 8] nodename nor servname provided, or not known>
or
uh = urllib.urlopen(url)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 87, in urlopen
return opener.open(url)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 208, in open
return getattr(self, name)(url)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 437, in open_https
h.endheaders(data)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 969, in endheaders
self._send_output(message_body)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 829, in _send_output
self.send(msg)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 791, in send
self.connect()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1172, in connect
self.timeout, self.source_address)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 553, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
IOError: [Errno socket error] [Errno 8] nodename nor servname provided, or not known
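I believe this happens because the modules throw an error at some point when they are used heavily within a short period of time.
From what I've read in many different threads about which module is better, I think any of them would work for my needs, and the main criterion for choosing one is that it can open as many URLs as possible. In my experience, urllib and urllib2 are better than requests, because requests crashes sooner.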
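Assuming that I don't want to increase the waiting time used in time.sleep, these are the solutions I have considered so far:
I thought about combining all the different modules. That would be: start with requests, then switch to urllib2, then to another module (httplib2 or urllib) or back to requests.
Use a try .. except block to handle that exception, as suggested here.
I've also read about sending multiple requests at once or in parallel. I don't know exactly how that works or whether it would actually be useful.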
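For illustration only, sending requests in parallel could look roughly like the sketch below. It uses nothing but the standard threading module, so it should run on Python 2.7 as well as Python 3. The names fetch_all, fetch and workers are hypothetical and not taken from any library; fetch stands for whatever function downloads a single URL. This only shows the mechanism: several requests in flight at once means a higher request rate, so it is unlikely by itself to make the error above go away.

```python
import threading


def fetch_all(urls, fetch, workers=4):
    """Call fetch(url) for every URL in the list, using a few worker threads.

    Returns a list of (url, data, error) tuples in the same order as urls.
    error is None when the call succeeded.
    """
    results = [None] * len(urls)
    lock = threading.Lock()
    next_index = [0]  # boxed counter shared by all worker threads

    def worker():
        while True:
            # Claim the next unprocessed position under the lock.
            with lock:
                i = next_index[0]
                next_index[0] += 1
            if i >= len(urls):
                return
            try:
                results[i] = (urls[i], fetch(urls[i]), None)
            except IOError as exc:  # urllib2.URLError is a subclass of IOError
                results[i] = (urls[i], None, exc)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With urllib2, fetch could be something like lambda u: urllib2.urlopen(u).read(). Failed URLs come back with their exception instead of stopping the whole run.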
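However, I'm not convinced by any of these solutions.
Can you think of a more elegant and/or more efficient solution to deal with this error?
I'm using Python 2.7.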
Answer 0 (score: 0)
Even though I wasn't convinced, I ended up implementing the try .. except block, and I'm quite happy with the results:
# list_of_urls, cur, conn and timestamp are assumed to be defined earlier in the script
for url in list_of_urls:
    time.sleep(2)
    try:
        response = urllib2.urlopen(url)
        data = response.read()
        time.sleep(0.1)
        response.close()  # as suggested by zachyee in the comments
        # code to save data in SQLite database
    except urllib2.URLError as e:
        # print the actual error instead of a hard-coded message
        print '***** urllib2.URLError: %s *****' % e
        # save error in SQLite
        cur.execute('''INSERT INTO Errors (error_type, error_ts, url_queried)
                       VALUES (?, ?, ?)''', ('urllib2.URLError', timestamp, url))
        conn.commit()
        time.sleep(30)  # give it a small break
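The script ran all the way to the end.
Out of thousands of requests I got 8 errors, which were saved in my database together with their URLs. That way I can try to retrieve those URLs again if needed.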
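Retrying the saved URLs could be done with a small helper like the one below. This is only a sketch of the general retry-with-backoff idea, not what the script above does: fetch_with_retry, read_url and their parameters are hypothetical names, and the retry count and delays are arbitrary assumptions. It catches IOError, which covers urllib2.URLError as well as the IOError raised by urllib in the question. The import fallback is there only so the same code also runs on Python 3.

```python
import time

try:
    import urllib2  # Python 2.7, as used in this thread
except ImportError:
    import urllib.request as urllib2  # Python 3 fallback


def read_url(url):
    """Download one URL and always close the response."""
    response = urllib2.urlopen(url)
    try:
        return response.read()
    finally:
        response.close()


def fetch_with_retry(url, fetch=read_url, retries=3, base_delay=1.0,
                     sleep=time.sleep):
    """Call fetch(url); on IOError wait and try again, doubling the wait.

    Waits base_delay, then 2 * base_delay, and so on. Once all attempts
    (the first one plus `retries` more) have failed, the last error is
    re-raised so the caller can log it.
    """
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except IOError:
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)
```

The loop over the failed URLs would then call fetch_with_retry(url) inside the same try .. except block as before, so that a URL that still fails after all attempts is logged again rather than stopping the script.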