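I'm using Python Twitter Tools to download the most recent 200 tweets for a large number of users. I'm getting a gzip error that happens only intermittently. At seemingly random intervals, the loop crashes with the error stack below. If I immediately restart the loop and send the same user, it downloads without any trouble. I've looked at the response headers when it crashes, and they don't appear to be any different from the headers of responses that cause no problem. I've also confirmed that many of the results I get without a problem are gzipped too, and they decompress fine.
Has anyone seen this problem before and/or can suggest a fix/workaround?
Here's the error stack, for what it's worth: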
  File "/Users/martinlbarron/Dropbox/Learning Python/downloadTimeline.py", line 33, in <module>
    result=utility.downloadTimeline(kwargs,t)
  File "/Users/martinlbarron/Dropbox/Learning Python/utility.py", line 73, in downloadTimeline
    response=t.statuses.user_timeline(**kargs)
  File "/Library/Python/2.7/site-packages/twitter-1.9.0-py2.7.egg/twitter/api.py", line 173, in __call__
    return self._handle_response(req, uri, arg_data)
  File "/Library/Python/2.7/site-packages/twitter-1.9.0-py2.7.egg/twitter/api.py", line 184, in _handle_response
    data = f.read()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 245, in read
    self._read(readsize)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 299, in _read
    self._read_eof()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 338, in _read_eof
    hex(self.crc)))
IOError: CRC check failed 0xf4196259 != 0x34967f68L
logout
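Adding my code (be gentle, I'm a Python newbie)
I have a list of Twitter screen names. I loop through them in the code below, calling my Twitter download function (downloadTimeline).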
from twitter import Twitter, OAuth
import utility  # my own module, contains downloadTimeline

t = Twitter(
    auth=OAuth("XXX", "XXX",
               "XXX", "XXX"))
for i in range(startRange, endRange):
    # Get the screen name for this user
    row = newlist[i]
    sc = row[3]
    kwargs = dict(count=200, include_rts=False, include_entities=False, trim_user=True, screen_name=sc)
    result = utility.downloadTimeline(kwargs, t)
In downloadTimeline, I get the Twitter response (response) and then parse it out into a dictionary
def downloadTimeline(kargs, t):
    # Get timeline
    mylist = list()
    counter = 1000
    try:
        response = t.statuses.user_timeline(**kargs)
        counter = response.rate_limit_remaining
        # Parse the response into one dictionary per tweet
        if len(response) > 0:
            for tweet in response:
                user = tweet['user']
                record = {
                    'id_str': cleanLines(tweet['id_str']),
                    # omitting the whole list of all the variables I save
                }
                mylist.append(record)
    except twitter.TwitterError as e:
        print("Fail: %i" % e.e.code)
    return (mylist, counter)
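Since re-sending the same user right after a crash succeeds, one possible stopgap is to retry the call a few times before giving up. The following is only a sketch under that assumption: `call_with_retries`, its parameters, and the choice of exceptions to retry on are hypothetical names for illustration, not part of Python Twitter Tools, and it is written so that it also runs on Python 3.

```python
import time
import zlib


def call_with_retries(fetch, attempts=3, delay=1.0,
                      retry_on=(IOError, ValueError, zlib.error)):
    """Call fetch() and retry when it raises one of the retry_on errors.

    fetch is any zero-argument callable. IOError covers the gzip CRC
    failure and ValueError covers a JSON parse failure on a cut-off body.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(delay * attempt)  # simple linear backoff
    raise last_error


# Hypothetical use inside downloadTimeline:
# response = call_with_retries(lambda: t.statuses.user_timeline(**kargs))
```

This hides the symptom rather than explaining it, so it is probably worth logging each retry to see how often the failure really occurs.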
Finally, although this is obviously not my code, here is the code in the Python Twitter Tools framework that seems to be choking (specifically at f = gzip.GzipFile(fileobj=buf))
def _handle_response(self, req, uri, arg_data):
    try:
        handle = urllib_request.urlopen(req)
        if handle.headers['Content-Type'] in ['image/jpeg', 'image/png']:
            return handle
        elif handle.info().get('Content-Encoding') == 'gzip':
            # Handle gzip decompression
            buf = StringIO(handle.read())
            f = gzip.GzipFile(fileobj=buf)
            data = f.read()
        else:
            data = handle.read()
        if "json" == self.format:
            res = json.loads(data.decode('utf8'))
            return wrap_response(res, handle.headers)
        else:
            return wrap_response(
                data.decode('utf8'), handle.headers)
    except urllib_error.HTTPError as e:
        if (e.code == 304):
            return []
        else:
            raise TwitterHTTPError(e, uri, self.format, arg_data)
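It turns out it's easy to turn off the gzip Accept-Encoding header in Python Twitter Tools. But when I do that, I get the following error. I wonder if the response is somehow getting truncated: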
  File "/Users/martinlbarron/Dropbox/Learning Python/downloadTimeline.py", line 33, in <module>
    result=utility.downloadTimeline(kwargs,t)
  File "/Users/martinlbarron/Dropbox/Learning Python/utility.py", line 73, in downloadTimeline
    response=t.statuses.user_timeline(**kargs)
  File "/Library/Python/2.7/site-packages/twitter-1.9.0-py2.7.egg/twitter/api.py", line 175, in __call__
    return self._handle_response(req, uri, arg_data)
  File "/Library/Python/2.7/site-packages/twitter-1.9.0-py2.7.egg/twitter/api.py", line 193, in _handle_response
    res = json.loads(handle.read().decode('utf8'))
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Unterminated string starting at: line 1 column 13699 (char 13699)
logout
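One way to test the truncation theory is to compare the number of bytes actually read with the Content-Length header, when the server sends one. This is a rough sketch: `is_truncated` is a hypothetical helper, not something Python Twitter Tools provides, and it cannot say anything about responses that arrive without a Content-Length (for example chunked ones).

```python
def is_truncated(headers, body):
    """Return True if Content-Length is present and exceeds the bytes received.

    headers is any mapping with a .get() method (handle.headers works),
    body is the raw, still-compressed result of handle.read().
    """
    declared = headers.get("Content-Length")
    if declared is None:
        return False  # no declared length, nothing to compare against
    return len(body) < int(declared)


# Plain dicts stand in for handle.headers in this example.
short = is_truncated({"Content-Length": "20000"}, b"x" * 13699)
exact = is_truncated({"Content-Length": "3"}, b"abc")
```

If this reports a short body on the failing requests, both the CRC error and the unterminated JSON string would be consistent with the connection being cut off early.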
Answer 0 (score: 2)
Instead of:
buf = StringIO(handle.read())
f = gzip.GzipFile(fileobj=buf)
data = f.read()
Try this:
decomp = zlib.decompressobj(16+zlib.MAX_WBITS)
data = decomp.decompress(handle.read())
Don't forget to import zlib.
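To see what the zlib call does in isolation, here is a small self-contained sketch that builds a gzip payload in memory and decompresses it the same way. The `gunzip_bytes` helper and the sample payload are made up for illustration, and the `eof` attribute is an assumption about your interpreter: it exists on Python 3.3 and later, not on the Python 2.7 used in the question.

```python
import gzip
import io
import zlib


def gunzip_bytes(raw):
    """Decompress a gzip-wrapped byte string with zlib instead of GzipFile."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    data = decomp.decompress(raw) + decomp.flush()
    # eof stays False when the end of the gzip stream was never reached.
    return data, decomp.eof


# Build a small gzip payload in memory to stand in for handle.read().
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    f.write(b'{"id_str": "123"}')
payload = buf.getvalue()

data, complete = gunzip_bytes(payload)

# Cut off the last four bytes to imitate a response that ended early.
_, truncated_complete = gunzip_bytes(payload[:-4])
```

One thing to keep in mind: a decompression object returns whatever it could decode from an incomplete stream without raising, so if the body really is cut short this may only move the failure to the JSON parser. In gzip mode zlib still verifies the CRC-32 when it reaches the trailer, so a genuinely corrupted body would be expected to raise zlib.error rather than IOError.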