What is the recommended way to handle Content-Encoding: gzip files when using urlgrabber?
Right now I am patching it like this:
from urlgrabber.grabber import URLGrabber, PyCurlFileObject

g = URLGrabber(http_headers=(("Accept-Encoding", "gzip"),))
g.is_compressed = False  # I don't know yet if the server will send me compressed data

# Back up the current method of handling downloaded headers (only once)
try:
    PyCurlFileObject.orig_hdr_retrieve
except AttributeError:
    PyCurlFileObject.orig_hdr_retrieve = PyCurlFileObject._hdr_retrieve

def hdr_retrieve(instance, buf):
    # Let the original handler process the header line first
    r = PyCurlFileObject.orig_hdr_retrieve(instance, buf)
    if "content-encoding" in buf.lower() and "zip" in buf.lower():
        g.is_compressed = True
    return r

PyCurlFileObject._hdr_retrieve = hdr_retrieve
g.urlgrab(url, dest)
if g.is_compressed:
    pass  # ungzip file here
But it doesn't look very clean, and I am afraid it is not thread-safe either...
Answer 0 (score: 0)
I think I found a thread-safe solution:
import pycurl
from urlgrabber.grabber import URLGrabber, PyCurlFileObject

g = URLGrabber(http_headers=(("Accept-Encoding", "gzip"),))
g.opts._set_attributes(grabber=g)  # make the grabber reachable through the opts object

# Back up the original _set_opts (only once)
try:
    PyCurlFileObject.orig_setopts
except AttributeError:
    PyCurlFileObject.orig_setopts = PyCurlFileObject._set_opts

def setopts(instance, opts={}):
    PyCurlFileObject.orig_setopts(instance, opts)
    grabber = instance.opts.grabber
    grabber.is_compressed = False

    # Per-instance header callback: no shared class-level state is touched
    def hdr_retrieve(buf):
        r = PyCurlFileObject._hdr_retrieve(instance, buf)
        if "content-encoding" in buf.lower() and "zip" in buf.lower():
            grabber.is_compressed = True
        return r

    instance.curl_obj.setopt(pycurl.HEADERFUNCTION, hdr_retrieve)

PyCurlFileObject._set_opts = setopts
But it still doesn't feel quite "clean" :)
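For the "# ungzip file here" step in the question, something along these lines might work once is_compressed has been set. This is only a sketch using the standard library and assuming Python 3; gunzip_in_place is a hypothetical helper name, not part of urlgrabber, and it assumes the downloaded file at dest is a complete gzip stream.

```python
import gzip
import os
import shutil
import tempfile


def gunzip_in_place(path):
    """Replace the gzip-compressed file at `path` with its decompressed contents.

    Hypothetical helper, not part of urlgrabber.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Decompress into a temporary file in the same directory, so the final
    # rename stays on one filesystem and the original is only replaced on success.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as dst, gzip.open(path, "rb") as src:
            shutil.copyfileobj(src, dst)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial temporary file and leave the download untouched
        os.remove(tmp_path)
        raise
```

With that in place, the last lines of the question's snippet could become "if g.is_compressed: gunzip_in_place(dest)". If several threads share one grabber, the is_compressed flag is still shared between downloads, so this probably only helps when each thread uses its own URLGrabber.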