I'm trying to download a zip file using the following code:
o = urllib2.build_opener( urllib2.HTTPCookieProcessor() )
#login
p = urllib.urlencode( { usernameField: usernameVal, passField: passVal } )
f = o.open(authUrl, p )
data = f.read()
print data
f.close()
#download file
f = o.open(remoteFileUrl)
localFile = open(localFile, "wb")
localFile.write(f.read())
f.close()
I get some binary data back, but the file I "download" is too small and is not a valid zip file. Am I not retrieving the zip file properly? The HTTP response headers for f = o.open(remoteFileUrl) are shown below. I don't know whether this response needs special handling:
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Pragma: private
Cache-Control: must-revalidate
Expires: Tue, 31 Dec 1997 23:59:59 GMT
Content-Disposition: inline;
filename="files.zip";
Content-Type: application/zip
Transfer-Encoding: chunked
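One thing visible in these headers: the response uses Transfer-Encoding: chunked and carries no Content-Length, so the total size is not known up front. A minimal sketch of checking for that before relying on it (the plain dict here is a stand-in for the real response's header object, not the actual API):

```python
# Headers from the response above, as a plain dict (a stand-in for
# the real response object's header accessor, used for illustration).
headers = {
    "Server": "Apache-Coyote/1.1",
    "Content-Type": "application/zip",
    "Transfer-Encoding": "chunked",
}

# With chunked transfer encoding there is no Content-Length header,
# so any code that derives progress or size from it needs a fallback.
total_size = headers.get("Content-Length")
print(total_size)  # None
```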
Answer 0 (score: 10)
f.read()
does not necessarily read the whole file, only a single packet of data (which may be the whole file if it is small, but will not be for a large file).
You need to loop over the packets like this:
while 1:
    packet = f.read()
    if not packet:
        break
    localFile.write(packet)
f.close()
f.read()
returns an empty packet once you have read the whole file.
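The read-until-empty loop above works with any file-like object. Here is a minimal, self-contained sketch of the same pattern, using io.BytesIO as a stand-in for the HTTP response (an assumption for illustration) and an explicit chunk size so each read is bounded:

```python
import io

# Stand-in for the HTTP response; any file-like object behaves the same way.
# The payload is a fake zip-like blob: a "PK" signature plus filler bytes.
f = io.BytesIO(b"PK\x03\x04" + b"x" * 100000)

chunks = []
while 1:
    packet = f.read(8192)  # read at most 8 KB per iteration
    if not packet:         # an empty result means end of stream
        break
    chunks.append(packet)

data = b"".join(chunks)
print(len(data))  # 100004
```

The explicit size argument guarantees each iteration touches a bounded amount of memory regardless of the stream's total length.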
Answer 1 (score: 1)
If you don't mind reading the whole zip file into memory, the fastest way to read and write it is as follows:
data = f.readlines()
with open(localFile, 'wb') as output:
    output.writelines(data)
Otherwise, to read and write in chunks as you pull the data over the network, do
with open(localFile, "wb") as output:
    chunk = f.read()
    while chunk:
        output.write(chunk)
        chunk = f.read()
It is a little less tidy, but avoids keeping the whole file in memory at once. Hope it helps.
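The chunked copy loop above is essentially what the standard library's shutil.copyfileobj implements. A sketch, again using io.BytesIO objects as stand-ins for the network response and the local file (both are assumptions for illustration):

```python
import io
import shutil

source = io.BytesIO(b"zipdata" * 1000)  # stand-in for the HTTP response
dest = io.BytesIO()                     # stand-in for the local file

# copyfileobj streams in fixed-size chunks (16 KB here), so the whole
# body is never held in memory at once.
shutil.copyfileobj(source, dest, length=16 * 1024)

print(dest.getvalue() == b"zipdata" * 1000)  # True
```

In real use, source would be the response from o.open(remoteFileUrl) and dest a file opened with open(localFile, "wb").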
Answer 2 (score: 1)
Here is a more robust solution that uses urllib2 to download the file in chunks and print the download status:
import os
import urllib2
import math

def downloadChunks(url):
    """Helper to download large files
    the only arg is a url
    this file will go to a temp directory
    the file will also be downloaded
    in chunks and print out how much remains
    """
    baseFile = os.path.basename(url)

    #move the file to a more uniq path
    os.umask(0002)
    temp_path = "/tmp/"
    try:
        file = os.path.join(temp_path, baseFile)

        req = urllib2.urlopen(url)
        total_size = int(req.info().getheader('Content-Length').strip())
        downloaded = 0
        CHUNK = 256 * 10240
        with open(file, 'wb') as fp:
            while True:
                chunk = req.read(CHUNK)
                downloaded += len(chunk)
                # cast to float so Python 2 integer division does not
                # floor the percentage to 0 until the download finishes
                print math.floor((downloaded / float(total_size)) * 100)
                if not chunk:
                    break
                fp.write(chunk)
    except urllib2.HTTPError, e:
        print "HTTP Error:", e.code, url
        return False
    except urllib2.URLError, e:
        print "URL Error:", e.reason, url
        return False

    return file
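A pitfall worth flagging for this kind of progress reporting: in Python 2, dividing two ints floors the result (the way `//` does in Python 3), so a percentage must be computed with at least one float operand. A quick illustration with made-up sizes:

```python
downloaded = 512
total_size = 2048

# Floor division truncates any partial progress to 0...
print(downloaded // total_size * 100)        # 0

# ...so convert one operand to float to get a usable percentage.
print(downloaded / float(total_size) * 100)  # 25.0
```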
Answer 3 (score: 0)
Try this:
#download file
f = o.open(remoteFileUrl)

response = ""
while 1:
    data = f.read()
    if not data:
        break
    response += data

with open(localFile, "wb") as local_file:
    local_file.write(response)