Question

I am trying to fetch a large file from the web and stream it directly into the zip file writer provided by the zipfile module, something like:

from urllib.request import urlopen
from zipfile import ZipFile

zip_file = ZipFile("/a/certain/local/zip/file.zip","a")
entry = zip_file.open("an.entry","w")
entry.write( urlopen("http://a.certain.file/on?the=web") )

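Obviously this doesn't work, because .write accepts a bytes argument, not an I/O reader. However, since the file is quite large, I don't want to load the whole file into RAM before compressing it.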

The simple solution would be to use bash (I never actually tried it, so it may be wrong):

curl -s "http://a.certain.file/on?the=web" | zip -q /a/certain/local/zip/file.zip

But putting a line of bash in a Python script is neither very elegant nor very convenient.

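Another solution is to download the file with urllib.request.urlretrieve and then pass the path to zipfile.ZipFile.open, but that way I still have to wait for the download to finish, and it also costs extra disk I/O.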

Is there a way, in Python, to pass the download stream directly to the zipfile writer, like the bash pipeline above?

Answer 1

You can use shutil.copyfileobj() to copy data efficiently between file objects:

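from shutil import copyfileobj
from urllib.request import urlopen
from zipfile import ZipFile

with ZipFile("/a/certain/local/zip/file.zip", "w") as zip_file:
    # If the entry may exceed 2 GiB, pass force_zip64=True to zip_file.open()
    with zip_file.open("an.entry", "w") as entry:
        with urlopen("http://a.certain.file/on?the=web") as response:
            # copyfileobj was imported by name, so call it without the module prefix
            copyfileobj(response, entry)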

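This calls .read() on the source file object with a given chunk size, then passes that chunk to the .write() method of the target file object, repeating until the source is exhausted. Only one chunk is held in memory at a time, and the chunk size can be set with the optional length argument.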
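To make the chunked copy concrete, here is a rough sketch of an equivalent loop. The helper name copy_in_chunks and the 64 KiB default are made up for illustration; this is not the actual shutil source, just the same read-then-write idea.

```python
def copy_in_chunks(source, destination, chunk_size=64 * 1024):
    """Copy source to destination one chunk at a time.

    Hypothetical helper: both arguments only need .read(n) and .write(data).
    Returns the total number of bytes copied.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            # An empty bytes object signals end of stream
            break
        destination.write(chunk)
        total += len(chunk)
    return total
```

In the snippet above you could call copy_in_chunks(response, entry) in place of copyfileobj(response, entry); in practice copyfileobj is the better choice, the loop is only meant to show what happens.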

If you are using Python 3.5 or older (where you can't write to a ZipFile member directly), your only option is to stream to a temporary file first:

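from shutil import copyfileobj
from tempfile import NamedTemporaryFile
from urllib.request import urlopen
from zipfile import ZipFile

with ZipFile("/a/certain/local/zip/file.zip", "w") as zip_file:
    with NamedTemporaryFile() as cache:
        with urlopen("http://a.certain.file/on?the=web") as response:
            copyfileobj(response, cache)
            # Flush so all data is on disk before it is re-read by name
            cache.flush()
            # ZipFile.write() takes the path on disk first, then the archive name
            zip_file.write(cache.name, "an.entry")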

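Using NamedTemporaryFile() like this only works on POSIX systems. On Windows you can't open the same filename a second time while it is still open, so you'd have to use a name generated by tempfile.mkstemp(), open the file from there, and use try...finally to clean up afterwards.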
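A minimal sketch of that mkstemp() variant could look like the following. The function name add_stream_via_tempfile and its parameters are made up for illustration, and I have not run it on Windows myself; the point is that the temporary file is closed before ZipFile reopens it by name, and that it is removed in the finally block.

```python
import os
from shutil import copyfileobj
from tempfile import mkstemp
from zipfile import ZipFile


def add_stream_via_tempfile(zip_path, arcname, source):
    """Spool a readable binary stream to disk, then add it to a new zip archive.

    Hypothetical helper: source is any object with a .read(n) method.
    """
    fd, temp_path = mkstemp()
    try:
        # Wrap the raw descriptor; leaving the with block closes the file,
        # so it can be reopened by name afterwards
        with os.fdopen(fd, "wb") as cache:
            copyfileobj(source, cache)
        with ZipFile(zip_path, "w") as zip_file:
            zip_file.write(temp_path, arcname)
    finally:
        # mkstemp() never deletes the file for you
        os.remove(temp_path)
```

With the URL from the question this would be used as: with urlopen("http://a.certain.file/on?the=web") as response: add_stream_via_tempfile("/a/certain/local/zip/file.zip", "an.entry", response).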
