Question: Why can't I create a file object from a network data stream?

I am downloading a tar file from a REST API, writing it to a local file, and then extracting its contents locally. Here is my code:

import tarfile

# Save the downloaded archive to disk, then reopen it for extraction.
with open('output.tar.gz', 'wb') as f:
    f.write(o._retrieve_data_stream(p).read())
with open('output.tar.gz', 'rb') as f:
    t = tarfile.open(fileobj=f)
    t.extractall()

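o._retrieve_data_stream(p) retrieves the data stream for the file.

This code works, but it seems unnecessarily complicated to me. I think I should be able to read the byte stream directly into the file object that tarfile reads. Something like this: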

with open(o._retrieve_data_stream(p).read(), 'rb') as f:
    t = tarfile.open(fileobj=f)
    t.extractall()

I realize my syntax may be a bit shaky, but I think it conveys what I am trying to do.

But when I do this, I get an encoding error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

What is going on here?

Answer 1

Posting this because I solved it while writing the question. It turns out I needed to use a BytesIO object.

This code works:

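import tarfile
from io import BytesIO

# Wrap the downloaded bytes in an in-memory file object that tarfile can read.
data = o._retrieve_data_stream(p).read()
with tarfile.open(fileobj=BytesIO(data)) as t:
    t.extractall()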
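
For anyone wondering why the original attempt failed: open() treats its first argument as a file path, not as file contents, so passing the raw archive bytes makes Python try to interpret gzip data as a path name. BytesIO instead wraps bytes that are already in memory in a file-like object. The following is a minimal, self-contained sketch of that difference. The helper make_tar_gz_bytes and the file name hello.txt are made up for illustration and simply stand in for whatever o._retrieve_data_stream(p).read() returns.

```python
import io
import tarfile


def make_tar_gz_bytes() -> bytes:
    """Build a tiny .tar.gz in memory (hypothetical stand-in for the API response)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        data = b"hello from the archive\n"
        info = tarfile.TarInfo(name="hello.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


payload = make_tar_gz_bytes()

# open() expects a path. Handing it the archive bytes fails, because the
# gzip data is not a usable file name.
open_error = None
try:
    open(payload, "rb").close()
except (ValueError, OSError) as exc:
    open_error = exc

# BytesIO turns the same bytes into a seekable file object for tarfile.
with tarfile.open(fileobj=io.BytesIO(payload)) as tar:
    names = tar.getnames()
    content = tar.extractfile("hello.txt").read()

print(type(open_error).__name__)
print(names, content)
```

The exact exception raised by open() seems to depend on the platform and on the bytes involved, which would explain why this thread mentions both a UnicodeDecodeError and a ValueError for what is essentially the same mistake.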

Answer 2

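Canadian_Marine's answer was very close to what I needed, but not quite enough for my particular case. Seeing the BytesIO object inside the open command in their answer helped me solve my problem.

I found it necessary to separate the request from tarfile.open, and then wrap the response content in a BytesIO object inside the tarfile.open call. Here is my code: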

from io import BytesIO
import tarfile

import requests

# Download the archive first, separately from tarfile.open.
remote_file = requests.get('https://download.site.com/files/file.tar.gz')

# Open the tarball from memory by wrapping the response bytes in BytesIO.
tar = tarfile.open(fileobj=BytesIO(remote_file.content))
# Optionally print all folders / files within the tarball.
print(tar.getnames())
tar.extractall('/home/users/Documents/target_directory/')

This eliminated the ValueError: embedded null byte and expected str, bytes or os.PathLike object, not _io.BytesIO errors that I ran into with other approaches.
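
One possible refinement, in case the archive is too large to hold comfortably in memory: tarfile also has stream modes such as "r|gz", which read the archive strictly front to back and so do not need a seekable file object. The sketch below is only an illustration under assumptions. NonSeekableStream, extract_tar_gz_stream and the sample archive are hypothetical names invented here to imitate a network stream without making a real request.

```python
import io
import tarfile
import tempfile
from pathlib import Path


class NonSeekableStream:
    """Hypothetical stand-in for a network stream: it only supports read()."""

    def __init__(self, data: bytes):
        self._buf = io.BytesIO(data)

    def read(self, size: int = -1) -> bytes:
        return self._buf.read(size)


def extract_tar_gz_stream(fileobj, dest) -> list:
    """Extract a gzip-compressed tar from a read-only stream, member by member."""
    extracted = []
    # "r|gz" is tarfile's stream mode: sequential reads, no seeking.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as tar:
        for member in tar:
            tar.extract(member, path=dest)
            extracted.append(member.name)
    return extracted


# Build a small sample archive in memory to feed through the fake stream.
sample = io.BytesIO()
with tarfile.open(fileobj=sample, mode="w:gz") as tar:
    body = b"streamed contents\n"
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(body)
    tar.addfile(info, io.BytesIO(body))

with tempfile.TemporaryDirectory() as tmp:
    extracted_names = extract_tar_gz_stream(NonSeekableStream(sample.getvalue()), tmp)
    extracted_text = (Path(tmp) / "hello.txt").read_bytes()

print(extracted_names, extracted_text)
```

With requests, the equivalent would presumably be to call requests.get(url, stream=True) and pass response.raw as fileobj, but that has not been tested against the API in this thread. For archives that fit in memory, the BytesIO approach above is simpler.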
