Question

I downloaded a tar.gz file from this website:

http://www.vision.caltech.edu/Image_Datasets/Caltech101/

It should contain many images. Ideally, I would like to read all of the images, at their original sizes, into one large np.array.

Here is one attempt:

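import tarfile
import numpy as np

images = []

# archive_path: path to the downloaded tar.gz file
with tarfile.open(archive_path, "r:gz") as tar:
    # Only look at the first 10 members for now
    for member in tar.getmembers()[:10]:
        if member.isfile():
            file = tar.extractfile(member)
            images.append(file.read())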

At this point file.read() returns an object of class 'bytes', and I don't know how to read that into a NumPy array.

I tried:

np.array(file.read())  # ValueError: embedded null byte
np.fromfile(file)   # AttributeError: '_FileInFile' object has no attribute 'fileno'

Answer 1

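You can try np.fromstring:

np.fromstring(file.read(), dtype=np.uint8)

This interprets the bytes as 8-bit unsigned integers. If you want something else, change the dtype. Note that recent NumPy versions deprecate np.fromstring for binary data; np.frombuffer(file.read(), dtype=np.uint8) is the equivalent call.

Edit: I changed 32-bit to 8-bit.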
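
A minimal, self-contained sketch of the same idea, using np.frombuffer on each member's bytes. The helper names read_members_as_uint8 and make_demo_archive are hypothetical, and the tiny in-memory archive only stands in for the real download so the example runs anywhere.

```python
import io
import tarfile

import numpy as np


def read_members_as_uint8(fileobj, limit=None):
    """Return one 1-D uint8 array per regular file in a gzip-compressed tar."""
    arrays = []
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, etc.
            raw = tar.extractfile(member).read()  # bytes of one file
            arrays.append(np.frombuffer(raw, dtype=np.uint8))
            if limit is not None and len(arrays) >= limit:
                break
    return arrays


def make_demo_archive():
    """Build a small tar.gz in memory (stand-in for the real dataset)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in [("a.bin", b"\x00\x01\x02"), ("b.bin", b"\xff\x00")]:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


arrays = read_members_as_uint8(make_demo_archive())
print([a.tolist() for a in arrays])  # → [[0, 1, 2], [255, 0]]
```

For the downloaded file, passing open(archive_path, "rb") as fileobj should work the same way. Keep in mind that these arrays hold the still-encoded file contents (for example JPEG data), not pixels. Getting pixel arrays would need an image decoder, for instance np.asarray(Image.open(io.BytesIO(raw))) with Pillow, assuming it is installed. Images of different sizes also cannot be stacked into one regular array, so a list of arrays is likely the practical container.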
