I have numpy arrays saved in Azure Blob Storage, and I load them into a stream like this:
stream = io.BytesIO()
store.get_blob_to_stream(container, 'cat.npy', stream)
From stream.getvalue() I can see that the stream contains the metadata needed to reconstruct the array. These are the first 150 bytes:
b"\x93NUMPY\x01\x00v\x00{'descr': '|u1', 'fortran_order': False, 'shape': (720, 1280, 3), } \n\xc1\xb0\x94\xc2\xb1\x95\xc3\xb2\x96\xc4\xb3\x97\xc5\xb4\x98\xc6\xb5\x99\xc7\xb6\x9a\xc7"
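Is it possible to load the byte stream with numpy.load, or with some other simple method?
I could save the array to disk and load it from there, but for various reasons I would like to avoid that...
Edit: to emphasize, the output must be a numpy array whose shape and dtype are the ones specified in the first 128 bytes of the stream.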
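For reference, the prefix follows the .npy layout: a 6-byte magic string (\x93NUMPY), a 2-byte version, a 2-byte little-endian header length, and then an ASCII header dict padded with spaces (to a 64-byte boundary in recent NumPy versions). A minimal sketch, assuming a locally created zero array with the question's shape and dtype stands in for the real blob, that reads this prefix back with the numpy.lib.format helpers:

```python
import io

import numpy as np

# Stand-in for the blob (assumption): same shape and dtype as in the question.
buf = io.BytesIO()
np.save(buf, np.zeros((720, 1280, 3), dtype=np.uint8))
buf.seek(0)

# Reads the 6-byte magic string plus the 2-byte version.
version = np.lib.format.read_magic(buf)
# Reads the 2-byte header length and the header dict.
shape, fortran_order, dtype = np.lib.format.read_array_header_1_0(buf)
# The raw array bytes start wherever the header parser stopped.
data_offset = buf.tell()

print(version, shape, fortran_order, dtype, data_offset)
```

In the question's dump the two length bytes are v\x00, i.e. 0x76 = 118 read as little-endian, so the array data starts at 6 + 2 + 2 + 118 = 128 bytes.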
Answer 0 (score: 2)
For np.savez archives, the solutions above usually need some extra work:
import io
import numpy as np
stream = io.BytesIO()
arr1 = np.random.rand(20,4)
arr2 = np.random.rand(20,4)
np.savez(stream, A=arr1, B=arr2)
block_blob_service.create_blob_from_bytes(container,
"my/path.npz",
stream.getvalue())
from numpy.lib.npyio import NpzFile
stream = io.BytesIO()
block_blob_service.get_blob_to_stream(container, "my/path.npz", stream)
ret = NpzFile(stream, own_fid=True, allow_pickle=True)
print(ret.files)
# ['A', 'B']
print(ret['A'].shape)
# (20, 4)
Answer 1 (score: 1)
I tried several ways to achieve what you need. Here is my sample code.
from azure.storage.blob.baseblobservice import BaseBlobService
import numpy as np
account_name = '<your account name>'
account_key = '<your account key>'
container_name = '<your container name>'
blob_name = '<your blob name>'
blob_service = BaseBlobService(
account_name=account_name,
account_key=account_key
)
Example 1. Generate a blob URL with a SAS token and download it via requests.
import io
from azure.storage.blob import BlobPermissions
from datetime import datetime, timedelta
import requests
sas_token = blob_service.generate_blob_shared_access_signature(container_name, blob_name, permission=BlobPermissions.READ, expiry=datetime.utcnow() + timedelta(hours=1))
print(sas_token)
url_with_sas = blob_service.make_blob_url(container_name, blob_name, sas_token=sas_token)
print(url_with_sas)
r = requests.get(url_with_sas)
r.raise_for_status()
# np.load parses the .npy header, so shape and dtype are restored;
# np.frombuffer(r.content) would ignore the header and default to float64.
dat = np.load(io.BytesIO(r.content))
print('from requests', dat)
Example 2. Via BytesIO.
import io
stream = io.BytesIO()
blob_service.get_blob_to_stream(container_name, blob_name, stream)
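stream.seek(0)  # rewind: the download leaves the position at the end of the stream
dat = np.load(stream)  # parses the .npy header, restoring shape and dtype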
print('from BytesIO', dat)
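np.frombuffer does not parse the .npy header: it reinterprets every byte, header included, as a flat array of the requested dtype (float64 when none is given). A small check, assuming a locally saved array stands in for the blob, shows the difference between reading the raw buffer and letting np.load parse it:

```python
import io

import numpy as np

original = np.ones((4, 3), dtype=np.uint8)
stream = io.BytesIO()
np.save(stream, original)

# Raw view: the header bytes are included and the result is one-dimensional.
raw = np.frombuffer(stream.getvalue(), dtype=np.uint8)

# Parsed: np.load reads the header first, then the data.
stream.seek(0)
parsed = np.load(stream)

print(raw.shape, parsed.shape)
```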
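Example 3. Open the blob URL with the SAS token through DataSource and read it with np.load; this actually downloads the blob file to the local file system.
from numpy.lib.npyio import DataSource  # also available as np.DataSource before NumPy 2.0
ds = DataSource()
# ds = DataSource(None)  # use with temporary file
# ds = DataSource(path)  # use with path like `data/`
f = ds.open(url_with_sas, mode='rb')  # binary mode; the default mode 'r' is text
dat = np.load(f)  # unlike np.fromfile, this honours the header's shape and dtype
print('from DataSource', dat)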
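I think Examples 1 and 2 suit your case better.
Answer 2 (score: 0)
Here is a small hack I came up with; it basically just takes the metadata from the first 128 bytes: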
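import ast
import re

import numpy as np


def load_npy_from_stream(stream_):
    """Experimental, may not work!
    :param stream_: io.BytesIO() object obtained by e.g. calling BlockBlobService().get_blob_to_stream() containing
    the binary stream of a standard format .npy file.
    :return: numpy.ndarray
    """
    stream_.seek(0)
    prefix_ = stream_.read(128)  # assumes a 128-byte header, as in the question; not true for every .npy file
    # Skip the non-ASCII magic byte \x93; latin-1 decodes any remaining byte without errors.
    dict_string = re.search(r'\{(.*?)\}', prefix_[1:].decode('latin-1'))[0]
    metadata_dict = ast.literal_eval(dict_string)  # safer than eval() for parsing the header dict
    # Note: metadata_dict['fortran_order'] is ignored, so Fortran-ordered arrays are reshaped incorrectly.
    array = np.frombuffer(stream_.read(), dtype=metadata_dict['descr']).reshape(metadata_dict['shape'])
    return array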
It can probably fail in many ways, but I'm posting it here in case anyone wants to try it. I will test it further and come back when I know more.
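A less fragile variant of the same idea, sketched under the assumption that the stream holds a standard .npy file: instead of assuming a fixed 128-byte prefix and evaluating the dict by hand, it lets numpy.lib.format read the magic string and the header (format versions 1.0 and 2.0), then builds the array from whatever bytes follow. The function name load_npy_from_stream_v2 is hypothetical.

```python
import io

import numpy as np
from numpy.lib import format as npformat


def load_npy_from_stream_v2(stream_):
    """Hypothetical helper: parse the .npy header with numpy.lib.format, then read the data."""
    stream_.seek(0)
    version = npformat.read_magic(stream_)
    if version == (1, 0):
        shape, fortran_order, dtype = npformat.read_array_header_1_0(stream_)
    elif version == (2, 0):
        shape, fortran_order, dtype = npformat.read_array_header_2_0(stream_)
    else:
        raise ValueError(f"unsupported .npy format version {version}")
    if dtype.hasobject:
        raise ValueError("object arrays are pickled; use np.load(..., allow_pickle=True)")
    # The header length varies between files, so read from where the parser stopped.
    count = int(np.prod(shape))
    data = np.frombuffer(stream_.read(), dtype=dtype, count=count)
    return data.reshape(shape, order="F" if fortran_order else "C")


# Usage: round-trip a small array through an in-memory stream.
buf = io.BytesIO()
np.save(buf, np.arange(12, dtype=np.int32).reshape(3, 4))
restored = load_npy_from_stream_v2(buf)
print(restored.shape, restored.dtype)
```

In practice np.load(stream) after stream.seek(0) does the same job; the helper only makes the header handling explicit.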