I have a Python script for data processing, and I want to use an Azure block blob as its data input, specifically a CSV file stored in a block blob. Downloading the CSV from the Azure blob to a local path and uploading it again some other way works fine when the code runs locally, but my code runs on an Azure virtual machine because it is too heavy for my MacBook Air, and in that setup pandas read_csv from a local path does not work for me. So I need to download the CSV as a stream, and upload and update the CSV back to Azure storage, without saving anything locally. Both the download and the upload are very small, far below the block blob size limits.
There are not many tutorials that explain how to solve this step by step, and the MS Docs are usually hard to follow. My minimal code is below:
Download from Azure blob storage:
import pandas as pd
from azure.storage.blob import BlockBlobService

storage = BlockBlobService(account_name='myname', account_key='mykey')
# here I don't know how to make a csv stream that could be used in the next steps
file = storage.get_blob_to_stream('accountname', 'blobname', 'stream')
df = pd.read_csv(file)
# df is used in later steps
Upload and update the blob with the dataframe generated by the code:
df  # dataframe generated by the code
# I don't know how to do the preparation steps for df, or the final upload call
storage.put_blob_to_list_by_stream('accountname', 'blobname', 'stream')
Could you please give me a step-by-step tutorial? For people with Azure blob experience this should not be very difficult.
Or, if you have a better solution than using blobs, please share it. Thanks.
Answer 0 (score: 1)
The documentation is still a work in progress, but I think it is getting better...
To download a file from blob storage as a stream, you can use BytesIO:
from azure.storage.blob import BlockBlobService
from io import BytesIO
from shutil import copyfileobj

with BytesIO() as input_blob:
    with BytesIO() as output_blob:
        block_blob_service = BlockBlobService(account_name='my_account_name', account_key='my_account_key')

        # Download as a stream
        block_blob_service.get_blob_to_stream('mycontainer', 'myinputfilename', input_blob)
        input_blob.seek(0)   # rewind the stream before reading from it

        # Do whatever you want to do - here I am just copying the input stream to the output stream
        copyfileobj(input_blob, output_blob)
        ...
        output_blob.seek(0)  # rewind again before uploading

        # Create a new blob
        block_blob_service.create_blob_from_stream('mycontainer', 'myoutputfilename', output_blob)

        # Or update the same blob
        block_blob_service.create_blob_from_stream('mycontainer', 'myinputfilename', output_blob)
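
Since the question is specifically about pandas, here is a minimal sketch of the full round trip using the same legacy SDK as above (azure-storage-blob 2.x with BlockBlobService); the account, container, and blob names are placeholders you would replace with your own. It downloads the CSV blob into an in-memory stream, reads it with pd.read_csv, and uploads the updated dataframe back as CSV text, never touching the local disk:

import pandas as pd
from io import BytesIO
from azure.storage.blob import BlockBlobService

block_blob_service = BlockBlobService(account_name='my_account_name', account_key='my_account_key')

# Download the CSV blob into an in-memory stream and hand it to pandas
input_blob = BytesIO()
block_blob_service.get_blob_to_stream('mycontainer', 'myinputfilename', input_blob)
input_blob.seek(0)   # rewind so pandas reads from the start
df = pd.read_csv(input_blob)

# ... process the dataframe here ...

# Serialize the updated dataframe to CSV text and create (or overwrite) the blob
output_text = df.to_csv(index=False)
block_blob_service.create_blob_from_text('mycontainer', 'myoutputfilename', output_text)

create_blob_from_text overwrites an existing blob with the same name, so pointing it at 'myinputfilename' instead would update the original blob in place.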