I have a Python script for data processing, and I want to use an Azure block blob as its data input, specifically a CSV file stored in a block blob. Downloading the CSV from the Azure blob to a local path and uploading it again some other way works fine when the code runs locally, but my code runs on an Azure virtual machine because it is too heavy for my MacBook Air, and in that setup pandas read_csv from a local path does not work for me. So I need to download the CSV as a stream, and upload and update the CSV back to Azure storage, without saving anything locally. Both the download and the upload are very small, far below the block blob size limits.
There are not many tutorials that explain how to solve this step by step, and the MS Docs are usually hard to follow. My minimal code is below:
Download from Azure blob storage:
import pandas as pd
from azure.storage.blob import BlockBlobService

storage = BlockBlobService(account_name='myname', account_key='mykey')
# here I don't know how to make a csv stream that could be used in the next steps
file = storage.get_blob_to_stream('accountname', 'blobname', 'stream')
df = pd.read_csv(file)
# df is used in later steps
Upload and update the blob with the dataframe generated by the code:
df  # dataframe generated by the code
# I don't know how to do the preparation steps for df, or the final upload call
storage.put_blob_to_list_by_stream('accountname', 'blobname', 'stream')
Could you please give me a step-by-step tutorial? For people with Azure blob experience this should not be very difficult.
Or, if you have a better solution than using blobs, please share it. Thanks.
Answer 0 (score: 1)
The documentation is still a work in progress, but I think it is getting better...
To download a file from blob storage as a stream, you can use BytesIO:
from azure.storage.blob import BlockBlobService
from io import BytesIO
from shutil import copyfileobj

with BytesIO() as input_blob:
    with BytesIO() as output_blob:
        block_blob_service = BlockBlobService(account_name='my_account_name', account_key='my_account_key')

        # Download as a stream
        block_blob_service.get_blob_to_stream('mycontainer', 'myinputfilename', input_blob)
        input_blob.seek(0)   # rewind the stream before reading from it

        # Do whatever you want to do - here I am just copying the input stream to the output stream
        copyfileobj(input_blob, output_blob)
        ...
        output_blob.seek(0)  # rewind again before uploading

        # Create a new blob
        block_blob_service.create_blob_from_stream('mycontainer', 'myoutputfilename', output_blob)

        # Or update the same blob
        block_blob_service.create_blob_from_stream('mycontainer', 'myinputfilename', output_blob)
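
Since the question is specifically about pandas, here is a minimal sketch of the full round trip using the same legacy SDK as above (azure-storage-blob 2.x with BlockBlobService); the account, container, and blob names are placeholders you would replace with your own. It downloads the CSV blob into an in-memory stream, reads it with pd.read_csv, and uploads the updated dataframe back as CSV text, never touching the local disk:

import pandas as pd
from io import BytesIO
from azure.storage.blob import BlockBlobService

block_blob_service = BlockBlobService(account_name='my_account_name', account_key='my_account_key')

# Download the CSV blob into an in-memory stream and hand it to pandas
input_blob = BytesIO()
block_blob_service.get_blob_to_stream('mycontainer', 'myinputfilename', input_blob)
input_blob.seek(0)   # rewind so pandas reads from the start
df = pd.read_csv(input_blob)

# ... process the dataframe here ...

# Serialize the updated dataframe to CSV text and create (or overwrite) the blob
output_text = df.to_csv(index=False)
block_blob_service.create_blob_from_text('mycontainer', 'myoutputfilename', output_text)

create_blob_from_text overwrites an existing blob with the same name, so pointing it at 'myinputfilename' instead would update the original blob in place.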