Question

I need to transfer files from Google Cloud Storage to Azure Blob Storage.

Google provides a code snippet that downloads a file into a byte variable, like so:

import io
from googleapiclient.http import MediaIoBaseDownload

# Get Payload Data
# (client is assumed to be an already-built Cloud Storage API client)
req = client.objects().get_media(
        bucket=bucket_name,
        object=object_name,
        generation=generation)    # optional
# The BytesIO object may be replaced with any io.Base instance.
fh = io.BytesIO()
downloader = MediaIoBaseDownload(fh, req, chunksize=1024*1024)
done = False
while not done:
    # Each call writes up to chunksize more bytes into fh.
    status, done = downloader.next_chunk()
    if status:
        print('Download %d%%.' % int(status.progress() * 100))
print('Download Complete!')
print(fh.getvalue())

I can modify this to store to a file by changing the type of the fh object:

fh = open(object_name, 'wb')

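I can then upload to Azure Blob Storage using blob_service.put_block_blob_from_path.

I want to avoid writing a local file on the machine that is doing the transfer.

I gather that Google's snippet loads the data into the io.BytesIO() object one chunk at a time. I reckon I should use this to write to Blob Storage one chunk at a time as well.

I experimented with reading the whole thing into memory and then uploading with put_block_blob_from_bytes, but I got a memory error (the file is probably too big, at about 600 MB).

Any suggestions?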
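
To make the idea in the question concrete, here is a minimal sketch of a loop that hands each downloaded chunk to an upload step and then empties the buffer, so memory use stays near one chunk. The names transfer_in_chunks, upload_chunk, and FakeDownloader are hypothetical and exist only for illustration. The sketch assumes the real downloader only writes into the buffer it was given and never reads back from it, and it does not call any Azure or Google API.

```python
import io


def transfer_in_chunks(downloader, buffer, upload_chunk):
    """Drain `buffer` after every downloaded chunk.

    downloader   -- object with next_chunk() -> (status, done) that writes into `buffer`
    upload_chunk -- hypothetical callback that receives one chunk of bytes
    Returns the total number of bytes handed to upload_chunk.
    """
    total = 0
    done = False
    while not done:
        _status, done = downloader.next_chunk()
        data = buffer.getvalue()
        if data:
            upload_chunk(data)
            total += len(data)
        # Empty the buffer so the next chunk starts again at position 0.
        buffer.seek(0)
        buffer.truncate(0)
    return total


class FakeDownloader(object):
    """Stand-in for a chunked downloader; used only to exercise the loop."""

    def __init__(self, fh, payload, chunksize):
        self._fh = fh
        self._payload = payload
        self._chunksize = chunksize
        self._pos = 0

    def next_chunk(self):
        chunk = self._payload[self._pos:self._pos + self._chunksize]
        self._fh.write(chunk)
        self._pos += len(chunk)
        return None, self._pos >= len(self._payload)


fh = io.BytesIO()
uploaded = []
total = transfer_in_chunks(
    FakeDownloader(fh, b"abcdefghij", chunksize=4),
    fh,
    uploaded.append,
)
```

In a real transfer, upload_chunk would be whatever call writes one block to the destination blob; which call that is depends on the SDK version in use.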

Answer 1

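Judging by the source code of blobservice.py for Azure Storage and BlobReader for Google Cloud Storage, you can try using the Azure function blobservice.put_block_blob_from_file to write from a stream. The GCS class BlobReader has a read function, so it can act as that stream; see below.

So, following the code at https://cloud.google.com/appengine/docs/python/blobstore/#Python_Using_BlobReader, you can try doing it like this.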

from google.appengine.ext import blobstore
from azure.storage.blob import BlobService

blob_key = ...
blob_reader = blobstore.BlobReader(blob_key)

blob_service = BlobService(account_name, account_key)
container_name = ...
blob_name = ...
blob_service.put_block_blob_from_file(container_name, blob_name, blob_reader)
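
This approach depends on the source object exposing a read method. If the source only produces chunks one at a time, as the downloader in the question does, a small adapter can present those chunks as a readable stream. The following is a minimal sketch using only the standard library. ChunkStream is a hypothetical name, and whether a given SDK version accepts such an object is an assumption that has not been verified here.

```python
import io


class ChunkStream(io.RawIOBase):
    """Read-only stream over an iterable of bytes chunks (hypothetical helper)."""

    def __init__(self, chunks):
        super(ChunkStream, self).__init__()
        self._chunks = iter(chunks)
        self._leftover = b""

    def readable(self):
        return True

    def readinto(self, target):
        # Refill from the iterator until there is data or the source is exhausted.
        while not self._leftover:
            try:
                self._leftover = next(self._chunks)
            except StopIteration:
                return 0  # signals end of stream
        n = min(len(target), len(self._leftover))
        target[:n] = self._leftover[:n]
        self._leftover = self._leftover[n:]
        return n


stream = ChunkStream([b"hello ", b"", b"world"])
first = stream.read(4)   # reads across chunk boundaries as needed
rest = stream.read()     # reads until the iterator is exhausted
```

Only one chunk plus any unread remainder is held in memory at a time, which is the property the question is after.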

Answer 2

After looking through the SDK source code, something like this could work:

from azure.storage.blob import _chunking
from azure.storage.blob import BlobService

# See _BlobChunkUploader
class PartialChunkUploader(_chunking._BlockBlobChunkUploader):
    def __init__(self, blob_service, container_name, blob_name, progress_callback = None):
        super(PartialChunkUploader, self).__init__(blob_service, container_name, blob_name, -1, -1, None, False, 5, 1.0, progress_callback, None)

    def process_chunk(self, chunk_offset, chunk_data):
        '''chunk_offset is the integer offset. chunk_data is an array of bytes.'''
        return self._upload_chunk_with_retries(chunk_offset, chunk_data)

blob_service = BlobService(account_name='myaccount', account_key='mykey')

uploader = PartialChunkUploader(blob_service, "container", "foo")
# while (...):
#     uploader.process_chunk(...)
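
The commented-out loop needs an offset and a chunk of bytes for each call. Here is a sketch of how those pairs might be produced from any readable stream. iter_offset_chunks is a hypothetical helper, not part of the Azure SDK, and whether process_chunk accepts its output as-is depends on SDK internals that are not verified here.

```python
import io


def iter_offset_chunks(stream, chunk_size):
    """Yield (offset, data) pairs, reading at most chunk_size bytes at a time."""
    offset = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield offset, data
        offset += len(data)


pairs = list(iter_offset_chunks(io.BytesIO(b"0123456789"), 4))
```

Each pair could then be passed to uploader.process_chunk(offset, data) inside the loop.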
