Question

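Is there a standard way to lazily fetch the next chunk of data and yield its elements one at a time?

Currently I fetch all the chunks up front and chain them together with itertools: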

import itertools

def list_blobs(container_name:str, prefix:str):
    chunks = []
    next_marker=None
    while True:
        blobs = blob_service.list_blobs(container_name, prefix=prefix, num_results=100, marker=next_marker)
        next_marker = blobs.next_marker
        chunks.append(blobs)
        if not next_marker:
            break

    return itertools.chain.from_iterable(chunks)

What would a "lazy" version of the list_blobs fetcher look like?

Answer 1

You can just use yield from:

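def list_blobs(container_name:str, prefix:str):
    next_marker = None  # None requests the first page (True is not a valid marker)
    more = True
    while more:
        blobs = blob_service.list_blobs(container_name, prefix=prefix, num_results=100, marker=next_marker)
        next_marker = blobs.next_marker
        yield from blobs
        more = bool(next_marker)  # an empty marker means there are no further pages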

Answer 2

Replace chunks.append(blobs) with yield from blobs, and get rid of the return statement and the chunks list entirely:

def generate_blobs(container_name:str, prefix:str):
    next_marker = None
    while True:
        blobs = blob_service.list_blobs(container_name, prefix=prefix, num_results=100, marker=next_marker)
        next_marker = blobs.next_marker
        yield from blobs
        if not next_marker:
            break

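This turns the function into a generator function that yields one item at a time, fetching the next chunk only once the current one is exhausted.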
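To see the laziness without an Azure account, here is a minimal sketch that runs the same generator against an in-memory stand-in. FakePage and FakeBlobService are hypothetical names invented for this illustration. They mimic only the two things the generator relies on: a page can be iterated, and it exposes next_marker.

```python
from itertools import islice


class FakePage(list):
    """Hypothetical listing result: a list of items plus a next_marker."""

    def __init__(self, items, next_marker):
        super().__init__(items)
        self.next_marker = next_marker


class FakeBlobService:
    """Hypothetical service that returns names page by page and counts requests."""

    def __init__(self, names, page_size):
        self.names = names
        self.page_size = page_size
        self.calls = 0

    def list_blobs(self, container_name, prefix=None, num_results=None, marker=None):
        # prefix and num_results are accepted but ignored in this stand-in.
        self.calls += 1
        start = int(marker) if marker else 0
        end = start + self.page_size
        next_marker = str(end) if end < len(self.names) else None
        return FakePage(self.names[start:end], next_marker)


blob_service = FakeBlobService([f"blob{i}" for i in range(10)], page_size=3)


def generate_blobs(container_name: str, prefix: str):
    next_marker = None
    while True:
        blobs = blob_service.list_blobs(container_name, prefix=prefix, num_results=100, marker=next_marker)
        next_marker = blobs.next_marker
        yield from blobs
        if not next_marker:
            break


# Taking four items needs only the first two pages of three items each.
first_four = list(islice(generate_blobs("container", "blob"), 4))
print(first_four)          # ['blob0', 'blob1', 'blob2', 'blob3']
print(blob_service.calls)  # 2
```

With the chain.from_iterable version from the question, all four pages would have been requested before the first item was returned.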

Answer 3

@ShadowRanger, @Kasrâmvd, thank you very much.

@timgeb, here is the complete code for lazy iteration over Azure Blob Storage:

from azure.storage.blob import BlockBlobService
from azure.storage.blob import Blob
from typing import Iterable, Iterator, Tuple

def blob_iterator(account:str, account_key:str, bucket:str, prefix:str)-> Iterable[Tuple[str, str]]:
    blob_service = BlockBlobService(account_name=account, account_key=account_key)

    # Generator function: it yields Blob objects, so the return type is Iterator[Blob].
    def list_blobs(bucket:str, prefix:str)->Iterator[Blob]:
        next_marker = None
        while True:
            blobs = blob_service.list_blobs(bucket, prefix=prefix, num_results=100, marker=next_marker)
            yield from blobs
            next_marker = blobs.next_marker
            if not next_marker:
                break

    def get_text(bucket:str, name:str)->str:
        return blob_service.get_blob_to_text(bucket, name).content

    return ( (blob.name, get_text(bucket, blob.name)) for blob in list_blobs(bucket, prefix) )


it = blob_iterator('account', 'account_key', 'container_name', prefix='AA')
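Because blob_iterator returns a generator expression, nothing is listed or downloaded until it is iterated. The sketch below illustrates that behaviour with hypothetical stand-ins: list_names, get_text and the log list are invented here and replace the Azure calls, so the snippet runs on its own.

```python
from itertools import islice
from typing import Iterator, List, Tuple

log: List[str] = []  # records every simulated download


def list_names() -> Iterator[str]:
    # Hypothetical stand-in for the paged listing generator.
    yield from ["AA/1.txt", "AA/2.txt", "AA/3.txt"]


def get_text(name: str) -> str:
    # Hypothetical stand-in for blob_service.get_blob_to_text(...).content.
    log.append(name)
    return f"contents of {name}"


def name_and_text() -> Iterator[Tuple[str, str]]:
    # Same shape as the return statement of blob_iterator.
    return ((name, get_text(name)) for name in list_names())


it = name_and_text()
print(log)  # [] since nothing has been downloaded yet

first_two = list(islice(it, 2))
print(log)  # ['AA/1.txt', 'AA/2.txt']
```

Each text is fetched only when the consumer asks for the corresponding pair, so stopping early skips the remaining downloads.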

Python 3 iterable lazy block fetch
