Question

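I have a table with 10,000 rows. I'm trying to go through them with Python to change small things in the attributes (inside each row), so I'm using client.scan() to fetch batches of 10 rows and passing the returned "LastEvaluatedKey" to the next scan() call.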

The problem is that after 40 rows, scan() stops returning a LastEvaluatedKey, as if the table had only 40 rows.

I noticed that when I run the same script against another table that is 3 times larger, it stops at 120 rows (3 times as many).

The table uses on-demand capacity.

Any ideas about this?

client = boto3.client('dynamodb')
resource = boto3.resource('dynamodb')
table = resource.Table(table_name)

remaining = 3961
iteration = 0
limit = 10

while remaining > 0:
    # retrieve Limit
    if iteration == 0:
        response = client.scan(
            TableName=table_name,
            Limit=limit,
            Select='ALL_ATTRIBUTES',
            ReturnConsumedCapacity='TOTAL',
            TotalSegments=123,
            Segment=122,
        )
        key = response["LastEvaluatedKey"]
    else:

        response = client.scan(
            TableName=table_name,
            Limit=limit,
            Select='ALL_ATTRIBUTES',
            ExclusiveStartKey=key,
            ReturnConsumedCapacity='TOTAL',
            TotalSegments=123,
            Segment=122,
        )

        key = response["LastEvaluatedKey"]

    iteration += 1
    for el in response["Items"]:
        print(el)

Answer 1

I think there are two issues:

You appear to be setting a Limit on the scan: try removing it.
You are running a parallel scan and always scanning only the last segment:

TotalSegments=123,
Segment=122,

I'm not sure how big your table is, but 123 segments is a lot, and I don't see you scanning any of the other segments, 0 through 121.
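
To make the effect of scanning a single segment concrete, here is a rough back-of-the-envelope sketch. It assumes that `remaining = 3961` in the question is the table's item count and that items are spread evenly across segments; DynamoDB does not guarantee an even split, so the real per-segment counts will differ. The helper name `approx_items_per_segment` is made up for illustration.

```python
def approx_items_per_segment(item_count, total_segments):
    """Rough share of the table seen when only one segment is scanned.

    Assumes an even split across segments, which DynamoDB does not guarantee.
    """
    return item_count / total_segments


# Assumed item count taken from `remaining = 3961` in the question.
small = approx_items_per_segment(3961, 123)
# A table three times larger, scanned with the same segment settings.
large = approx_items_per_segment(3 * 3961, 123)

print(round(small), round(large))  # prints: 32 97
```

Under those assumptions a single segment holds only a few dozen items, and the count grows in proportion to the table size, which is in line with the 40 and 120 rows reported in the question.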

Try this:

import boto3

client = boto3.client('dynamodb')

iteration = 0
response = client.scan(
    TableName=table_name,
    Select='ALL_ATTRIBUTES',
    ReturnConsumedCapacity='TOTAL'
)
while True:
    iteration += 1
    for el in response["Items"]:
        print(el)
    # LastEvaluatedKey is absent from the final page, so use .get() to avoid a KeyError
    last_key = response.get("LastEvaluatedKey")
    if not last_key:
        break
    response = client.scan(
        TableName=table_name,
        Select='ALL_ATTRIBUTES',
        ExclusiveStartKey=last_key,
        ReturnConsumedCapacity='TOTAL'
    )
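
As a possible alternative to following LastEvaluatedKey by hand, boto3 clients expose a paginator for scan through `client.get_paginator('scan')`. The sketch below is not part of the original answer; the function name `scan_all_items` is made up for illustration, and the client is passed in as an argument so the function can be exercised without AWS access.

```python
def scan_all_items(client, table_name):
    """Yield every item in the table, one at a time.

    The paginator passes LastEvaluatedKey back as ExclusiveStartKey
    between pages, so no manual bookkeeping is needed here.
    """
    paginator = client.get_paginator("scan")
    for page in paginator.paginate(TableName=table_name):
        for item in page.get("Items", []):
            yield item
```

It could be used as `for el in scan_all_items(boto3.client('dynamodb'), table_name): print(el)`.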

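I expect the above to retrieve every item in the table. If you still want to run a parallel scan after that, you can, but you will have to handle splitting the work into segments, and to do it efficiently you need to run those segments concurrently (which is more complex to implement) rather than scanning them sequentially.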
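
A minimal sketch of what scanning every segment concurrently might look like, using one thread per segment. This is an illustration rather than the original answer's code: the function names `scan_segment` and `parallel_scan` and the default of 4 segments are assumptions, and it relies on the low-level boto3 client being safe to share across threads, which is worth confirming in the boto3 documentation for your version.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(client, table_name, segment, total_segments):
    """Return every item in one segment, following LastEvaluatedKey."""
    items = []
    kwargs = {
        "TableName": table_name,
        "Segment": segment,
        "TotalSegments": total_segments,
    }
    while True:
        response = client.scan(**kwargs)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            # No key on the response means this segment is exhausted.
            return items
        kwargs["ExclusiveStartKey"] = last_key


def parallel_scan(client, table_name, total_segments=4):
    """Scan segments 0..total_segments-1 concurrently, one thread each."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, client, table_name, seg, total_segments)
            for seg in range(total_segments)
        ]
        items = []
        for future in futures:
            items.extend(future.result())
    return items
```

With a real table this could be called as `parallel_scan(boto3.client('dynamodb'), table_name)`. The key difference from the code in the question is that every segment number from 0 up to TotalSegments - 1 is scanned, not only the last one.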

DynamoDB Client.Scan() not returning LastEvaluatedKey parameter
