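Emptying a DynamoDB table with boto

Time: 2015-02-14 23:58:00

Tags: python amazon-dynamodb boto

What is the best way (in terms of financial cost) to empty a DynamoDB table with boto? (The way we would do it in SQL with a truncate statement.)

boto.dynamodb2.table.delete() or boto.dynamodb2.layer1.DynamoDBConnection.delete_table() deletes the entire table, while boto.dynamodb2.table.delete_item() or boto.dynamodb2.table.BatchTable.delete_item() only deletes the specified items.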

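3 answers:

Answer 0: (score: 5)

As Johnny Wu mentioned, deleting the table and recreating it is more efficient than deleting individual items. Make sure your code does not try to create the new table before the old one has been fully deleted.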

import boto3

# Low-level client; credentials and region come from the usual boto3 configuration.
client = boto3.client('dynamodb')


def deleteTable(table_name):
    print('deleting table')
    return client.delete_table(TableName=table_name)


def createTable(table_name):
    # Block until the old table is completely gone; creating a table whose
    # name is still in use fails.
    waiter = client.get_waiter('table_not_exists')
    waiter.wait(TableName=table_name)
    print('creating table')
    return client.create_table(
        TableName=table_name,
        KeySchema=[
            {
                'AttributeName': 'YOURATTRIBUTENAME',
                'KeyType': 'HASH'
            }
        ],
        AttributeDefinitions=[
            {
                'AttributeName': 'YOURATTRIBUTENAME',
                'AttributeType': 'S'
            }
        ],
        ProvisionedThroughput={
            'ReadCapacityUnits': 1,
            'WriteCapacityUnits': 1
        },
        StreamSpecification={
            'StreamEnabled': False
        }
    )


def emptyTable(table_name):
    deleteTable(table_name)
    createTable(table_name)
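
The key schema and throughput above are hard-coded, so they have to be kept in sync with the real table by hand. One possible variation, offered only as a sketch, is to read the schema with describe_table before deleting and feed it back into create_table. The helper names create_args_from_description and recreate_table are made up for this illustration, and the sketch deliberately does not restore streams, TTL settings, tags, triggers or auto scaling policies.

```python
def create_args_from_description(description):
    """Build create_table keyword arguments from a describe_table response."""
    table = description["Table"]
    on_demand = (
        table.get("BillingModeSummary", {}).get("BillingMode") == "PAY_PER_REQUEST"
    )

    def throughput(source):
        # describe_table adds read-only fields (for example
        # NumberOfDecreasesToday) that create_table does not accept.
        raw = source["ProvisionedThroughput"]
        return {
            "ReadCapacityUnits": raw["ReadCapacityUnits"],
            "WriteCapacityUnits": raw["WriteCapacityUnits"],
        }

    args = {
        "TableName": table["TableName"],
        "KeySchema": table["KeySchema"],
        "AttributeDefinitions": table["AttributeDefinitions"],
    }
    if on_demand:
        args["BillingMode"] = "PAY_PER_REQUEST"
    else:
        args["ProvisionedThroughput"] = throughput(table)

    gsis = []
    for index in table.get("GlobalSecondaryIndexes", []):
        spec = {
            "IndexName": index["IndexName"],
            "KeySchema": index["KeySchema"],
            "Projection": index["Projection"],
        }
        if not on_demand:
            spec["ProvisionedThroughput"] = throughput(index)
        gsis.append(spec)
    if gsis:
        args["GlobalSecondaryIndexes"] = gsis

    lsis = [
        {
            "IndexName": index["IndexName"],
            "KeySchema": index["KeySchema"],
            "Projection": index["Projection"],
        }
        for index in table.get("LocalSecondaryIndexes", [])
    ]
    if lsis:
        args["LocalSecondaryIndexes"] = lsis
    return args


def recreate_table(client, table_name):
    """Delete `table_name` and create it again with the same keys and indexes.

    `client` is expected to behave like boto3.client("dynamodb").
    """
    description = client.describe_table(TableName=table_name)
    client.delete_table(TableName=table_name)
    client.get_waiter("table_not_exists").wait(TableName=table_name)
    client.create_table(**create_args_from_description(description))
    client.get_waiter("table_exists").wait(TableName=table_name)
```

Assuming boto3 is installed and configured, it could be called as recreate_table(boto3.client("dynamodb"), "YOUR_TABLE_NAME"). Passing the client in as an argument keeps the schema-copying logic easy to check against a saved describe_table response without touching AWS.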

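Answer 1: (score: 5)

While I agree with Johnny Wu that deleting the table and recreating it is far more efficient, there are cases, for example when many GSIs or trigger events are associated with a table, where you do not want to re-associate them. The script below should scan the table page by page and use the batch writer to delete every item in it. For very large tables, however, this may not work well, because it requires every item in the table (here, just its key attributes) to be loaded onto your machine before anything is deleted.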

import boto3
dynamo = boto3.resource('dynamodb')

def truncateTable(tableName):
    table = dynamo.Table(tableName)
    
    #get the table keys
    tableKeyNames = [key.get("AttributeName") for key in table.key_schema]
    
    """
    NOTE: there are reserved attributes for key names, please see https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ReservedWords.html
    if a hash or range key is in the reserved word list, you will need to use the ExpressionAttributeNames parameter
    described at https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#DynamoDB.Table.scan
    """

    #Only retrieve the keys for each item in the table (minimize data transfer)
    ProjectionExpression = ", ".join(tableKeyNames)
    
    response = table.scan(ProjectionExpression=ProjectionExpression)
    data = response.get('Items')
    
    while 'LastEvaluatedKey' in response:
        response = table.scan(
            ProjectionExpression=ProjectionExpression, 
            ExclusiveStartKey=response['LastEvaluatedKey'])
        data.extend(response['Items'])

    with table.batch_writer() as batch:
        for each in data:
            batch.delete_item(
                Key={key: each[key] for key in tableKeyNames}
            )
            
truncateTable("YOUR_TABLE_NAME")
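
To sidestep the memory limitation mentioned above, the same idea can be restructured so that each scan page is deleted as soon as it arrives. This is only a sketch under a few assumptions: truncate_table_streaming is a made-up name, `table` is expected to behave like a boto3 Table resource, and the `#k0`-style placeholders are one way of handling key names that collide with DynamoDB reserved words.

```python
def truncate_table_streaming(table):
    """Delete every item from `table`, one scan page at a time.

    Returns the number of delete requests that were queued.
    """
    key_names = [key["AttributeName"] for key in table.key_schema]

    # Placeholders such as #k0 stand in for the real key names, so a key
    # called e.g. "size" or "name" does not break the projection expression.
    placeholders = {"#k%d" % i: name for i, name in enumerate(key_names)}
    scan_kwargs = {
        "ProjectionExpression": ", ".join(placeholders),
        "ExpressionAttributeNames": placeholders,
    }

    deleted = 0
    with table.batch_writer() as batch:
        while True:
            page = table.scan(**scan_kwargs)
            for item in page.get("Items", []):
                batch.delete_item(Key={name: item[name] for name in key_names})
                deleted += 1
            last_key = page.get("LastEvaluatedKey")
            if not last_key:
                break
            # Continue the scan where the previous page stopped.
            scan_kwargs["ExclusiveStartKey"] = last_key
    return deleted
```

With boto3 installed and configured, a call might look like truncate_table_streaming(boto3.resource("dynamodb").Table("YOUR_TABLE_NAME")). Only one page of keys is held in memory at a time, but every item still costs a scan read and a delete write, so the financial argument for dropping the table is unchanged.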

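Answer 2: (score: 2)

Deleting the table is more efficient than deleting items one by one. If you are able to control the truncation point, you can do something similar to the table rotation suggested in the documentation for time series data.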
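
As a rough illustration of what table rotation could look like, the sketch below assumes one table per calendar month and picks out the tables that have aged out, each of which can then be dropped with a single delete_table call. The naming scheme `<prefix>_YYYY_MM` and the helper names table_name_for and expired_tables are assumptions made for this example, not something prescribed by the documentation.

```python
from datetime import date


def table_name_for(prefix, day):
    """Name of the monthly table that holds data for `day`."""
    return "%s_%04d_%02d" % (prefix, day.year, day.month)


def expired_tables(existing_names, prefix, today, keep_months):
    """Return the monthly tables that fall outside the last `keep_months` months."""
    # Express dates as a month count so comparisons work across year boundaries.
    current = today.year * 12 + (today.month - 1)
    expired = []
    for name in existing_names:
        if not name.startswith(prefix + "_"):
            continue
        suffix = name[len(prefix) + 1:]
        try:
            year, month = (int(part) for part in suffix.split("_"))
        except ValueError:
            continue  # not one of the rotated tables
        age_in_months = current - (year * 12 + (month - 1))
        if age_in_months >= keep_months:
            expired.append(name)
    return sorted(expired)
```

New items would be written to table_name_for(prefix, today), and a periodic job could pass the table names returned by the DynamoDB list_tables call to expired_tables and delete each result. Old data then disappears by dropping whole tables instead of paying for per-item deletes.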