Delete all items from DynamoDB using Python

Time: 2019-03-14 18:46:23

Tags: python amazon-dynamodb boto3

How can I delete all items from a DynamoDB table using Python (boto3)?

I am trying this:

scan = table.scan()
with table.batch_writer() as batch:
  for each in scan['Items']:
    batch.delete_item(Key=each)

but it gives me an error.

5 Answers:

Answer 0 (score: 4)

While I agree that dropping the table and recreating it is much more efficient, there are cases, such as when many GSIs or trigger events are associated with a table, where you don't want to have to re-associate them. The script below iterates through the scan to handle large tables (each scan call returns up to 1 MB worth of keys) and uses the batch writer to delete all items in the table.

import boto3
dynamo = boto3.resource('dynamodb')

def truncateTable(tableName):
    table = dynamo.Table(tableName)
    
    #get the table keys
    tableKeyNames = [key.get("AttributeName") for key in table.key_schema]

    #Only retrieve the keys for each item in the table (minimize data transfer)
    projectionExpression = ", ".join('#' + key for key in tableKeyNames)
    expressionAttrNames = {'#'+key: key for key in tableKeyNames}
    
    counter = 0
    page = table.scan(ProjectionExpression=projectionExpression, ExpressionAttributeNames=expressionAttrNames)
    with table.batch_writer() as batch:
        while page["Count"] > 0:
            counter += page["Count"]
            # Delete items in batches
            for itemKeys in page["Items"]:
                batch.delete_item(Key=itemKeys)
            # Fetch the next page
            if 'LastEvaluatedKey' in page:
                page = table.scan(
                    ProjectionExpression=projectionExpression, ExpressionAttributeNames=expressionAttrNames,
                    ExclusiveStartKey=page['LastEvaluatedKey'])
            else:
                break
    print(f"Deleted {counter}")
            
truncateTable("YOUR_TABLE_NAME")

Answer 1 (score: 1)

Use BatchWriteItem. The documentation states:

    The BatchWriteItem operation puts or deletes multiple items in one or
    more tables. A single call to BatchWriteItem can write up to 16 MB of
    data, which can comprise as many as 25 put or delete requests.
    Individual items to be written can be as large as 400 KB.

I assume the Boto3 API has this capability as well, though the name may be different.
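Because of that 25-request limit, code that calls BatchWriteItem directly has to split its delete requests into batches of at most 25 (boto3's batch_writer does this for you). A minimal sketch of that chunking logic, in plain Python with hypothetical key names and no AWS calls:

```python
def chunk_requests(keys, batch_size=25):
    """Split a list of delete-request keys into BatchWriteItem-sized batches."""
    return [keys[i:i + batch_size] for i in range(0, len(keys), batch_size)]

# 60 keys split into batches of 25, 25, and 10
batches = chunk_requests([{'id': str(n)} for n in range(60)])
```

Each inner list could then be sent as the DeleteRequest entries of one BatchWriteItem call; unprocessed items returned by the service would still need to be retried.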

Answer 2 (score: 0)

I found the solution! I just build the key from the table's id (uId) and the search id (compId), and it works :)

scan = table.scan()
with table.batch_writer() as batch:
    for each in scan['Items']:
        batch.delete_item(
            Key={
                'uId': each['uId'],
                'compId': each['compId']
            }
        )

Answer 3 (score: 0)

Here is an answer that takes into account the fact that, if you are truncating a large table (or a small table with large items), you may not get all the records in the first scan call. It assumes you are using only a HashKey (called id), so if you also have a SortKey on your table you would have to add a bit to the ProjectionExpression and the delete_item call.

There is some extra stuff in it that could be trimmed; it just prints a counter to stdout to keep us humans happy.

import boto3

table = boto3.resource('dynamodb').Table('my-table-name')
scan = None

with table.batch_writer() as batch:
    count = 0
    while scan is None or 'LastEvaluatedKey' in scan:
        if scan is not None and 'LastEvaluatedKey' in scan:
            scan = table.scan(
                ProjectionExpression='id',
                ExclusiveStartKey=scan['LastEvaluatedKey'],
            )
        else:
            scan = table.scan(ProjectionExpression='id')

        for item in scan['Items']:
            if count % 5000 == 0:
                print(count)
            batch.delete_item(Key={'id': item['id']})
            count = count + 1
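As the answer notes, a SortKey would mean extending both the projection and the delete key. One way to keep that generic is to build the scan arguments from a list of key names; a minimal sketch, using hypothetical key names 'id' and 'sk' and no AWS calls:

```python
def scan_kwargs(key_names, start_key=None):
    """Build table.scan() keyword arguments that project only the key attributes."""
    kwargs = {
        'ProjectionExpression': ', '.join('#' + k for k in key_names),
        'ExpressionAttributeNames': {'#' + k: k for k in key_names},
    }
    if start_key is not None:
        kwargs['ExclusiveStartKey'] = start_key  # resume a paginated scan
    return kwargs

# Hash key 'id' plus sort key 'sk' (hypothetical names)
kwargs = scan_kwargs(['id', 'sk'])
```

The placeholder names (#id, #sk) sidestep any clash with DynamoDB reserved words, which is why Answer 0's script uses ExpressionAttributeNames as well.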

Answer 4 (score: 0)

The same approach using batch_writer(), but multithreaded:

import boto3
import threading
import time
from queue import LifoQueue, Empty


class DDBTableCleaner(object):

    def __init__(self, table_name, threads_limit=32):
        self._queue = LifoQueue()
        self._threads = dict()
        self._cnt = 0
        self._done = False
        self._threads_limit = threads_limit
        dynamodb_client = boto3.resource('dynamodb')
        self.table = dynamodb_client.Table(table_name)

    def run(self):
        for i in range(self._threads_limit):
            thread_name = f'worker_thread_{i}'
            self._threads[thread_name] = threading.Thread(
                target=self.worker_thread,
                name=thread_name,
            )
            self._threads[thread_name].start()
        self.queue_replenish()
        while self._queue.qsize() > 0:
            print(f'items processed: ({self._cnt})')
            time.sleep(1)
        self._done = True
        for thread in self._threads.values():
            if thread.is_alive():
                thread.join()
        print(f'items processed: ({self._cnt})')

    def queue_replenish(self):
        table_key_names = [key.get('AttributeName') for key in self.table.key_schema]
        projection_expression = ', '.join('#' + key for key in table_key_names)
        expression_attr_names = {'#' + key: key for key in table_key_names}
        page = self.table.scan(
            ProjectionExpression=projection_expression,
            ExpressionAttributeNames=expression_attr_names
        )
        while page['Count'] > 0:
            for item in page['Items']:
                self._queue.put(item)
            if 'LastEvaluatedKey' in page:
                page = self.table.scan(
                    ProjectionExpression=projection_expression,
                    ExpressionAttributeNames=expression_attr_names,
                    ExclusiveStartKey=page['LastEvaluatedKey']
                )
            else:
                break

    def worker_thread(self):
        thr_name = threading.current_thread().name
        print(f'[{thr_name}] thread started')
        with self.table.batch_writer() as batch:
            while not self._done:
                try:
                    item = self._queue.get_nowait()
                except Empty:
                    time.sleep(1)
                else:
                    try:
                        batch.delete_item(Key=item)
                        self._cnt += 1
                    except Exception as e:
                        print(e)
        print(f'[{thr_name}] thread completed')


if __name__ == '__main__':

    table = '...'
    cleaner = DDBTableCleaner(table, threads_limit=10)
    cleaner.run()