How do I delete all items from DynamoDB using Python (boto3)?
I'm trying to do this:

scan = table.scan()
with table.batch_writer() as batch:
    for each in scan['Items']:
        batch.delete_item(Key=each)

But this gives me an error.
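Answer 0: (score: 4)

While I agree that dropping the table and recreating it is much more efficient, there are cases, such as when many GSIs or trigger events are associated with a table, where you don't want to have to re-associate them. The script below pages through the scan so that it can handle large tables (each scan call returns up to 1 MB of keys) and uses the batch writer to delete every item in the table.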
import boto3

dynamo = boto3.resource('dynamodb')


def truncateTable(tableName):
    table = dynamo.Table(tableName)

    # Get the table keys
    tableKeyNames = [key.get("AttributeName") for key in table.key_schema]

    # Only retrieve the keys for each item in the table (minimize data transfer)
    projectionExpression = ", ".join('#' + key for key in tableKeyNames)
    expressionAttrNames = {'#' + key: key for key in tableKeyNames}

    counter = 0
    page = table.scan(ProjectionExpression=projectionExpression, ExpressionAttributeNames=expressionAttrNames)
    with table.batch_writer() as batch:
        while page["Count"] > 0:
            counter += page["Count"]
            # Delete items in batches
            for itemKeys in page["Items"]:
                batch.delete_item(Key=itemKeys)
            # Fetch the next page
            if 'LastEvaluatedKey' in page:
                page = table.scan(
                    ProjectionExpression=projectionExpression, ExpressionAttributeNames=expressionAttrNames,
                    ExclusiveStartKey=page['LastEvaluatedKey'])
            else:
                break
    print(f"Deleted {counter}")


truncateTable("YOUR_TABLE_NAME")
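
To see what the key-only projection looks like without touching AWS, here is a minimal sketch. The helper name `key_projection` and the example schema (hash key `uId`, range key `compId`) are only illustrative; the list has the same shape as boto3's `Table.key_schema`.

```python
def key_projection(key_schema):
    """Build scan arguments that return only the key attributes.

    key_schema is a list shaped like boto3's Table.key_schema, e.g.
    [{"AttributeName": "pk", "KeyType": "HASH"}].
    """
    names = [entry["AttributeName"] for entry in key_schema]
    # '#'-prefixed placeholders avoid clashes with DynamoDB reserved words.
    return {
        "ProjectionExpression": ", ".join("#" + name for name in names),
        "ExpressionAttributeNames": {"#" + name: name for name in names},
    }


# Illustrative schema, not taken from a real table.
example_schema = [
    {"AttributeName": "uId", "KeyType": "HASH"},
    {"AttributeName": "compId", "KeyType": "RANGE"},
]
scan_kwargs = key_projection(example_schema)
print(scan_kwargs)
```

The resulting dict can be passed straight to a scan as `table.scan(**scan_kwargs)`.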
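Answer 1: (score: 1)

Use BatchWriteItem. The documentation states:

The BatchWriteItem operation puts or deletes multiple items in one or more tables. A single call to BatchWriteItem can write up to 16 MB of data, which can comprise as many as 25 put or delete requests. Individual items to be written can be as large as 400 KB.

I assume the Boto3 API has this too, although possibly under a different name.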
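
In boto3 the low-level client exposes this operation as `batch_write_item`, and `Table.batch_writer()` is a wrapper that does the batching and resends unprocessed items for you. If you want to call the client directly, a rough sketch could look like the following. The helper names `delete_requests` and `delete_keys` are made up for this example, and the keys are assumed to already be in the low-level attribute-value format (for example `{"id": {"S": "abc"}}`).

```python
def delete_requests(table_name, keys, batch_size=25):
    """Yield BatchWriteItem payloads, each holding at most batch_size deletes."""
    for start in range(0, len(keys), batch_size):
        chunk = keys[start:start + batch_size]
        yield {
            table_name: [{"DeleteRequest": {"Key": key}} for key in chunk]
        }


def delete_keys(client, table_name, keys):
    """Send the deletes, resubmitting whatever comes back as unprocessed.

    client is expected to be boto3.client('dynamodb') or an object with a
    compatible batch_write_item method.
    """
    for request_items in delete_requests(table_name, keys):
        while request_items:
            response = client.batch_write_item(RequestItems=request_items)
            # Throttled requests are returned here and must be sent again.
            request_items = response.get("UnprocessedItems", {})
```

This retries immediately; for a heavily throttled table you would probably want a delay with backoff between retries.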
Answer 2: (score: 0)

I found a solution! I just build the key from my table ID and the search ID (compId), and it worked :)
scan = table.scan()
with table.batch_writer() as batch:
    for each in scan['Items']:
        batch.delete_item(
            Key={
                'uId': each['uId'],
                'compId': each['compId']
            }
        )
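
The reason this works is that `Key` may contain only the table's key attributes, so passing a whole scanned item is rejected as soon as the item carries anything else. A small sketch of the same idea without hard-coding the attribute names (the helper `extract_key` and the sample item are hypothetical):

```python
def extract_key(item, key_schema):
    """Return only the key attributes of item.

    key_schema has the same shape as boto3's Table.key_schema.
    """
    names = [entry["AttributeName"] for entry in key_schema]
    return {name: item[name] for name in names}


# Made-up schema and item, for illustration only.
schema = [
    {"AttributeName": "uId", "KeyType": "HASH"},
    {"AttributeName": "compId", "KeyType": "RANGE"},
]
item = {"uId": "u-1", "compId": "c-9", "name": "example", "active": True}
print(extract_key(item, schema))
```

With a real table you could then write `batch.delete_item(Key=extract_key(each, table.key_schema))`.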
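Answer 3: (score: 0)

Here's an answer that takes into account the fact that you might not get all records back in the first call if you're truncating a big table (or a small table with big items). It assumes you're only using a HashKey (called id), so if your table also has a SortKey you'd have to add a bit to the ProjectionExpression and to the delete_item call.

There's some extra in there you could trim out: it just prints a counter to stdout to keep us humans happy.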
import boto3

table = boto3.resource('dynamodb').Table('my-table-name')

scan = None

with table.batch_writer() as batch:
    count = 0
    while scan is None or 'LastEvaluatedKey' in scan:
        if scan is not None and 'LastEvaluatedKey' in scan:
            scan = table.scan(
                ProjectionExpression='id',
                ExclusiveStartKey=scan['LastEvaluatedKey'],
            )
        else:
            scan = table.scan(ProjectionExpression='id')

        for item in scan['Items']:
            if count % 5000 == 0:
                print(count)
            batch.delete_item(Key={'id': item['id']})
            count = count + 1
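
If the `scan is None` bookkeeping feels awkward, the paging can be pulled into a generator. This is only a sketch of one way to structure it: `scan_pages` and `delete_all` are names invented here, and `table` is assumed to be a boto3 `Table` resource (or anything with compatible `scan` and `batch_writer` methods).

```python
def scan_pages(table, **scan_kwargs):
    """Yield every page of a scan, following LastEvaluatedKey."""
    while True:
        page = table.scan(**scan_kwargs)
        yield page
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return
        scan_kwargs["ExclusiveStartKey"] = last_key


def delete_all(table, key_names):
    """Delete every item; key_names lists the table's key attributes."""
    # Placeholders (#k0, #k1, ...) keep reserved words out of the expression.
    placeholders = {"#k%d" % i: name for i, name in enumerate(key_names)}
    deleted = 0
    with table.batch_writer() as batch:
        pages = scan_pages(
            table,
            ProjectionExpression=", ".join(placeholders),
            ExpressionAttributeNames=placeholders,
        )
        for page in pages:
            for item in page["Items"]:
                batch.delete_item(Key={name: item[name] for name in key_names})
                deleted += 1
    return deleted
```

For the table in this answer the call would be `delete_all(table, ['id'])`; with a SortKey you would pass both attribute names.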
Answer 4: (score: 0)

The same approach using batch_writer(), but multithreaded:
import boto3
import threading
import time
from queue import LifoQueue, Empty


class DDBTableCleaner(object):

    def __init__(self, table_name, threads_limit=32):
        self._queue = LifoQueue()
        self._threads = dict()
        self._cnt = 0
        self._done = False
        self._threads_limit = threads_limit
        dynamodb_client = boto3.resource('dynamodb')
        self.table = dynamodb_client.Table(table_name)

    def run(self):
        for i in range(self._threads_limit):
            thread_name = f'worker_thread_{i}'
            self._threads[thread_name] = threading.Thread(
                target=self.worker_thread,
                name=thread_name,
            )
            self._threads[thread_name].start()
        self.queue_replenish()
        while self._queue.qsize() > 0:
            print(f'items processed: ({self._cnt})')
            time.sleep(1)
        self._done = True
        for thread in self._threads.values():
            if thread.is_alive():
                thread.join()
        print(f'items processed: ({self._cnt})')

    def queue_replenish(self):
        table_key_names = [key.get('AttributeName') for key in self.table.key_schema]
        projection_expression = ', '.join('#' + key for key in table_key_names)
        expression_attr_names = {'#' + key: key for key in table_key_names}
        page = self.table.scan(
            ProjectionExpression=projection_expression,
            ExpressionAttributeNames=expression_attr_names
        )
        while page['Count'] > 0:
            for item in page['Items']:
                self._queue.put(item)
            if 'LastEvaluatedKey' in page:
                page = self.table.scan(
                    ProjectionExpression=projection_expression,
                    ExpressionAttributeNames=expression_attr_names,
                    ExclusiveStartKey=page['LastEvaluatedKey']
                )
            else:
                break

    def worker_thread(self):
        thr_name = threading.current_thread().name
        print(f'[{thr_name}] thread started')
        with self.table.batch_writer() as batch:
            while not self._done:
                try:
                    item = self._queue.get_nowait()
                except Empty:
                    time.sleep(1)
                else:
                    try:
                        batch.delete_item(Key=item)
                        self._cnt += 1
                    except Exception as e:
                        print(e)
        print(f'[{thr_name}] thread completed')


if __name__ == '__main__':
    table = '...'
    cleaner = DDBTableCleaner(table, threads_limit=10)
    cleaner.run()
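
One thing to watch in the multithreaded version: `self._cnt += 1` runs in several threads without a lock, so the printed count may not be exact. If the count matters, a lock around the increment is one option. The `SafeCounter` class below is a hypothetical helper, not part of the answer's code, and the demo only exercises the counter, not DynamoDB.

```python
import threading


class SafeCounter:
    """Integer counter whose updates are serialized by a lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self, amount=1):
        with self._lock:
            self._value += amount

    @property
    def value(self):
        with self._lock:
            return self._value


counter = SafeCounter()


def worker():
    # Stand-in for the delete loop: each iteration counts one item.
    for _ in range(10_000):
        counter.increment()


threads = [threading.Thread(target=worker) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(counter.value)
```

In the cleaner class this would mean replacing `self._cnt = 0` with a `SafeCounter()` and calling `increment()` after each `batch.delete_item`.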