Question

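I am trying to use the DynamoDB BatchWriteItem operation to insert multiple records into one table.

This table has a partition key and a sort key.

I am using AWS Lambda and Go.

The elements to insert arrive in a slice.

I am following these steps:

Create a PutRequest struct and add the AttributeValues for the first record in the list.
Create a WriteRequest from this PutRequest.
Add this WriteRequest to an array of WriteRequests.
Create the BatchWriteItemInput, whose RequestItems field is basically a map from the table name to the array of WriteRequests.

After that I call BatchWriteItem, which fails with the error: Provided list of item keys contains duplicates.

Any pointers on why this could be happening?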

Answer 1

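You have provided two or more items with identical partition/sort keys.

Per the BatchWriteItem docs, you cannot perform multiple operations on the same item in the same BatchWriteItem request.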
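To confirm that this is the cause, it may help to check the input for repeated key pairs before building the request. The following is a minimal Python sketch of that check (the same idea carries over to Go); the function name `find_duplicate_keys`, the attribute names `user_id` and `game_id`, and the sample data are all hypothetical.

```python
from collections import Counter


def find_duplicate_keys(items, partition_key, sort_key):
    """Return the (partition, sort) value pairs that occur more than once."""
    counts = Counter((item[partition_key], item[sort_key]) for item in items)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical sample data: the first and third items share the same key pair.
items = [
    {"user_id": "u1", "game_id": "g1", "score": 10},
    {"user_id": "u1", "game_id": "g2", "score": 7},
    {"user_id": "u1", "game_id": "g1", "score": 12},
]

duplicates = find_duplicate_keys(items, "user_id", "game_id")
print(duplicates)
```

If the returned list is non-empty, a single BatchWriteItem request containing all of these items would be rejected, so the duplicates have to be merged or dropped first.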

Answer 2

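Note: this answer applies to Python.

As @Benoit has remarked, you cannot operate on the same item twice in one request. The boto3 documentation covers the case where

you want to bypass the no-duplication limitation of a single batch write request, which otherwise surfaces as botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the BatchWriteItem operation: Provided list of item keys contains duplicates.

According to the documentation and the source code, you can specify overwrite_by_pkeys=['partition_key', 'sort_key'] on the batch writer to "de-duplicate request items in buffer if match new request item on specified primary keys". That is, if the partition/sort key combination already exists in the buffer, the batch writer drops the buffered request and replaces it with the new one.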
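The replace-in-buffer behaviour described above can be pictured with a small pure-Python model. This is a simplified sketch for illustration, not boto3's actual code; `add_with_overwrite` and the sample items are hypothetical names and data.

```python
def add_with_overwrite(buffer, item, pkeys):
    """Append item to buffer, first dropping any buffered item with the same key values."""
    new_key = [item[k] for k in pkeys]
    # Keep only buffered items whose key values differ from the new item's.
    buffer[:] = [old for old in buffer if [old[k] for k in pkeys] != new_key]
    buffer.append(item)


# Hypothetical sample data: the third item repeats the key pair of the first.
items = [
    {"user_id": "u1", "game_id": "g1", "score": 10},
    {"user_id": "u1", "game_id": "g2", "score": 7},
    {"user_id": "u1", "game_id": "g1", "score": 12},
]

buffer = []
for item in items:
    add_with_overwrite(buffer, item, ["user_id", "game_id"])

print(buffer)
```

In this model the later item wins: the buffer ends up holding the `g2` item and the `g1` item with score 12, so no key pair appears twice in what would be sent.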

Example

Suppose you have a pandas DataFrame that you want to write to a DynamoDB table. The following function could be helpful:

import datetime as dt
import json
from decimal import Decimal
from typing import Optional

import boto3
import pandas as pd


def write_dynamoDB(df: pd.DataFrame, tbl: str,
                   partition_key: Optional[str] = None,
                   sort_key: Optional[str] = None):
    '''
    Function to write a pandas DataFrame to a DynamoDB Table through
    batchWrite operation. In case there are any float values it handles
    them by converting the data to a json format.
    Arguments:
    * df: pandas DataFrame to write to DynamoDB table.
    * tbl: DynamoDB table name.
    * partition_key (Optional): DynamoDB table partition key.
    * sort_key (Optional): DynamoDB table sort key.
    '''
    # Initialize AWS Resource
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table(tbl)

    # Check if overwrite keys were provided; a missing sort key is left out
    # of the list so that None is never passed as a key name
    overwrite_keys = (
        [k for k in (partition_key, sort_key) if k] if partition_key else None
    )

    # Check if there are floats (convert to decimals instead)
    if any(v == 'float64' for v in df.dtypes.values):
        # Save decimals with JSON
        df_json = json.loads(
            json.dumps(df.to_dict(orient='records'),
                       default=date_converter,
                       allow_nan=True),
            parse_float=Decimal
        )
        # Batch write
        with table.batch_writer(overwrite_by_pkeys=overwrite_keys) as batch:
            for element in df_json:
                batch.put_item(Item=element)
    else:
        # If there are no floats in the data
        # Batch writing
        with table.batch_writer(overwrite_by_pkeys=overwrite_keys) as batch:
            columns = df.columns
            for row in df.itertuples():
                batch.put_item(
                    Item={col: row[idx + 1] for idx, col in enumerate(columns)}
                )


def date_converter(obj):
    # Used as the json.dumps fallback for date and datetime values
    if isinstance(obj, dt.datetime):
        return obj.__str__()
    elif isinstance(obj, dt.date):
        return obj.isoformat()

Call it as write_dynamoDB(dataframe, 'my_table', 'the_partition_key', 'the_sort_key').
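The float handling in the function relies on a JSON round trip: boto3 rejects Python float values and expects Decimal, so the records are dumped to JSON and parsed back with parse_float=Decimal. A minimal standalone sketch of just that step, with a hypothetical helper name `to_dynamodb_records` and made-up sample data, might look like this:

```python
import datetime as dt
import json
from decimal import Decimal


def date_converter(obj):
    """json.dumps fallback: serialize dates and datetimes as ISO strings."""
    if isinstance(obj, (dt.datetime, dt.date)):
        return obj.isoformat()
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


def to_dynamodb_records(records):
    """Round-trip records through JSON so that floats come back as Decimal."""
    return json.loads(
        json.dumps(records, default=date_converter),
        parse_float=Decimal,
    )


# Hypothetical sample record with a float and a date.
records = [{"user_id": "u1", "score": 9.5, "played_on": dt.date(2020, 1, 31)}]
converted = to_dynamodb_records(records)
print(converted)
```

Note that this sketch serializes datetimes with isoformat(), whereas the function above uses str() for them; either way dates end up stored as strings.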

Answer 3

Use batch_writer instead of batch_write_item:

import boto3

dynamodb = boto3.resource("dynamodb", region_name='eu-west-1')
my_table = dynamodb.Table('mirrorfm_yt_tracks')

with my_table.batch_writer(overwrite_by_pkeys=["user_id", "game_id"]) as batch:
    for item in items:
        batch.put_item(
            Item={
                'user_id': item['user_id'],
                'game_id': item['game_id'],
                'score': item['score']
            }
        )

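If your table has no sort key, pass only the partition key, e.g. overwrite_by_pkeys=["user_id"]. overwrite_by_pkeys can also be None (the default), but then the batch writer does not de-duplicate at all.

This is essentially the same answer as @MiguelTrejo's (thanks! +1), but simplified.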
