Question

I am trying to load a CSV file into a DynamoDB table with the Python program shown below, but I get an "index out of range" error.

The input CSV file is laid out as follows:

1st line is the attribute names
2nd line is the datatype for each attribute
3rd line onwards is the actual data

CSV file contents:

customer_id,key_id,dashboard_name,tsm,security_block,core_block,type,subscription,account_id,region,sed,jumpbox,dc,av,gl,backup,cpm,zb
int,int,string,string,string,string,string,string,string,string,string,string,string,string,string,string,string,string
1,1,Act,yes,no,no,az,xxxxx-xxx-xxxx-xxxx-xxxx,null,eu-west-1,yes,yes,yes,no,yes,no,notapplicable,yes
1,2,Act,no,no,yes,az,xxxxx-xxx-xxxx-xxxx-xxxx,null,eu-west-1,no,yes,no,yes,no,yes,notapplicable,no
2,1,Cap,no,no,yes,aws,notapplicable,xxxxxxxx,us-west-2,yes,no,no,no,yes,no,yes,yes
2,2,Cap,yes,no,no,aws,notapplicable,xxxxxxxx,us-west-2,yes,no,no,no,yes,no,no,yes
2,3,Cap,no,yes,no,aws,notapplicable,xxxxxxxx,us-west-2,no,yes,no,yes,no,yes,yes,no
2,4,Cap,yes,no,no,aws,notapplicable,xxxxxxxx,us-west-1,yes,no,no,no,yes,no,no,yes
2,5,Cap,no,no,yes,aws,notapplicable,xxxxxxxx,us-east-1,no,yes,no,yes,no,yes,yes,yes

What I have tried:

# Python Script to insert csv records in dynamodb table.
from __future__ import print_function  # Python 2/3 compatibility
from __future__ import division  # Python 2/3 compatibility for integer division
import argparse
import boto3
from csv import reader
import time
# command line arguments
parser = argparse.ArgumentParser(
    description='Write CSV records to dynamo db table. CSV Header must map to dynamo table field names.')
parser.add_argument('csvFile', help='Path to csv file location')
parser.add_argument('table', help='Dynamo db table name')
parser.add_argument('writeRate', default=5, type=int, nargs='?',
                    help='Number of records to write in table per second (default:5)')
parser.add_argument('delimiter', default=',', nargs='?', help='Delimiter for csv records (default=,)')
parser.add_argument('region', default='us-west-2', nargs='?', help='Dynamo db region name (default=us-west-2)')
args = parser.parse_args()
print(args)

# dynamodb and table initialization
endpointUrl = "https://dynamodb.us-west-2.amazonaws.com"
dynamodb = boto3.resource('dynamodb', region_name=args.region, endpoint_url=endpointUrl)
table = dynamodb.Table(args.table)

# write records to dynamo db
with open(args.csvFile) as csv_file:
    tokens = reader(csv_file, delimiter=args.delimiter)
    # read first line in file which contains dynamo db field names
    header = next(tokens)
    # read second line in file which contains dynamo db field data types
    headerFormat = next(tokens)
    # rest of file contain new records
    for token in tokens:
        print(token)
        item = {}
        for i, val in enumerate(token):
            print(val)
            if val:
                key = header[i]
                if headerFormat[i] == 'int':
                    val = int(val)
                if headerFormat[i] == 'stringset':
                    tempVal = val.split('|')
                    val = set()
                    for tok in enumerate(tempVal):
                        print(tok)
                        val.add(str(tok[1]))
                print(val)
                item[key] = val
        print(item)
        table.put_item(Item=item)

        time.sleep(1 / args.writeRate)  # to accommodate max write provisioned capacity for table

The error I get:

Traceback (most recent call last):
  File "C:\csv\dbinsert.py", line 39, in <module>
    key = header[i]
IndexError: list index out of range

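I am passing the file name and the table name as arguments. The first two columns are numbers in the DynamoDB table; does that mean that 1,1 in the CSV is being treated as a string? I don't know where I went wrong.

Can anyone advise?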

Answer 1

Fixed by following @jarmod's suggestion, together with the fix described in "u'\ufeff' in Python string".

This works:

Result()
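
The working code did not survive in the answer above, so the following is only a minimal sketch of the kind of fix it points to, not the author's actual script. It assumes Python 3 and a file that starts with a UTF-8 byte order mark (u'\ufeff'); decoding with "utf-8-sig" strips that mark so the first header is read as customer_id. It also shows that csv.reader always yields strings (so the 'int' columns need an explicit int() call) and reports a clear error when a data row has more fields than the header, which is the situation that makes header[i] raise IndexError. The helper name rows_to_items and the shortened sample data are hypothetical.

```python
import csv
import io


def rows_to_items(text_stream, delimiter=","):
    """Yield one dict per data row, typed according to the second header line.

    Hypothetical helper for illustration; not part of the original script.
    """
    tokens = csv.reader(text_stream, delimiter=delimiter)
    header = next(tokens)          # 1st line: attribute names
    header_format = next(tokens)   # 2nd line: attribute data types
    for line_no, row in enumerate(tokens, start=3):
        if not row:
            continue  # csv.reader returns [] for a blank line; skip it
        if len(row) > len(header):
            # This is the case that makes header[i] go out of range.
            raise ValueError(
                "line %d has %d fields but the header has %d"
                % (line_no, len(row), len(header))
            )
        item = {}
        for key, kind, val in zip(header, header_format, row):
            if not val:
                continue  # leave empty cells out of the item
            if kind == "int":
                val = int(val)  # csv.reader always yields strings
            elif kind == "stringset":
                val = set(val.split("|"))
            item[key] = val
        yield item


# Sample bytes that begin with a UTF-8 BOM, as some editors write them.
raw = (
    b"\xef\xbb\xbfcustomer_id,key_id,dashboard_name\n"
    b"int,int,string\n"
    b"1,1,Act\n"
    b"2,3,Cap\n"
)

# "utf-8-sig" removes the BOM while decoding; plain "utf-8" would keep it
# as part of the first header name.
stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8-sig", newline="")
items = list(rows_to_items(stream))
print(items)
```

With a real file, the equivalent would be open(args.csvFile, encoding="utf-8-sig", newline=""), and each yielded item could then be passed to table.put_item(Item=item) as in the question's script.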
