Question

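I'm reading a csv file from S3 using boto3 and want to access specific columns of that csv. I have this code, which reads the csv file from S3 into a boto3 object, but I can't access a specific column in it: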

import boto3

s3 = boto3.resource('s3',aws_access_key_id = keyId, aws_secret_access_key = sKeyId)

obj = s3.Object(bucketName, srcFileName)

filedata = obj.get()["Body"].read()
print(filedata.decode('utf8'))

for row in filedata.decode('utf8'):
    print(row[1]) # Get the column at index 1

When I run the code above, print(filedata.decode('utf8')) prints the following on my output console:

51350612,Gary Scott
10100063,Justin Smith
10100162,Annie Smith
10100175,Lisa Shaw
10100461,Ricardo Taylor
10100874,Ricky Boyd
10103593,Hyman Cordero

But the print(row[1]) line inside the for loop throws the error IndexError: string index out of range.

How can I get rid of this error and access specific columns of a csv file from S3 using boto3?

Answer 1

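obj.get()["Body"].read() retrieves the whole file as a single bytes object. Your code filedata.decode('utf8') only converts that bytes object into one str object; nothing is parsed. Iterating over a str yields one character at a time, so each row in your loop is a single character and row[1] raises IndexError. The snippet below is a shameless copy from another answer.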

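import csv
# ...... code snipped .... insert your boto3 code here

# Parse your file correctly: decode the bytes first, because the csv
# module expects text (str) in Python 3
lines = filedata.decode('utf8').splitlines()
# The sample file has no header row, so column names are supplied here
# ('id' and 'name' are illustrative names, not part of the file)
for row in csv.DictReader(lines, fieldnames=['id', 'name']):
    # here you get a sequence of dicts
    # do whatever you want with each line here
    print(row['name'])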

If you just have a simple CSV file, a quick and dirty fix is:

for row in filedata.decode('utf8').splitlines():
    items = row.split(',')
    print(items[0], items[1])
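
One caveat with splitting on commas: it only works while no field contains a comma itself. A minimal sketch of the difference, using a made-up line (the quoted name is an assumption for illustration, not data from the question):

```python
import csv

# Made-up sample line: the second field is quoted and contains a comma.
line = '10100063,"Smith, Justin"'

# A naive split cuts the quoted field in two and keeps the quote characters.
naive_items = line.split(',')

# csv.reader honours the quotes and returns exactly two fields.
parsed_items = next(csv.reader([line]))

print(naive_items)
print(parsed_items)
```

For files like the one in the question, where no field is quoted, both approaches give the same result.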

How do I read a csv stored in S3 with csv.DictReader?

Answer 2

To read the CSV properly, import the csv Python module and use one of its readers.

Documentation: https://docs.python.org/2/library/csv.html
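
As a rough sketch of what that could look like for positional access (Python 3), the example below uses csv.reader. The bytes literal is a stand-in for what obj.get()["Body"].read() returns in the question, so the snippet runs without S3 access:

```python
import csv
import io

# Stand-in for obj.get()["Body"].read(): the raw bytes of the file.
filedata = b"51350612,Gary Scott\n10100063,Justin Smith\n10100162,Annie Smith\n"

# Decode to text and wrap it in a file-like object for csv.reader.
reader = csv.reader(io.StringIO(filedata.decode("utf8")))

# Each row is a list of strings, so row[1] is the second column.
names = [row[1] for row in reader]
print(names)
```

Because each row is a list, row[0] and row[1] address whole columns rather than single characters.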

Problem accessing a specific column of a csv file read as an S3 object with boto3
