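I am trying to query Parquet files using the AWS S3 Select feature. According to {{3}} this is supported, but I have tried various configurations and cannot get it to work. Each InputSerialization attempt is shown commented out below, together with the error I received when trying that version. Can someone tell me how to configure this correctly?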
import boto3
S3_BUCKET = 'myBucket'
KEY_LIST = "'0123','6789'"
S3_FILE = 'myFolder/myFile.parquet'
s3 = boto3.client('s3')
r = s3.select_object_content(
    Bucket=S3_BUCKET,
    Key=S3_FILE,
    ExpressionType='SQL',
    Expression="select \"Record\" from s3object s where s.\"Key\" in [" + KEY_LIST + "]",
    # InputSerialization={}, # (MissingRequiredParameter) when calling the SelectObjectContent operation: InputSerialization is required
    # InputSerialization={'CompressionType': { 'NONE' }}, # Invalid type for parameter InputSerialization.CompressionType, value: {'NONE'}, type: <class 'set'>, valid types: <class 'str'>
    # InputSerialization={'Parquet': {}}, # Unknown parameter in InputSerialization: "Parquet", must be one of: CSV, CompressionType, JSON
    # InputSerialization={'CompressionType': { 'Snappy' }}, # Invalid type for parameter InputSerialization.CompressionType, value: {'Snappy'}, type: <class 'set'>, valid types: <class 'str'>
    OutputSerialization={'JSON': {}},
)
for event in r['Payload']:
    if 'Records' in event:
        records = event['Records']['Payload'].decode('utf-8')
        print(records)
Answer 0 (score: 0)
I needed to upgrade my boto3 installation to the latest version. After upgrading to 1.9.7, this version works:
InputSerialization={'Parquet': {}},
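For reference, here is a minimal sketch of how the whole call might look with that setting in place. The helper names (`build_select_kwargs`, `collect_records`, `run_query`) are made up for illustration and are not part of boto3. It assumes a boto3 release that accepts the `Parquet` key (the answer above reports 1.9.7 working). It also joins the payload chunks before decoding, on the assumption that a chunk boundary could fall in the middle of a record.

```python
def build_select_kwargs(bucket, key, expression):
    """Assemble keyword arguments for s3.select_object_content on a Parquet object.

    Hypothetical helper for illustration; not a boto3 function.
    """
    return {
        'Bucket': bucket,
        'Key': key,
        'ExpressionType': 'SQL',
        'Expression': expression,
        # Older boto3 releases reject 'Parquet' with "Unknown parameter";
        # the answer reports success after upgrading to 1.9.7.
        'InputSerialization': {'Parquet': {}},
        'OutputSerialization': {'JSON': {}},
    }


def collect_records(event_stream):
    """Join every 'Records' payload chunk, then decode the result once."""
    chunks = [e['Records']['Payload'] for e in event_stream if 'Records' in e]
    return b''.join(chunks).decode('utf-8')


def run_query(bucket, key, expression):
    """Run the query against S3 (needs AWS credentials and network access)."""
    import boto3  # imported lazily so the helpers above work without boto3

    s3 = boto3.client('s3')
    response = s3.select_object_content(
        **build_select_kwargs(bucket, key, expression)
    )
    return collect_records(response['Payload'])
```

Calling `run_query('myBucket', 'myFolder/myFile.parquet', expression)` with the query string from the question would then return the matching records as one JSON-lines string.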