I am trying to query data inside a JSON document using S3 Select.
{
  "person": [
    {
      "Id": 1,
      "Name": "Anshu",
      "Address": "Templestowe",
      "Car": "Jeep"
    },
    {
      "Id": 2,
      "Name": "Ben Mostafa",
      "Address": "Las Vegas",
      "Car": "Mustang"
    },
    {
      "Id": 3,
      "Name": "Rohan Wood",
      "Address": "Wooddon",
      "Car": "VW"
    }
  ]
}
QUERY = "select * from S3Object s"
QUERY = "select s.person from S3Object s"
QUERY = "select s.person[0] from S3Object s"
QUERY = "select s.person[0].Name from S3Object s"
All of these queries work fine and return the expected objects. However, when I try to search the data by Name or Car, it does not work.
QUERY = "select * from S3Object s where s.person.Name = \"Anshu\" "
Error: com.amazonaws.services.s3.model.AmazonS3Exception: The column index at line 1, column 32 is invalid.
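There is not much material about S3 Select online. I would like to know whether we can filter on a field name at all! The documentation gives no examples of select queries with a where clause for S3 Select.

Answer 0 (score: 3)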
I could not find this in any AWS documentation, but I was just playing around and found a syntax that works:
QUERY = "select * from S3Object s where 'Anshu' in s.person[*].Name"
I arrived at this by deduction rather than from the documentation.
Proof using Python and Boto3:
import boto3

S3_BUCKET = 'your-bucket-name'

# Uses the [default] profile from ~/.aws/credentials
s3 = boto3.client('s3')

r = s3.select_object_content(
    Bucket=S3_BUCKET,
    Key='your-file-name.json',
    ExpressionType='SQL',
    Expression="select * from s3object s where 'Anshu' in s.person[*].Name",
    InputSerialization={'JSON': {"Type": "Lines"}},
    OutputSerialization={'JSON': {}}
)

# The response payload is an event stream; only 'Records' events carry result bytes
for event in r['Payload']:
    if 'Records' in event:
        records = event['Records']['Payload'].decode('utf-8')
        print(records)
Weird, I know. Remember to set your [default] credentials in the ~/.aws/credentials file.
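One caveat with the loop above: S3 Select streams its results in chunks, and as far as I understand a single record can be split across two Records events, so decoding each chunk on its own may print partial JSON. Below is a minimal sketch that joins the chunks before parsing. The helper name `collect_records` is hypothetical, and the sketch assumes JSON output serialization with the default newline record delimiter.

```python
import json


def collect_records(event_stream):
    """Join all Records payloads, then parse one JSON object per line."""
    chunks = []
    for event in event_stream:
        # Only 'Records' events carry result bytes; other event types are skipped.
        if 'Records' in event:
            chunks.append(event['Records']['Payload'])
    # Decode once, after joining, so a record split across chunks is whole again.
    text = b''.join(chunks).decode('utf-8')
    return [json.loads(line) for line in text.split('\n') if line.strip()]
```

With the response from the call above, this would be used as `rows = collect_records(r['Payload'])`, giving a list of dicts instead of raw text.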
Answer 1 (score: 0)
You can't do that. You need to "flatten" your JSON a bit, so that it looks like this:
{
"person": {
"Id": 1,
"Name": "Anshu",
"Address": "Templestowe",
"Car": "Jeep"
}
}
{
"person": {
"Id": 2,
"Name": "Ben Mostafa",
"Address": "Las Vegas",
"Car": "Mustang"
}
}
{
"person": {
"Id": 3,
"Name": "Rohan Wood",
"Address": "Wooddon",
"Car": "VW"
}
}
The query below will then work as intended:
select * from s3object s where s.person.name = 'Anshu'
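The flattening step can be scripted rather than done by hand. This is a rough sketch, assuming the source file is valid JSON with a top-level "person" array; the function name `flatten_people` is made up for illustration.

```python
import json


def flatten_people(document_text):
    """Turn {"person": [...]} into one {"person": {...}} JSON object per line."""
    document = json.loads(document_text)
    # Emit each array element as its own single-line root-level object.
    return "".join(
        json.dumps({"person": person}) + "\n"
        for person in document["person"]
    )
```

The returned text can be written to a new file and uploaded in place of the original, so that each person becomes a separate record for the where clause to test.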
Answer 2 (score: 0)
After reading the AWS documentation, I found that the following SQL works.
select * from S3Object[*].person[*] as p where p.Name='Anshu'
This SQL returns every person whose name is "Anshu", for example:
{
"Id": 1,
"Name": "Anshu",
"Address": "Templestowe",
"Car": "Jeep"
}
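Wherever you see [*], it denotes a JSON array. Amazon S3 Select always treats a JSON document as an array of root-level values, so we use S3Object[*] in the SQL. The value of person is also an array, so we use person[*] in the SQL.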
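To make the two [*] steps concrete, here is a plain-Python analogy of what the query does. It only illustrates the semantics described above and is not how S3 Select is implemented; `select_people` is a hypothetical name.

```python
def select_people(root_values, name):
    """Mimic: select * from S3Object[*].person[*] as p where p.Name = <name>."""
    matches = []
    for root in root_values:              # S3Object[*]: each root-level value
        for p in root.get("person", []):  # person[*]: each element of the array
            if p.get("Name") == name:     # where p.Name = <name>
                matches.append(p)
    return matches
```

Calling `select_people([document], "Anshu")` on the parsed document from the question returns a list holding only the first person object, which matches the result shown above.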