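I'm trying out Apache Drill. I'm new to the whole environment and just want to understand how Apache Drill works.
I'm trying to use Apache Drill to query JSON data stored on S3. My bucket was created in US East (N. Virginia). I created a new storage plugin for S3 using this link.
Here is the configuration of my new S3 storage plugin: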
{
"type": "file",
"enabled": true,
"connection": "s3a://testing-drill/",
"config": {
"fs.s3a.access.key": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"fs.s3a.secret.key": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
},
"workspaces": {
"root": {
"location": "/",
"writable": false,
"defaultInputFormat": null,
"allowAccessOutsideWorkspace": false
},
"tmp": {
"location": "/tmp",
"writable": true,
"defaultInputFormat": null,
"allowAccessOutsideWorkspace": false
}
},
"formats": {
"psv": {
"type": "text",
"extensions": [
"tbl"
],
"delimiter": "|"
},
"csv": {
"type": "text",
"extensions": [
"csv"
],
"delimiter": ","
},
"tsv": {
"type": "text",
"extensions": [
"tsv"
],
"delimiter": "\t"
},
"parquet": {
"type": "parquet"
},
"json": {
"type": "json",
"extensions": [
"json"
]
},
"avro": {
"type": "avro"
},
"sequencefile": {
"type": "sequencefile",
"extensions": [
"seq"
]
},
"csvh": {
"type": "text",
"extensions": [
"csvh"
],
"extractHeader": true,
"delimiter": ","
}
}
}
I also configured core-site-example.xml as follows:
<configuration>
<property>
<name>fs.s3a.access.key</name>
<value>xxxxxxxxxxxxxxxxxxxx</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>xxxxxxxxxxxxxxxxxxxxxxxx</value>
</property>
<property>
<name>fs.s3a.endpoint</name>
<value>s3.us-east-1.amazonaws.com</value>
</property>
</configuration>
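To double-check that the XML above is well-formed and really sets the three fs.s3a.* properties, a small standard-library Python sketch can parse it. The helper names and the "required" key list are assumptions that simply mirror the properties used above; Drill itself does not publish such a list. This only inspects the file's contents and says nothing about whether Drill actually loads that file.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text: str) -> dict:
    """Parse a Hadoop-style <configuration> document into {name: value}."""
    root = ET.fromstring(xml_text)
    props = {}
    # Each direct <property> child holds one <name>/<value> pair.
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def missing_s3a_properties(props: dict) -> list:
    """Return the fs.s3a.* keys used above that are absent or empty.

    The required list is an assumption copied from the config in the question.
    """
    required = ["fs.s3a.access.key", "fs.s3a.secret.key", "fs.s3a.endpoint"]
    return [key for key in required if not props.get(key)]
```

For example, reading the file with open(...).read() and passing the text to read_hadoop_properties, then to missing_s3a_properties, returns an empty list when all three properties are set.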
But when I try to use/set the workspace with the following command:
USE shiv.`root`;
it gives me the following error:
Error: VALIDATION ERROR: Schema [shiv.root] is not valid with respect to either root schema or current default schema.
Current default schema: No default schema selected
[Error Id: 6d9515c0-b90f-48aa-9dc5-0c660f1c06ca on ip-10-0-3-241.ec2.internal:31010] (state=,code=0)
If I try to run show schemas, I get the following error:
show schemas;
Error: SYSTEM ERROR: AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: EEB438A6A0A5E667, AWS Error Code: null, AWS Error Message: Bad Request
Fragment 0:0
[Error Id: 85883537-9b4f-4057-9c90-cdaedec116a8 on ip-10-0-3-241.ec2.internal:31010] (state=,code=0)
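I can't figure out the root cause of this problem.
Answer (score: 0)
I ran into a similar problem when using Apache Drill with GCS (Google Cloud Storage).
Running the USE gcs.data query gave the following error: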
VALIDATION ERROR: Schema [gcs.data] is not valid with respect to either root schema or current default schema.
Current default schema: No default schema selected
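I ran SHOW SCHEMAS, and there was no gcs.data schema.
I went ahead and created a data folder in my GCS bucket; gcs.data then appeared in SHOW SCHEMAS, and the USE gcs.data query worked fine.
From my limited experience with Apache Drill, I've learned that with file storage, if the folder used by your workspace doesn't exist, Drill throws this error.
GCS and S3 are both file-type storage, so you may be running into the same problem.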
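If this hypothesis is right, the fix is to make sure every workspace location exists in the bucket. Below is a minimal Python sketch of that check, assuming workspace locations are relative to the bucket named in "connection" (so "/" is the bucket root and "/tmp" maps to the key prefix "tmp/"). The function names are hypothetical. Because object stores have no real folders, it creates a zero-byte object whose key ends in "/" (the same kind of placeholder the S3 console's "Create folder" makes); that this is enough for Drill to see a directory is an assumption based on the answer's experience. The client is anything with a put_object(Bucket=..., Key=...) method, such as a boto3 S3 client.

```python
def workspace_prefixes(plugin_config: dict) -> dict:
    """Map each workspace name to the object-key prefix its location implies.

    Assumption: "/" is the bucket root (prefix ""), "/tmp" becomes "tmp/".
    """
    prefixes = {}
    for name, ws in plugin_config.get("workspaces", {}).items():
        location = ws["location"].strip("/")
        prefixes[name] = location + "/" if location else ""
    return prefixes


def missing_workspaces(plugin_config: dict, existing_keys) -> list:
    """Return (sorted) workspace names whose prefix matches no existing key.

    The bucket root is treated as always present (assumes the bucket exists).
    """
    keys = list(existing_keys)
    return sorted(
        name
        for name, prefix in workspace_prefixes(plugin_config).items()
        if prefix and not any(key.startswith(prefix) for key in keys)
    )


def create_placeholders(client, bucket: str, plugin_config: dict, existing_keys) -> list:
    """Create one zero-byte "folder" object per missing workspace.

    `client` only needs a put_object(Bucket=..., Key=...) method,
    e.g. a boto3 S3 client. Returns the keys that were created.
    """
    prefixes = workspace_prefixes(plugin_config)
    created = []
    for name in missing_workspaces(plugin_config, existing_keys):
        client.put_object(Bucket=bucket, Key=prefixes[name])
        created.append(prefixes[name])
    return created
```

With the S3 plugin from the question, existing_keys could be the Key fields returned by boto3's list_objects_v2 for the testing-drill bucket; the tmp workspace would be reported as missing if no object starts with "tmp/".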
Here is my GCS storage configuration:
{
"type": "file",
"connection": "gs://my-gcs-bkt",
"config": null,
"workspaces": {
"data": {
"location": "/data",
"writable": true,
"defaultInputFormat": null,
"allowAccessOutsideWorkspace": false
},
"tmp": {
"location": "/tmp",
"writable": true,
"defaultInputFormat": null,
"allowAccessOutsideWorkspace": false
},
"root": {
"location": "/",
"writable": false,
"defaultInputFormat": null,
"allowAccessOutsideWorkspace": false
}
},
"formats": {
"parquet": {
"type": "parquet"
},
"json": {
"type": "json",
"extensions": [
"json"
]
},
"tsv": {
"type": "text",
"extensions": [
"tsv"
],
"delimiter": "\t"
},
"csvh": {
"type": "text",
"extensions": [
"csvh"
],
"extractHeader": true,
"delimiter": ","
},
"csv": {
"type": "text",
"extensions": [
"csv"
],
"delimiter": ","
},
"psv": {
"type": "text",
"extensions": [
"tbl"
],
"delimiter": "|"
}
},
"enabled": true
}