Question

I'm trying out Apache Drill. I'm new to this whole environment and am just trying to understand how Apache Drill works.

I'm trying to query JSON data stored on S3 using Apache Drill. My bucket was created in US East (N. Virginia). I created a new storage plugin for S3 by following this link.

Below is the configuration of my new S3 storage plugin:

{
  "type": "file",
  "enabled": true,
  "connection": "s3a://testing-drill/",
  "config": {
    "fs.s3a.access.key": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    "fs.s3a.secret.key": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
  },
  "workspaces": {
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    },
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    }
  },
  "formats": {
    "psv": {
      "type": "text",
      "extensions": [
        "tbl"
      ],
      "delimiter": "|"
    },
    "csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "delimiter": ","
    },
    "tsv": {
      "type": "text",
      "extensions": [
        "tsv"
      ],
      "delimiter": "\t"
    },
    "parquet": {
      "type": "parquet"
    },
    "json": {
      "type": "json",
      "extensions": [
        "json"
      ]
    },
    "avro": {
      "type": "avro"
    },
    "sequencefile": {
      "type": "sequencefile",
      "extensions": [
        "seq"
      ]
    },
    "csvh": {
      "type": "text",
      "extensions": [
        "csvh"
      ],
      "extractHeader": true,
      "delimiter": ","
    }
  }
}

I have also configured core-site-example.xml as follows:

<configuration>

    <property>
        <name>fs.s3a.access.key</name>
        <value>xxxxxxxxxxxxxxxxxxxx</value>
    </property>

    <property>
        <name>fs.s3a.secret.key</name>
        <value>xxxxxxxxxxxxxxxxxxxxxxxx</value>
    </property>

    <property>
        <name>fs.s3a.endpoint</name>
        <value>s3.us-east-1.amazonaws.com</value>
    </property>

</configuration>

But when I try to use/set the workspace with the following command:

USE shiv.`root`;

it gives me the following error:

Error: VALIDATION ERROR: Schema [shiv.root] is not valid with respect to either root schema or current default schema.

Current default schema:  No default schema selected

[Error Id: 6d9515c0-b90f-48aa-9dc5-0c660f1c06ca on ip-10-0-3-241.ec2.internal:31010] (state=,code=0)

If I try to run show schemas; I get the following error:

show schemas;
Error: SYSTEM ERROR: AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: EEB438A6A0A5E667, AWS Error Code: null, AWS Error Message: Bad Request

Fragment 0:0

[Error Id: 85883537-9b4f-4057-9c90-cdaedec116a8 on ip-10-0-3-241.ec2.internal:31010] (state=,code=0)

I can't work out the root cause of this problem.

Answer 1

I ran into a similar problem when using Apache Drill with GCS (Google Cloud Storage).

Running the query USE gcs.data gave the following error:

VALIDATION ERROR: Schema [gcs.data] is not valid with respect to either root schema or current default schema.

Current default schema:  No default schema selected

I ran SHOW SCHEMAS, and there was no gcs.data schema listed.

I went ahead and created a data folder in my GCS bucket. After that, gcs.data showed up in SHOW SCHEMAS and the USE gcs.data query worked.

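From my limited experience with Apache Drill, my understanding is that, for file storage, Drill throws this error when the folder a workspace points to does not exist.

GCS and S3 are both file-type storage, so you may be hitting the same issue.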
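To make that check concrete, here is a minimal Python sketch. It is not part of Drill: `missing_workspaces` is a hypothetical helper, and the list of existing folders is something you would have to gather yourself (for example from your cloud console or CLI). It only compares each workspace's location in a plugin config against that list, and it assumes the root location "/" always exists.

```python
import json
from typing import Dict, Iterable, List


def missing_workspaces(plugin_config: Dict, existing_folders: Iterable[str]) -> List[str]:
    """Return the names of workspaces whose location is not an existing folder.

    `existing_folders` is a caller-supplied listing of folder paths in the
    bucket (an assumption of this sketch; Drill does not provide it).
    """
    # Normalize paths so "/data" and "/data/" compare equal.
    known = {"/" + folder.strip("/") for folder in existing_folders}
    known.add("/")  # assumption: the bucket root always exists
    missing = []
    for name, workspace in plugin_config.get("workspaces", {}).items():
        location = "/" + workspace["location"].strip("/")
        if location not in known:
            missing.append(name)
    return sorted(missing)


# Trimmed copy of a storage plugin config with three workspaces.
config = json.loads("""
{
  "type": "file",
  "connection": "gs://my-gcs-bkt",
  "workspaces": {
    "data": {"location": "/data", "writable": true},
    "tmp":  {"location": "/tmp",  "writable": true},
    "root": {"location": "/",     "writable": false}
  }
}
""")

# Hypothetical listing before the data folder was created.
print(missing_workspaces(config, ["/tmp"]))           # ['data']
# Hypothetical listing after creating the data folder.
print(missing_workspaces(config, ["/tmp", "/data"]))  # []
```

Any workspace name this reports is a candidate for the "Schema [...] is not valid" error, at least under the behaviour I observed; creating the folder (or pointing the workspace at an existing one) is the fix that worked for me.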

Here is my GCS storage configuration:

{
  "type": "file",
  "connection": "gs://my-gcs-bkt",
  "config": null,
  "workspaces": {
    "data": {
      "location": "/data",
      "writable": true,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    },
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    },
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    }
  },
  "formats": {
    "parquet": {
      "type": "parquet"
    },
    "json": {
      "type": "json",
      "extensions": [
        "json"
      ]
    },
    "tsv": {
      "type": "text",
      "extensions": [
        "tsv"
      ],
      "delimiter": "\t"
    },
    "csvh": {
      "type": "text",
      "extensions": [
        "csvh"
      ],
      "extractHeader": true,
      "delimiter": ","
    },
    "csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "delimiter": ","
    },
    "psv": {
      "type": "text",
      "extensions": [
        "tbl"
      ],
      "delimiter": "|"
    }
  },
  "enabled": true
}

Apache Drill S3: No default schema selected