How to view a Data Catalog table in S3 using Redshift Spectrum

Time: 2018-06-05 05:10:19

Tags: amazon-redshift aws-glue amazon-redshift-spectrum

I created an external schema for my database in AWS Glue. I can see the list of tables, but I cannot query the JSON data. Redshift throws this error:

[Amazon](500310) Invalid operation: S3 Query Exception (Fetch)
Details: 
 -----------------------------------------------
  error:  S3 Query Exception (Fetch)
  code:      15001
  context:   Task failed due to an internal error. Error occured during Ion/JSON extractor match: IERR_INVALID_SYNTAX

  query:     250284
  location:  dory_util.cpp:717
  process:   query2_124_250284 [pid=12336]
  -----------------------------------------------;
1 statement failed.

I don't want to create external tables, because I will create a view that combines the external tables in the AWS Glue Data Catalog.

Just an update:

I used an AWS Glue crawler to create the tables in the Data Catalog. The data is in JSON format. If I use a job to load this data into Redshift, it is loaded into the Redshift table as a flat file (except for arrays).

Example of the JSON data:

{
  "array": [
    1,
    2,
    3
  ],
  "boolean": true,
  "null": null,
  "number": 123,
  "object": {
    "a": "b",
    "c": "d",
    "e": "f"
  },
  "string": "Hello World"
}

If I load it using an AWS Glue job, the output looks like this (as a table):

see image
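
The image is not reproduced here. As a rough illustration of what "loaded as a flat file (except arrays)" could mean, here is a minimal sketch in plain Python. It is not the Glue job's actual code: the `flatten` helper and the dotted column names are assumptions made for this example, and the real job may name the columns differently.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested objects into one level; arrays are left untouched."""
    flat = {}
    for key, value in record.items():
        # Build a column name such as "object.a" (naming is an assumption).
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat


# The sample document from the question.
sample = json.loads("""
{
  "array": [1, 2, 3],
  "boolean": true,
  "null": null,
  "number": 123,
  "object": {"a": "b", "c": "d", "e": "f"},
  "string": "Hello World"
}
""")

row = flatten(sample)
for column, value in row.items():
    print(column, "=", value)
```

Each key of `row` would correspond to one column of the flat table, with the nested `object` expanded and `array` kept as a single value.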

Now, I have a tremendous number of tables crawled into the Data Catalog. I am struggling to write an individual script for each of these tables, which is why an Amazon Redshift Spectrum external schema would be helpful.

However, when I query an external table in the external schema, I get the error posted above. I do not run into problems with external tables from the Data Catalog when the data is stored as CSV, but the files I need to read in Redshift Spectrum must be in JSON.

Is it possible to view the external table in Redshift Spectrum in the same format as when it is loaded using a job?

1 Answer:

Answer 0 (score: 0)

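Benny,
The errors raised by Redshift Spectrum may not always be accurate. I can only confirm that querying JSON should work much like querying other data formats. By the way, the external tables need to be corrected through a SQL client in the Spectrum database.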

So I suggest you refer to this and this to review your steps.
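
One thing that may be worth checking while reviewing the steps, offered as an assumption and not confirmed anywhere in this thread: JSON readers of this kind often expect one record per line, while the sample in the question is pretty-printed across several lines. The sketch below uses only the Python standard library to rewrite such text as one compact object per line. The helper name `to_json_lines` is made up for this example, and reading from and writing back to S3 is left out.

```python
import json


def to_json_lines(text):
    """Rewrite concatenated, possibly pretty-printed JSON values as one per line."""
    decoder = json.JSONDecoder()
    lines = []
    pos = 0
    while pos < len(text):
        # raw_decode does not accept leading whitespace, so skip it first.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        # A top-level array is treated as a list of records (an assumption).
        records = value if isinstance(value, list) else [value]
        for record in records:
            lines.append(json.dumps(record, separators=(",", ":")))
    return "\n".join(lines)


pretty = """
{
  "number": 123,
  "object": {
    "a": "b"
  }
}
{
  "number": 456,
  "object": {
    "a": "z"
  }
}
"""

print(to_json_lines(pretty))
```

If the files are already in one-record-per-line form, this is not the cause and the table definition itself would be the next thing to look at.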