Athena: unnest a JSON array of strings nested inside a JSON array of structs

Asked: 2019-09-06 08:03:35

Tags: json presto amazon-athena

I have the following AWS Athena create table statement:

CREATE EXTERNAL TABLE IF NOT EXISTS s2cs3dataset.s2c_storage (
         `MessageHeader` string,
         `TimeToProcess` float,
         `KeyCreated` string,
         `KeyLastTouch` string,
         `CreatedDateTime` string,
         `TableReference` array<struct<`BusinessObject`: string,
         `TransactionType`: string,
         `ReferenceKeyId`: float,
         `ReferencePrimaryKey`: string,
         `IncludedTables`: array<string>>>,
         `SAPStoreReference` string 
) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
         'serialization.format' = '1' )
LOCATION 's3://api-dev-dpstorage-s3/S2C_INPUT/storage/'
TBLPROPERTIES ('has_encrypted_data'='false');

From this table, I want to select the following items with this query:

SELECT MessageHeader,
TimeToProcess,
KeyCreated,
KeyLastTouch,
CreatedDateTime,
tr.BusinessObject, 
tr.TransactionType,
tr.ReferencePrimaryKey,
it.IncludedTables,
SAPStoreReference
FROM s2c_storage
cross join UNNEST(s2c_storage.tablereference) as p(tr)
cross join UNNEST(tr.IncludedTables) as p(it)

However, I get the following error:

SYNTAX_ERROR: line 9:1: Expression "it" is not of type ROW

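If I remove the bottom cross join and the column that references it, the query works fine, so I am doing something wrong when trying to unnest the JSON array of strings inside the array of structs. Any tips?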

1 answer:

Answer 0 (score: 0):

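Per the clarifying comment, tr.IncludedTables is of type array(varchar). Thus, in the query ... CROSS JOIN UNNEST(tr.IncludedTables) AS p(it), it is of type varchar. In the select clause you can refer to this value as it (or provide an alias: it as IncludedTables), but you cannot refer to it with it.IncludedTables. A varchar value has no "fields", so in particular it has no IncludedTables field.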
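
Applied to the query from the question, the fix might look like the sketch below. It assumes the table definition shown in the question and has not been run against the actual data. The relation aliases t and i are illustrative choices, not part of the original query; only the select item for it changes in substance.

```sql
-- Sketch only: assumes the s2c_storage table as defined in the question.
SELECT MessageHeader,
       TimeToProcess,
       KeyCreated,
       KeyLastTouch,
       CreatedDateTime,
       tr.BusinessObject,        -- tr is a ROW (struct), so field access works
       tr.TransactionType,
       tr.ReferencePrimaryKey,
       it AS IncludedTables,     -- it is a plain varchar: select it directly
       SAPStoreReference
FROM s2c_storage
CROSS JOIN UNNEST(s2c_storage.tablereference) AS t(tr)  -- one row per struct
CROSS JOIN UNNEST(tr.IncludedTables) AS i(it)           -- one row per string
```

Note that CROSS JOIN UNNEST produces no output row for an input row whose array is empty or NULL, so records without any IncludedTables entries would not appear in the result.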