Question

I'm trying to create an AWS Athena table from Parquet files stored in S3, using a declaration like the following:

create table "db"."fufu" (
  foo array<
    struct<
      bar: int, 
      bam: int
    >
  >
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES ('serialization.format' = '1') 
LOCATION 's3://yada/yada/'
TBLPROPERTIES ('has_encrypted_data'='false');

I keep getting the following error:

line 3:11: mismatched input '<' expecting {'(', 'array', '>'} (service: amazonathena; status code: 400; error code: invalidrequestexception; request id: ...)

The syntax looks valid, and the files load fine with Spark's Parquet library, which reports the field as an array of structs.

Any idea what might be causing this error?

Answer 1

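You need to remove the double quotes from the database name and the table name. You also need to add external before table, so that the statement reads create external table.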

create external table db.fufu (
  foo array<
    struct<
      bar: int, 
      bam: int
    >
  >
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES ('serialization.format' = '1') 
LOCATION 's3://eth-test-ds/test/'
TBLPROPERTIES ('has_encrypted_data'='false');
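
If a database or table name really does need quoting (for example, because it is a reserved word or starts with an underscore), Athena DDL statements expect backticks, while double quotes are meant for identifiers in SELECT queries. A minimal sketch of that distinction, reusing the names and placeholder location from the question; the SELECT at the end is only an illustrative check and not part of the original answer:

```sql
-- DDL: quote identifiers with backticks, not double quotes
create external table `db`.`fufu` (
  foo array<
    struct<
      bar: int,
      bam: int
    >
  >
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES ('serialization.format' = '1')
LOCATION 's3://yada/yada/'
TBLPROPERTIES ('has_encrypted_data'='false');

-- Query: double quotes are accepted here
-- (cardinality returns the number of elements in the array)
SELECT cardinality(foo) AS foo_length
FROM "db"."fufu"
LIMIT 10;
```

In the statement from the question no quoting is needed at all, so the unquoted db.fufu form shown above is the simpler choice.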
