Athena returns zero results

Time: 2021-01-21 05:19:34

Tags: amazon-web-services partitioning aws-glue amazon-athena

Hi, I am creating a table:

CREATE EXTERNAL TABLE `historyrecordjson`(
  `last_name` string COMMENT 'from deserializer', 
  `first_name` string COMMENT 'from deserializer', 
  `email` string COMMENT 'from deserializer', 
  `country` string COMMENT 'from deserializer', 
  `city` string COMMENT 'from deserializer', 
  `event_time` bigint COMMENT 'from deserializer'
)
PARTITIONED BY ( 
  `account_id` string, 
  `year` string, 
  `month` string, 
  `day` string)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY ',' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION
  's3://aguptahistoryrecordcopy/recordshistoryjson/'
TBLPROPERTIES (
  'projection.account_id.type'='injected', 
  'projection.day.range'='01,31', 
  'projection.day.type'='integer', 
  'projection.enabled'='true', 
  'projection.month.range'='01,12', 
  'projection.month.type'='integer', 
  'projection.year.range'='2020,3000', 
  'projection.year.type'='integer', 
  'storage.location.template'='s3://aguptahistoryrecordcopy/historyrecordjson/${account_id}/${year}/${month}/${day}')

When I run the query below, it returns zero records:

SELECT * FROM "historyrecordjson" where account_id='acc-1234' AND year= '2021' AND month= '1' AND day='1' limit 10 ;

My S3 directory looks like this:

s3://aguptahistoryrecordcopy/historyrecordjson/account_id=acc-1234/year=2021/month=1/day=1/1b339139-326c-432f-90aa-15bf30f37be2.json

I can see the partition being loaded as: account_id=acc-1234/year=2021/month=1/day=1

I'm not sure what I'm missing. The query results show Data scanned: 0 KB.

1 answer:

Answer 0: (score: 0)

The DDL you are using is for delimited text files, whereas your actual data in S3 is JSON. Refer to https://github.com/rcongiu/Hive-JSON-Serde and create the table with the correct SerDe and a JSON data definition.
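
As a rough sketch of what that could look like, the DDL below replaces the delimited-text row format with the OpenX JSON SerDe (`org.openx.data.jsonserde.JsonSerDe`, one of the JSON SerDes Athena supports). The table name `historyrecordjson_fixed` is hypothetical. Two further changes are assumptions drawn from the S3 path shown in the question, not part of the answer above: `LOCATION` and `storage.location.template` both point at the `historyrecordjson/` prefix, and the template spells out the Hive-style `account_id=.../year=.../month=.../day=...` keys. With partition projection, Athena builds each partition location from the template, so a template that does not match the real object path would plausibly explain the 0 KB scanned. The ranges are written unpadded (`1,12`, `1,31`) because the path shows `month=1` and `day=1`.

```sql
-- Sketch only. The table name is hypothetical; columns and bucket come from the question.
-- Assumption: objects live under historyrecordjson/ in Hive-style key=value folders.
-- Assumption: each .json object holds one JSON record per line (required by this SerDe).
CREATE EXTERNAL TABLE `historyrecordjson_fixed`(
  `last_name` string,
  `first_name` string,
  `email` string,
  `country` string,
  `city` string,
  `event_time` bigint
)
PARTITIONED BY (
  `account_id` string,
  `year` string,
  `month` string,
  `day` string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION
  's3://aguptahistoryrecordcopy/historyrecordjson/'
TBLPROPERTIES (
  'projection.enabled'='true',
  'projection.account_id.type'='injected',
  'projection.year.type'='integer',
  'projection.year.range'='2020,3000',
  'projection.month.type'='integer',
  'projection.month.range'='1,12',
  'projection.day.type'='integer',
  'projection.day.range'='1,31',
  'storage.location.template'='s3://aguptahistoryrecordcopy/historyrecordjson/account_id=${account_id}/year=${year}/month=${month}/day=${day}/')
```

If those assumptions hold, the query from the question (with `account_id='acc-1234'`, `year='2021'`, `month='1'`, `day='1'`) should resolve to the folder shown there. Because `account_id` uses the `injected` projection type, every query still needs an equality filter on it in the `WHERE` clause.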