This query fails once a month; how should I restructure it?

Date: 2018-02-15 19:15:08

Tags: presto amazon-athena

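This query fails once a month because of the BETWEEN conditions. Each condition has the form value BETWEEN min AND max, so on March 1 my query will fail again: it will evaluate partition_2 BETWEEN 28 AND 1, which is an empty range and matches nothing. How can I make this query more reliable while still scanning only the partitions it needs?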
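To see why the range comes out empty, the three independent BETWEEN filters can be mirrored in Python. This is a minimal sketch, not the query itself: the helper name `scanned_by_between` and the `(yyyy, mm, dd)` tuple of zero-padded strings are assumptions chosen to imitate the char-typed partition columns.

```python
from datetime import date, timedelta


def scanned_by_between(partition: tuple[str, str, str], today: date) -> bool:
    """Mirror the question's three independent BETWEEN filters.

    `partition` is a hypothetical (yyyy, mm, dd) tuple of zero-padded strings,
    standing in for the char-typed partition_0/1/2 columns.
    """
    yesterday = today - timedelta(days=1)
    lo = (yesterday.strftime("%Y"), yesterday.strftime("%m"), yesterday.strftime("%d"))
    hi = (today.strftime("%Y"), today.strftime("%m"), today.strftime("%d"))
    # Each column is bounded on its own, just like the original WHERE clause,
    # and the strings are compared lexicographically.
    return all(lo[i] <= partition[i] <= hi[i] for i in range(3))


# Mid-month: yesterday's partition is in range.
print(scanned_by_between(("2018", "02", "10"), date(2018, 2, 11)))  # True
# Month boundary: the day bounds become '28'..'01', an empty range,
# so neither yesterday's nor today's partition matches.
print(scanned_by_between(("2018", "02", "28"), date(2018, 3, 1)))   # False
print(scanned_by_between(("2018", "03", "01"), date(2018, 3, 1)))   # False
```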

WITH recent_tasks AS
(SELECT task_id, state, timestamp, partition_0, partition_1, partition_2,
  row_number() OVER (PARTITION BY task_id
               ORDER BY timestamp DESC) AS rn
FROM firehose
WHERE
 "partition_0" BETWEEN to_char(current_date - interval '1' day, 'yyyy') AND to_char(current_date, 'yyyy')
 and "partition_1" BETWEEN to_char(current_date - interval '1' day, 'mm') AND to_char(current_date, 'mm')
 and "partition_2" BETWEEN to_char(current_date - interval '1' day, 'dd') AND to_char(current_date, 'dd')
ORDER BY rn)
SELECT * FROM recent_tasks
WHERE rn=1

A couple of notes:

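  • The partitions are char values, not integers
  • partition_2 is the day-of-month partition
  • The purpose of the query is to find the latest state for each task_id
  • I'm using AWS Athena
  • The data is stored in S3 as S3/yyyy/mm/dd, with a new partition every day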

Ideally, my query would handle month transitions correctly:

BETWEEN FEB 10 AND FEB 11 (works with above)
BETWEEN FEB 28 AND MAR 1  (fails with above)
BETWEEN MAR 1 AND MAR 2   (works with above)
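One way to get all three cases right is to stop bounding each partition column separately and instead match the exact (yyyy, mm, dd) partitions for each day, letting a date library handle the rollover. The sketch below builds such a WHERE fragment on the client side before the query is submitted; `partition_predicate` and its `days_back` parameter are hypothetical names, not part of Athena or Presto.

```python
from datetime import date, timedelta


def partition_predicate(today: date, days_back: int = 1) -> str:
    """Build a WHERE fragment that matches exact (yyyy, mm, dd) partitions.

    Hypothetical helper: it lists every day from (today - days_back) through
    today, so month and year rollovers come from Python's date arithmetic
    instead of per-column BETWEEN bounds.
    """
    clauses = []
    for offset in range(days_back, -1, -1):
        d = today - timedelta(days=offset)
        # date supports strftime-style format specs inside f-strings.
        clauses.append(
            f"(\"partition_0\" = '{d:%Y}' AND \"partition_1\" = '{d:%m}' "
            f"AND \"partition_2\" = '{d:%d}')"
        )
    return "\n  OR ".join(clauses)


print(partition_predicate(date(2018, 3, 1)))
```

For March 1 this yields one clause for 2018/02/28 and one for 2018/03/01. The generated text would take the place of the three BETWEEN lines; wrap it in parentheses if other AND conditions are added. Whether Athena prunes partitions as tightly for an OR of equalities as for BETWEEN is worth confirming by checking the data scanned for the query.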

1 Answer:

Answer 0 (score: 0)

If you want the lower bound to be zero instead of 28:

cast(to_char(current_date, 'dd') as integer) - 1

So on 03/01, to_char(current_date, 'dd') returns 1, and subtracting 1 gives you zero:

and "partition_2" BETWEEN lpad(cast(cast(to_char(current_date, 'dd') as integer) - 1 as varchar), 2, '0') AND to_char(current_date, 'dd')
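To check which day strings this range covers, the lower and upper bounds can be mirrored in Python. This is a minimal sketch assuming the lower bound is zero-padded back to two characters, since the partitions are char values (an unpadded '9' would sort after '10' and empty the range on the 10th). The helper name `answer_day_range` is hypothetical.

```python
from datetime import date


def answer_day_range(today: date) -> tuple[str, str]:
    """Mirror the answer's bounds for partition_2 as zero-padded strings."""
    day = int(today.strftime("%d"))   # today's day of month as an integer
    lower = str(day - 1).zfill(2)     # day - 1, padded to two characters
    upper = today.strftime("%d")      # today's day, already two characters
    return lower, upper


print(answer_day_range(date(2018, 3, 1)))   # ('00', '01')
print(answer_day_range(date(2018, 2, 11)))  # ('10', '11')
```

On 03/01 the range '00'..'01' is no longer empty, but the day string '28' still falls outside it.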