Is there a better way to write this query? It runs over millions of rows with Spark and Hadoop.
select *
from (
    SELECT *, row_number() over (PARTITION BY tran_id ORDER BY load_dt DESC) RN
    FROM MySourceTable
    WHERE CAST(tradeDtae AS TIMESTAMP)
          BETWEEN add_months(current_timestamp(), -64) AND current_timestamp()
      AND sys_id = 'TRADING'
) temp where temp.RN = 1;
MySourceTable is partitioned by tradeDtae, which is an int column.
The query has been running for hours without returning the rows that match its conditions.
Answer (score: 0):
Partition pruning probably does not work here because a function is applied to the tradeDtae column, so the partition filter is not being used at all. Also, cast(... as timestamp) does not behave the way you might expect in Hive. Consider this example:
hive> select unix_timestamp(current_timestamp);
OK
1562741499
Time taken: 0.739 seconds, Fetched: 1 row(s)
hive> select cast(1562741499 as timestamp);
OK
1970-01-18 18:05:41.499
Time taken: 0.191 seconds, Fetched: 1 row(s)
hive> select current_timestamp;
OK
2019-07-09 23:53:07.662
Time taken: 1.482 seconds, Fetched: 1 row(s)
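Note what happened with the cast above: Hive interpreted the bigint 1562741499 as milliseconds since the epoch, which lands in January 1970 instead of July 2019.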
The correct way to convert a bigint Unix timestamp (in seconds) to a timestamp is from_unixtime:
hive> select from_unixtime(1562741499);
OK
2019-07-09 23:51:39
Time taken: 0.12 seconds, Fetched: 1 row(s)
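For reference, from_unixtime and unix_timestamp both accept an optional format pattern; the unix_timestamp(..., 'yyyy-MM-dd') form is used in the suggested filter below. A quick sketch with the same value as above (the exact results depend on the session time zone):

select from_unixtime(1562741499, 'yyyy-MM-dd');    -- '2019-07-09'
select unix_timestamp('2019-07-09', 'yyyy-MM-dd'); -- seconds at midnight of that date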
I suggest calculating the bounds separately as Unix timestamps and, if partition pruning still does not work with this query, passing them in as parameters. First of all, try this approach:
FROM MySourceTable
WHERE tradeDtae BETWEEN unix_timestamp(add_months(current_timestamp(), -64),'yyyy-MM-dd') AND unix_timestamp()
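Putting the pieces together, here is a minimal sketch of the full rewrite. It assumes tradeDtae stores seconds since the epoch, which the question never states explicitly:

select *
from (
    SELECT *, row_number() over (PARTITION BY tran_id ORDER BY load_dt DESC) RN
    FROM MySourceTable
    -- compare the raw partition column against computed bounds so pruning can apply
    WHERE tradeDtae BETWEEN unix_timestamp(add_months(current_timestamp(), -64), 'yyyy-MM-dd')
                        AND unix_timestamp()
      AND sys_id = 'TRADING'
) temp where temp.RN = 1;

If the optimizer still scans every partition (some Hive versions only prune on literal constants), compute the two bounds outside the query and substitute them as hiveconf variables (dedup.hql below is a placeholder file name):

-- run as: hive --hiveconf start_ts=<seconds> --hiveconf end_ts=<seconds> -f dedup.hql
WHERE tradeDtae BETWEEN ${hiveconf:start_ts} AND ${hiveconf:end_ts}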