I have a very large Hive query, shown below:
select
count(distinct case when click_day between ${hiveconf:dt_180} and ${hiveconf:dt_end} and recommend_flag=1 then productid else null end) as unique_hk_products_cnt_180d,
count(distinct case when click_day between ${hiveconf:dt_90} and ${hiveconf:dt_end} and recommend_flag=1 then productid else null end) as unique_hk_products_cnt_90d,
count(distinct case when click_day between ${hiveconf:dt_30} and ${hiveconf:dt_end} and recommend_flag=1 then productid else null end) as unique_hk_products_cnt_30d,
count(distinct case when click_day between ${hiveconf:dt_15} and ${hiveconf:dt_end} and recommend_flag=1 then productid else null end) as unique_hk_products_cnt_15d,
count(distinct case when click_day between ${hiveconf:dt_7} and ${hiveconf:dt_end} and recommend_flag=1 then productid else null end) as unique_hk_products_cnt_7d
from mytable ;
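The only difference between these fields is the number of days, which is the length of the time window. This makes my query very long and error-prone.
dt_15 is just a string variable defined earlier: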
set dt_15 = CONCAT(SUBSTRING(date_sub(current_date,15), 1, 4), SUBSTRING(date_sub(current_date,15), 6, 2), SUBSTRING(date_sub(current_date,15), 9, 2));
Can anyone help me rewrite this more simply? For example, by using a loop to produce the fields in a new table?
Thanks.
Answer 0: (score: 0)
Try this:
select count (case when click_day between ${hiveconf:dt_180} and ${hiveconf:dt_end} then productid end) as unique_hk_products_cnt_180d
,count (case when click_day between ${hiveconf:dt_90} and ${hiveconf:dt_end} then productid end) as unique_hk_products_cnt_90d
,count (case when click_day between ${hiveconf:dt_30} and ${hiveconf:dt_end} then productid end) as unique_hk_products_cnt_30d
,count (case when click_day between ${hiveconf:dt_15} and ${hiveconf:dt_end} then productid end) as unique_hk_products_cnt_15d
,count (case when click_day between ${hiveconf:dt_7} and ${hiveconf:dt_end} then productid end) as unique_hk_products_cnt_7d
from (select click_day,recommend_flag,productid
,row_number() over
(
partition by productid
order by click_day desc
) as rn
from mytable
where click_day between ${hiveconf:dt_180} and ${hiveconf:dt_end}
and recommend_flag=1
) t
where rn = 1
P.S.
Are you storing your dates in a non-standard format?
Answer 1: (score: 0)
Try this: use the built-in date functions
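The query above relies on every window ending at dt_end: keeping only each product's most recent click within the last 180 days (rn = 1) leaves one row per product, and that product belongs to the N-day window exactly when this latest click_day is on or after dt_N. As a rough sketch of the same idea, assuming the same table and hiveconf variables as in the question, the window function could be replaced by a plain aggregation (last_click_day is an illustrative name, not from the original):

```sql
-- Sketch only: one row per product with its most recent qualifying click day.
-- Every row in t already satisfies click_day <= dt_end, so only the lower
-- bound of each window needs to be checked in the outer query.
select
  count(case when last_click_day >= ${hiveconf:dt_180} then productid end) as unique_hk_products_cnt_180d,
  count(case when last_click_day >= ${hiveconf:dt_90}  then productid end) as unique_hk_products_cnt_90d,
  count(case when last_click_day >= ${hiveconf:dt_30}  then productid end) as unique_hk_products_cnt_30d,
  count(case when last_click_day >= ${hiveconf:dt_15}  then productid end) as unique_hk_products_cnt_15d,
  count(case when last_click_day >= ${hiveconf:dt_7}   then productid end) as unique_hk_products_cnt_7d
from (
  select productid, max(click_day) as last_click_day
  from mytable
  where click_day between ${hiveconf:dt_180} and ${hiveconf:dt_end}
    and recommend_flag = 1
  group by productid
) t;
```

Because the inner query already yields one row per product, no DISTINCT is needed in the outer counts, and rows with a NULL productid are still ignored by count.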
set dt_15 = from_unixtime(unix_timestamp(date_sub(current_date,15),'yyyy-MM-dd'),'yyyyMMdd')
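to set the value, since this removes the concat and substring operations. Note that the month pattern letter is uppercase MM; lowercase mm means minutes.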
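If the cluster runs Hive 1.2.0 or later, the date_format function may shorten this further. This is a sketch under that version assumption, not something the original answer states:

```sql
-- Assumes Hive 1.2.0 or later, where date_format is available.
-- 'MM' is the month; lowercase 'mm' would be minutes.
set dt_15 = date_format(date_sub(current_date, 15), 'yyyyMMdd');
```

As with the original definition, the variable holds the expression text, so it is evaluated wherever ${hiveconf:dt_15} is substituted into a query.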
select
-- distinct is still required here: tmp has one row per (click_day, productid),
-- so a product clicked on several days would otherwise be counted several times
count(distinct case when click_day between ${hiveconf:dt_180} and ${hiveconf:dt_end} then productid else null end) as unique_hk_products_cnt_180d,
count(distinct case when click_day between ${hiveconf:dt_90} and ${hiveconf:dt_end} then productid else null end) as unique_hk_products_cnt_90d,
count(distinct case when click_day between ${hiveconf:dt_30} and ${hiveconf:dt_end} then productid else null end) as unique_hk_products_cnt_30d,
count(distinct case when click_day between ${hiveconf:dt_15} and ${hiveconf:dt_end} then productid else null end) as unique_hk_products_cnt_15d,
count(distinct case when click_day between ${hiveconf:dt_7} and ${hiveconf:dt_end} then productid else null end) as unique_hk_products_cnt_7d
from (select distinct click_day,productid from mytable where recommend_flag = 1 ) tmp ;
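This reduces the input volume, because the recommend_flag filter and the de-duplication are applied once in the subquery instead of in every column. If the upper bound on click_day (dt_end) is the same for all columns, it can also be taken out of the individual CASE expressions.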
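On the original question of looping over the window lengths: a plain HiveQL query has no loop construct, but one possible sketch is to turn the list of window lengths into rows with explode and group by it. This returns one row per window rather than one column per window. It assumes Hive 1.2.0 or later for date_format and that click_day is stored as a yyyyMMdd string, as the dt_15 definition suggests; w and n_days are illustrative names that do not appear in the original.

```sql
-- Sketch only: one output row per window length instead of one column each.
-- Each source row is repeated once per window length, then filtered.
select
  w.n_days,
  count(distinct t.productid) as unique_hk_products_cnt
from mytable t
lateral view explode(array(7, 15, 30, 90, 180)) w as n_days
where t.recommend_flag = 1
  and t.click_day between date_format(date_sub(current_date, w.n_days), 'yyyyMMdd')
                      and ${hiveconf:dt_end}
group by w.n_days;
```

Adding or removing a window then only means editing the array, at the cost of repeating each input row once per window before filtering.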