Need guidance on rewriting this query

Time: 2019-05-20 08:10:15

Tags: sql performance hive query-optimization hiveql

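We have this query, which we run to generate calendar week data. It hits the same view twice, and because there is no JOIN ... ON clause it probably produces a Cartesian product.

Is there a better way to rewrite this query?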

SELECT cal_date,
       regexp_replace(cal_date, '-', '') AS PC_cal_date,
       year_num*100+week_num AS year_week_num,
       CASE
           WHEN year_num*100+pd_num IN (min_year_pd_num, max_year_pd_num) THEN 'A'
           ELSE 'B'
       END AS yr_pd_ind,
       year_num*100+pd_num AS yr_pd_num,
       dense_rank() OVER (ORDER BY year_num*100+week_num DESC) AS wk_index,
       dense_rank() OVER (ORDER BY year_num*100+pd_num DESC) AS pd_index
FROM mstr_v.local_cal_date t1,

  (SELECT max(year_num*100+pd_num) max_year_pd_num,
          min(year_num*100+pd_num) min_year_pd_num
   FROM mstr_v.local_cal_date
   WHERE cal_date IN (date(date_sub(CURRENT_DATE, cast(date_format(CURRENT_DATE, 'u') AS int)+105*7+1)),
                      date(date_sub(CURRENT_DATE, cast(date_format(CURRENT_DATE, 'u') AS int)))) ) t2
WHERE cal_date BETWEEN date(date_sub(CURRENT_DATE, cast(date_format(CURRENT_DATE, 'u') AS int)+105*7)) 
AND date(date_sub(CURRENT_DATE, cast(date_format(CURRENT_DATE, 'u') AS int)+1))

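1 answer:

Answer 0: (score: 1)

If you move the WHERE clause that is currently inside t2 into a CASE expression and compute the min and max with OVER(), the columns that the t2 subquery produces can be calculated in the same subquery, without a second table scan.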

SELECT cal_date,
       regexp_replace(cal_date, '-', '') AS PC_cal_date,
       year_num*100+week_num AS year_week_num,
       CASE
           WHEN year_num*100+pd_num IN (min_year_pd_num, max_year_pd_num) THEN 'A'
           ELSE 'B'
       END AS yr_pd_ind,
       year_num*100+pd_num AS yr_pd_num,
       dense_rank() OVER (ORDER BY year_num*100+week_num DESC) AS wk_index,
       dense_rank() OVER (ORDER BY year_num*100+pd_num DESC) AS pd_index
FROM (SELECT t1.*,
             max(CASE WHEN cal_date IN (date(date_sub(CURRENT_DATE, cast(date_format(CURRENT_DATE, 'u') AS int)+105*7+1)),
                                        date(date_sub(CURRENT_DATE, cast(date_format(CURRENT_DATE, 'u') AS int))))
                      THEN year_num*100+pd_num END) OVER () AS max_year_pd_num,
             min(CASE WHEN cal_date IN (date(date_sub(CURRENT_DATE, cast(date_format(CURRENT_DATE, 'u') AS int)+105*7+1)),
                                        date(date_sub(CURRENT_DATE, cast(date_format(CURRENT_DATE, 'u') AS int))))
                      THEN year_num*100+pd_num END) OVER () AS min_year_pd_num
      FROM mstr_v.local_cal_date t1
) t1
WHERE cal_date BETWEEN date(date_sub(CURRENT_DATE, cast(date_format(CURRENT_DATE, 'u') AS int)+105*7)) 
AND date(date_sub(CURRENT_DATE, cast(date_format(CURRENT_DATE, 'u') AS int)+1))
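To check that the two forms agree, here is a small sketch that runs both shapes of the query against a toy table. It uses Python's built-in sqlite3 module as a stand-in for Hive (it needs SQLite 3.25 or later for window functions). The table name `cal`, the sample rows, and the four date parameters are made up for illustration, and the Hive date arithmetic is replaced by literal dates. Two things are worth noticing: the two boundary dates fall just outside the BETWEEN range, which is why the window aggregates have to be computed in the subquery before the outer filter is applied, and the aggregate subquery t2 returns exactly one row, so the comma join does not multiply rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cal (cal_date TEXT, year_num INT, pd_num INT)")
conn.executemany(
    "INSERT INTO cal VALUES (?, ?, ?)",
    [
        ("2019-01-05", 2019, 1),  # lower boundary date, outside the range
        ("2019-01-06", 2019, 1),
        ("2019-02-10", 2019, 2),
        ("2019-03-16", 2019, 3),
        ("2019-03-17", 2019, 3),  # upper boundary date, outside the range
        ("2019-04-20", 2019, 4),  # later date, ignored by both queries
    ],
)

# Hypothetical stand-ins for the date_sub(CURRENT_DATE, ...) expressions.
params = {
    "lo": "2019-01-05",
    "hi": "2019-03-17",
    "range_start": "2019-01-06",
    "range_end": "2019-03-16",
}

# Original shape: comma join against a one-row aggregate subquery.
cross_join_sql = """
SELECT t1.cal_date,
       CASE WHEN t1.year_num*100 + t1.pd_num IN (t2.min_pd, t2.max_pd)
            THEN 'A' ELSE 'B' END AS yr_pd_ind
FROM cal t1,
     (SELECT max(year_num*100 + pd_num) AS max_pd,
             min(year_num*100 + pd_num) AS min_pd
      FROM cal
      WHERE cal_date IN (:lo, :hi)) t2
WHERE t1.cal_date BETWEEN :range_start AND :range_end
ORDER BY t1.cal_date
"""

# Rewritten shape: conditional aggregates over the whole table via OVER ().
window_sql = """
SELECT cal_date,
       CASE WHEN year_num*100 + pd_num IN (min_pd, max_pd)
            THEN 'A' ELSE 'B' END AS yr_pd_ind
FROM (SELECT cal_date, year_num, pd_num,
             max(CASE WHEN cal_date IN (:lo, :hi)
                      THEN year_num*100 + pd_num END) OVER () AS max_pd,
             min(CASE WHEN cal_date IN (:lo, :hi)
                      THEN year_num*100 + pd_num END) OVER () AS min_pd
      FROM cal) t
WHERE cal_date BETWEEN :range_start AND :range_end
ORDER BY cal_date
"""

cross_rows = conn.execute(cross_join_sql, params).fetchall()
window_rows = conn.execute(window_sql, params).fetchall()

print(cross_rows)
print(cross_rows == window_rows)
```

Whether the window version is actually faster on Hive depends on the execution plan, so it is worth comparing EXPLAIN output for both on the real view.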