I need some help optimizing this SQL query. It works fine; I just want to reduce its running time.
select distinct
o.usrp_order_number,t.*
from ms_bvoip_order_extension oe
inner join ms_order o on oe.ms_order_id = o.ms_order_id
inner join ms_sub_order so on so.ms_order_id = o.ms_order_id
inner join ms_job j on j.entity_id = so.ms_sub_order_id
left join mstask t ON t.wf_job_id = j.wf_job_id
where
o.order_type = 900
and o.entered_date between date_sub(current_date(),53) and
date_sub(current_date(),3)
and j.entity_type = 5 and t.name RLIKE 'Error|Correct|Create AOTS Ticket' and t.wf_job_id is not null
order by
o.usrp_order_number
Answer 0 (score: 1)
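In Hive, WHERE conditions are applied after the joins (although the cost-based optimizer (CBO) and predicate pushdown (PPD) may change this), so study the EXPLAIN output of both queries. You can move the conditions on o, such as
o.order_type = 900
and the entered_date range, into the join's ON clause to reduce the number of rows while joining. In Hive, non-equality conditions in a join's ON clause may only involve columns of the two tables being joined. Also, table t is LEFT JOINed, but these conditions in the WHERE clause:
t.name RLIKE 'Error|Correct|Create AOTS Ticket' and t.wf_job_id is not null and t.ORIGINAL_START_DATE is not null
turn the LEFT JOIN into an INNER JOIN. Check whether you need an INNER or a LEFT JOIN: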
select distinct
  o.usrp_order_number, t.*
from ms_bvoip_order_extension oe
inner join ms_order o
  on oe.ms_order_id = o.ms_order_id
  and o.order_type = 900
  and o.entered_date between date_sub(current_date(),53) and date_sub(current_date(),3)
inner join ms_sub_order so on so.ms_order_id = o.ms_order_id
inner join ms_job j on j.entity_id = so.ms_sub_order_id
  and j.entity_type = 5
-- filters on t placed in ON keep LEFT JOIN semantics: jobs with no matching task get NULLs for t.*
left join mstask t on t.wf_job_id = j.wf_job_id
  and t.name RLIKE 'Error|Correct|Create AOTS Ticket'
  -- t.wf_job_id is not null is already implied by the equality above
  and t.ORIGINAL_START_DATE is not null
order by o.usrp_order_number
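To see why the placement matters for the LEFT JOIN but not for the INNER JOINs, here is a small sketch using Python's built-in sqlite3 module as a stand-in engine (SQLite, not Hive). SQLite has no RLIKE, so a regexp() user function is registered to back its REGEXP operator; the tables and rows are hypothetical toy data that only reuse the question's column names.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X); a NULL value never matches.
    return value is not None and re.search(pattern, value) is not None


conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)
# Hypothetical toy data reusing the question's column names.
conn.executescript("""
    CREATE TABLE ms_job (wf_job_id INTEGER, entity_type INTEGER);
    CREATE TABLE mstask (wf_job_id INTEGER, name TEXT);
    INSERT INTO ms_job VALUES (1, 5), (2, 5), (3, 7);
    INSERT INTO mstask VALUES (1, 'Create AOTS Ticket'), (2, 'Close order');
""")
PATTERN = "Error|Correct|Create AOTS Ticket"


def run(sql):
    # Every query below takes the regex pattern as its single parameter.
    return conn.execute(sql, (PATTERN,)).fetchall()


# LEFT JOIN with the filter on t in WHERE: rows where t did not match
# (t.name IS NULL) are dropped, so it behaves like an INNER JOIN.
left_where = run("""
    SELECT j.wf_job_id, t.name FROM ms_job j
    LEFT JOIN mstask t ON t.wf_job_id = j.wf_job_id
    WHERE j.entity_type = 5 AND t.name REGEXP ?
    ORDER BY j.wf_job_id""")

# LEFT JOIN with the filter on t in ON: every qualifying job is kept,
# with NULLs where no task passed the filter.
left_on = run("""
    SELECT j.wf_job_id, t.name FROM ms_job j
    LEFT JOIN mstask t ON t.wf_job_id = j.wf_job_id AND t.name REGEXP ?
    WHERE j.entity_type = 5
    ORDER BY j.wf_job_id""")

# INNER JOIN with the filter in ON: same result as filtering in WHERE.
inner_on = run("""
    SELECT j.wf_job_id, t.name FROM ms_job j
    INNER JOIN mstask t ON t.wf_job_id = j.wf_job_id AND t.name REGEXP ?
    WHERE j.entity_type = 5
    ORDER BY j.wf_job_id""")

print(left_where)  # [(1, 'Create AOTS Ticket')]
print(left_on)     # [(1, 'Create AOTS Ticket'), (2, None)]
print(inner_on)    # [(1, 'Create AOTS Ticket')]
```

So moving a filter into ON is a free row-reduction for an INNER JOIN (like the conditions on o and j), while for the LEFT JOIN on t it changes the result, which is why the choice between INNER and LEFT has to be made deliberately.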
Also read this answer about configuration settings: https://stackoverflow.com/a/48487306/2700344
Answer 1 (score: 0)
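Make sure you have the right indexes:
Table ms_order: a composite index on the entered_date, order_type, ms_order_id columns
Table ms_job: a composite index on the entity_type, entity_id columns
Table mstask: a composite index on the ORIGINAL_START_DATE, wf_job_id columns
Table ms_sub_order: an index on the ms_order_id column
Table ms_bvoip_order_extension: an index on the ms_order_id column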
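Whether such indexes exist, and whether they help, depends on the engine: Hive, which the first answer assumes, removed its indexing feature in version 3.0. As a rough, engine-specific sketch of how to check that an index is actually used, the block below uses SQLite through Python's sqlite3 module and EXPLAIN QUERY PLAN. The table is a cut-down, empty, hypothetical version of ms_order, and the index name ix_ms_order_date_type_id is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down version of the question's ms_order table.
conn.execute("""CREATE TABLE ms_order (
    ms_order_id INTEGER, order_type INTEGER,
    entered_date TEXT, usrp_order_number TEXT)""")

query = """
    SELECT ms_order_id FROM ms_order
    WHERE order_type = 900
      AND entered_date BETWEEN '2024-01-01' AND '2024-02-20'"""


def plan(sql):
    # Each EXPLAIN QUERY PLAN row has the human-readable detail in its 4th column.
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


before = plan(query)  # no index yet: a full SCAN of ms_order
conn.execute("""CREATE INDEX ix_ms_order_date_type_id
                ON ms_order (entered_date, order_type, ms_order_id)""")
after = plan(query)   # the plan should now mention the index

print(before)
print(after)
```

Because order_type is compared with = and entered_date with a range, an index that lists the equality column first, (order_type, entered_date, ms_order_id), usually lets a B-tree index narrow the search on both columns; the column order above simply follows the answer.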
Answer 2 (score: 0)
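You will need to add indexes on the columns you filter by.
We don't know how many records each table holds, but the t.name RLIKE
condition should be evaluated last. I would rewrite your query along these lines: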
select ...
from
(
select ...
inner join ...
inner join ...
inner join ...
left join ...
where ...
) temporary
where temporary.somename RLIKE 'Error|Correct|Create AOTS Ticket'
order by temporary.usrp_order_number
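A hypothetical, filled-in instance of the skeleton above can be sketched with Python's sqlite3 module (SQLite standing in for the real engine, with a regexp() user function backing its REGEXP operator in place of RLIKE). The schema is deliberately simplified and made up: mstask is keyed directly by ms_order_id, which is not the question's schema. The sketch checks that moving the regex into the outer query returns the same rows as the flat query.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X); a NULL value never matches.
    return value is not None and re.search(pattern, value) is not None


conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)
# Hypothetical, simplified schema and rows.
conn.executescript("""
    CREATE TABLE ms_order (ms_order_id INTEGER, usrp_order_number TEXT, order_type INTEGER);
    CREATE TABLE mstask (ms_order_id INTEGER, name TEXT);
    INSERT INTO ms_order VALUES (1, 'A-1', 900), (2, 'A-2', 900), (3, 'B-3', 100);
    INSERT INTO mstask VALUES (1, 'Error: timeout'), (2, 'Done'), (3, 'Correct address');
""")
PATTERN = "Error|Correct|Create AOTS Ticket"

# Flat form: every filter in one WHERE clause.
flat = conn.execute("""
    SELECT o.usrp_order_number, t.name
    FROM ms_order o JOIN mstask t ON t.ms_order_id = o.ms_order_id
    WHERE o.order_type = 900 AND t.name REGEXP ?
    ORDER BY o.usrp_order_number""", (PATTERN,)).fetchall()

# Derived-table form: cheap filters inside, the regex applied outside.
nested = conn.execute("""
    SELECT tmp.usrp_order_number, tmp.name
    FROM (
        SELECT o.usrp_order_number, t.name
        FROM ms_order o JOIN mstask t ON t.ms_order_id = o.ms_order_id
        WHERE o.order_type = 900
    ) tmp
    WHERE tmp.name REGEXP ?
    ORDER BY tmp.usrp_order_number""", (PATTERN,)).fetchall()

print(flat)    # [('A-1', 'Error: timeout')]
print(nested)  # [('A-1', 'Error: timeout')]
```

Both forms return the same rows. Whether the nesting really changes evaluation order is up to the optimizer: SQLite may flatten the subquery, and Hive's predicate pushdown may likewise move the outer filter into it, so compare the EXPLAIN output on your engine before relying on it.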
If the query is not very dynamic, you could even cache its results for a while.
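For the caching suggestion, a minimal in-process sketch, assuming the application runs the query through a function it controls: results are kept for a fixed time-to-live and the query is re-run only after they expire. TTLCache, run_report and the key "error_tasks_report" are hypothetical names, and the injectable clock exists only so the expiry can be demonstrated without waiting.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Keeps the result of an expensive call for ttl_seconds (hypothetical helper)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]              # fresh entry: skip the query
        value = compute()              # missing or expired: run the query again
        self._entries[key] = (now, value)
        return value


runs = []


def run_report():
    runs.append(1)                     # stands in for executing the slow SQL query
    return [("A-1", "Error: timeout")]


fake_now = [0.0]
cache = TTLCache(ttl_seconds=600, clock=lambda: fake_now[0])
cache.get("error_tasks_report", run_report)   # first call runs the query
cache.get("error_tasks_report", run_report)   # served from the cache
fake_now[0] = 601
cache.get("error_tasks_report", run_report)   # expired, so the query runs again
print(len(runs))  # 2
```

An in-process cache like this only helps a single long-running application; if several clients run the same report, storing the result in a table refreshed on a schedule serves the same purpose.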