Optimizing a self-join

Time: 2015-01-06 05:59:17

Tags: sql oracle view subquery self-join

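I am using the query below. Because the data I am trying to self-join is very large, it takes a long time to run. Can someone guide me on how to optimize this query?

I am also considering adding indexes. I have 19 columns in total, and each table holds about 1,000,000 rows per month. Can someone suggest the best approach to this problem?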
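For the indexing idea, a composite index on the columns that the correlated subqueries filter on is one candidate. This is only a sketch: the index name is hypothetical, the column names are taken from the query in this question, and whether the optimizer actually uses the index depends on statistics and on the parallel plan.

```sql
-- Hypothetical index name. The three equality columns come first,
-- followed by the column used in the range (>=) comparison.
CREATE INDEX success_msg_lookup_ix
  ON SuccessMessageTable (id1, someDate, jobID, TIMESTAMP_for_success_message);
```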

Explain plan:

OPERATION            OBJECT_NAME              CARDINALITY   COST
SELECT STATEMENT                              5222342       34282
SORT                                          1
PX COORDINATOR
PX SEND              :TQ10000                 1
SORT                                          1
PX BLOCK                                      18            466
TABLE ACCESS         SUCCESS_SIXMONTHS_JUL    18            466
  Filter Predicates: COLUMN14=:B1 AND COLUMN7=:B2 AND COLUMN13>=:B3
SORT                                          1
PX COORDINATOR
PX SEND              :TQ20000                 1
SORT                                          1
PX BLOCK                                      18            466
TABLE ACCESS         SUCCESS_SIXMONTHS_JUL    18            466
  Filter Predicates: COLUMN14=:B1 AND COLUMN7=:B2 AND COLUMN13>=:B3
PX COORDINATOR
PX SEND              :TQ30001                 5222342       34282
HASH                                          5222342       34282
PX RECEIVE                                    5222342       34282
PX SEND              :TQ30000                 5222342       34282
HASH                                          5222342       34282
PX BLOCK                                      5222342       490
TABLE ACCESS         START_SIXMONTHS_JUL      5222342       490

SQL:

SELECT
  DISTINCT
  StMT.id1
  , TIMESTAMP_for_start_message
  , (SELECT MIN(TIMESTAMP_for_success_message)
     FROM SuccessMessageTable
     WHERE
       (id1 = StMT.id1)
       AND (someDate = StMT.someDate)
       AND (jobID = StMT.jobID)
       AND (TIMESTAMP_for_success_message >= StMT.TIMESTAMP_for_start_message)) TIMESTAMP_for_success_message
  , (SELECT MIN(seconds_for_success_message)
     FROM SuccessMessageTable
     WHERE
       (id1 = StMT.id1)
       AND (someDate = StMT.someDate)
       AND (jobID = StMT.jobID)
       AND (TIMESTAMP_for_success_message >= StMT.TIMESTAMP_for_start_message)) seconds_for_success_message
  , StMT.someDate
  , StMT.jobID
FROM StartMessageTable StMT
ORDER BY id1, jobID, TIMESTAMP_for_start_message;

2 Answers:

Answer 0 (score: 1)

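For performance-tuning questions, you should always post at least the execution plan.

First, you can rewrite the query using subquery factoring. If you use a subquery more than once, it is better to put it in a WITH clause. You do not have to redefine the same subquery multiple times; instead, you refer to the query name defined in the WITH clause, which also makes the query easier to read.

For example,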

WITH DATA AS(
SELECT MIN(SMT.TIMESTAMP_for_success_message)
     FROM SuccessMessageTable SMT, StartMessageTable StMT
     WHERE
       (SMT.id1 = StMT.id1)
       AND (SMT.someDate = StMT.someDate)
       AND (SMT.jobID = StMT.jobID)
       AND (SMT.TIMESTAMP_for_success_message >= StMT.TIMESTAMP_for_start_message)
)
SELECT ... FROM DATA A, table1 b, table2 c
...

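For example, I modified your subquery to fetch data from the SuccessMessageTable and StartMessageTable tables. This temporary result set can then be joined with other tables to get the required rows.

By doing this, the subquery result set is fetched once and resolved as a temporary table. Repeated references to the subquery may therefore be more efficient, because the data can be read from the temporary table instead of being re-queried by each reference.

Read more at http://oracle-base.com/articles/misc/with-clause.php

Edit

I think the following query should work: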

SELECT   stmt.id1,
         stmt.somedate,
         stmt.jobid,
         stmt.timestamp_for_start_message,
         MIN(smt.timestamp_for_success_message) timestamp_for_success_message,
         MIN(smt.seconds_for_success_message)   seconds_for_success_message
FROM     successmessagetable smt,
         startmessagetable stmt
WHERE    smt.id1 = stmt.id1
AND      smt.somedate = stmt.somedate
AND      smt.jobid = stmt.jobid
AND      smt.timestamp_for_success_message >= stmt.timestamp_for_start_message
GROUP BY stmt.id1,
         stmt.somedate,
         stmt.jobid,
         stmt.timestamp_for_start_message
ORDER BY stmt.id1,
         stmt.jobid,
         stmt.timestamp_for_start_message;
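One difference from the original query is worth keeping in mind: an inner join drops start messages that have no later success message, whereas the correlated subqueries in the question return those rows with NULL values. If those rows matter, an outer-join variant is one option. The following is a sketch that assumes the table and column names from the question; it has not been run against the real data.

```sql
-- LEFT JOIN keeps every start message; MIN over zero matching rows yields NULL,
-- which mirrors what the scalar subqueries in the question return.
SELECT    stmt.id1,
          stmt.somedate,
          stmt.jobid,
          stmt.timestamp_for_start_message,
          MIN(smt.timestamp_for_success_message) AS timestamp_for_success_message,
          MIN(smt.seconds_for_success_message)   AS seconds_for_success_message
FROM      startmessagetable stmt
LEFT JOIN successmessagetable smt
       ON smt.id1 = stmt.id1
      AND smt.somedate = stmt.somedate
      AND smt.jobid = stmt.jobid
      AND smt.timestamp_for_success_message >= stmt.timestamp_for_start_message
GROUP BY  stmt.id1,
          stmt.somedate,
          stmt.jobid,
          stmt.timestamp_for_start_message
ORDER BY  stmt.id1,
          stmt.jobid,
          stmt.timestamp_for_start_message;
```

The GROUP BY also removes duplicate start rows, which is the role DISTINCT plays in the question's query.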

Answer 1 (score: 0)

I think this is equivalent to what you asked for. I am not convinced the DISTINCT is still necessary.

SELECT DISTINCT
       M.ID1,
       M.Timestamp_for_start_message,
       MIN(S.Timestamp_for_success_message) Timestamp_for_success_message,
       MIN(S.Seconds_for_success_message) Seconds_for_success_message,
       M.SomeDate,
       M.JobID
  FROM StartMessageTable M
  JOIN SuccessMessageTable S
    ON S.ID1 = M.ID1
   AND S.SomeDate = M.SomeDate
   AND S.JobID = M.JobID
   AND S.Timestamp_for_success_message >= M.Timestamp_for_start_message
 GROUP BY M.ID1, M.Timestamp_for_start_message, M.SomeDate, M.JobID
 ORDER BY M.ID1, M.JobID, M.Timestamp_for_start_message;

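Note that it is not clear that the 'Seconds_for_success_message' value will come from the same row as the 'Timestamp_for_success_message' value. It probably will, but the structure of the query does not guarantee it; the same is true of the query in the question.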
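If the seconds value must come from the same row as the earliest success timestamp, Oracle's FIRST aggregate (the KEEP (DENSE_RANK FIRST ORDER BY ...) clause) is one way to tie the two together. This is a sketch that assumes the table and column names used in this thread, not a tested replacement.

```sql
SELECT M.ID1,
       M.Timestamp_for_start_message,
       MIN(S.Timestamp_for_success_message) AS Timestamp_for_success_message,
       -- Restrict to the row(s) with the earliest success timestamp;
       -- MIN only breaks ties among rows that share that timestamp.
       MIN(S.Seconds_for_success_message)
         KEEP (DENSE_RANK FIRST ORDER BY S.Timestamp_for_success_message)
         AS Seconds_for_success_message,
       M.SomeDate,
       M.JobID
  FROM StartMessageTable M
  JOIN SuccessMessageTable S
    ON S.ID1 = M.ID1
   AND S.SomeDate = M.SomeDate
   AND S.JobID = M.JobID
   AND S.Timestamp_for_success_message >= M.Timestamp_for_start_message
 GROUP BY M.ID1, M.Timestamp_for_start_message, M.SomeDate, M.JobID
 ORDER BY M.ID1, M.JobID, M.Timestamp_for_start_message;
```

With a plain MIN on both columns, the two minima are computed independently, so they can come from different success rows whenever seconds does not increase along with the timestamp.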