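I am using the query below. Because the data I am trying to self-join is very large, it takes a long time to run. Can someone guide me on how to optimize this query?
I am also considering adding indexes. The tables have 19 columns in total, and each table gains about 1,000,000 rows per month. Can anyone suggest the best way to approach this problem?
Explain plan: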
OPERATION            OBJECT_NAME              CARDINALITY   COST
SELECT STATEMENT                              5222342       34282
SORT                                          1
PX COORDINATOR
PX SEND              :TQ10000                 1
SORT                                          1
PX BLOCK                                      18            466
TABLE ACCESS         SUCCESS_SIXMONTHS_JUL    18            466
  Filter Predicates
    AND
      COLUMN14=:B1
      COLUMN7=:B2
      COLUMN13>=:B3
SORT                                          1
PX COORDINATOR
PX SEND              :TQ20000                 1
SORT                                          1
PX BLOCK                                      18            466
TABLE ACCESS         SUCCESS_SIXMONTHS_JUL    18            466
  Filter Predicates
    AND
      COLUMN14=:B1
      COLUMN7=:B2
      COLUMN13>=:B3
PX COORDINATOR
PX SEND              :TQ30001                 5222342       34282
HASH                                          5222342       34282
PX RECEIVE                                    5222342       34282
PX SEND              :TQ30000                 5222342       34282
HASH                                          5222342       34282
PX BLOCK                                      5222342       490
TABLE ACCESS         START_SIXMONTHS_JUL      5222342       490
SQL:
SELECT DISTINCT
    StMT.id1
  , TIMESTAMP_for_start_message
  , (SELECT MIN(TIMESTAMP_for_success_message)
     FROM SuccessMessageTable
     WHERE (id1 = StMT.id1)
       AND (someDate = StMT.someDate)
       AND (jobID = StMT.jobID)
       AND (TIMESTAMP_for_success_message >= StMT.TIMESTAMP_for_start_message)) TIMESTAMP_for_success_message
  , (SELECT MIN(seconds_for_success_message)
     FROM SuccessMessageTable
     WHERE (id1 = StMT.id1)
       AND (someDate = StMT.someDate)
       AND (jobID = StMT.jobID)
       AND (TIMESTAMP_for_success_message >= StMT.TIMESTAMP_for_start_message)) seconds_for_success_message
  , StMT.someDate
  , StMT.jobID
FROM StartMessageTable StMT
ORDER BY id1, jobID, TIMESTAMP_for_start_message;
Answer 0: (score: 1)
For questions about performance tuning, you should always post at least the execution plan.
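One common way to capture a plan in Oracle is EXPLAIN PLAN followed by DBMS_XPLAN.DISPLAY. The statement being explained below is only a placeholder built from the table and column names in the question; substitute the real query.

```sql
-- Store the optimizer's plan for the statement in PLAN_TABLE.
-- The SELECT here is a placeholder; put the actual query in its place.
EXPLAIN PLAN FOR
SELECT StMT.id1, StMT.jobID
FROM   StartMessageTable StMT
ORDER  BY StMT.id1, StMT.jobID;

-- Format and print the most recently explained plan.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

The formatted output includes the predicate section and the operation hierarchy, which are much easier to read than a flattened copy of the plan grid.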
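First, you can rewrite the query using subquery factoring. If you use a subquery more than once, it is better to put it in a WITH clause. You do not have to redefine the same subquery several times. Instead, you refer to the query name defined in the WITH clause, which makes the query easier to read.
For example,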
WITH DATA AS (
  SELECT MIN(SMT.TIMESTAMP_for_success_message) AS TIMESTAMP_for_success_message
  FROM SuccessMessageTable SMT, StartMessageTable StMT
  WHERE (SMT.id1 = StMT.id1)
    AND (SMT.someDate = StMT.someDate)
    AND (SMT.jobID = StMT.jobID)
    AND (SMT.TIMESTAMP_for_success_message >= StMT.TIMESTAMP_for_start_message)
)
SELECT ... FROM DATA A, table1 b, table2 c
...
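In this example I modified your subquery to fetch data from both the SuccessMessageTable and StartMessageTable tables. This temporary result set can then be joined with other tables to get the required rows.
By doing this, the subquery result set can be fetched once and materialized as a temporary table. Repeated references to the subquery may then be more efficient, because the data is read from the temporary table instead of being re-queried by each reference.
Read more at http://oracle-base.com/articles/misc/with-clause.php
Edit
I think the following query should work fine: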
SELECT stmt.id1,
       stmt.somedate,
       stmt.jobid,
       stmt.timestamp_for_start_message,
       MIN(smt.timestamp_for_success_message) timestamp_for_success_message,
       MIN(smt.seconds_for_success_message) seconds_for_success_message
FROM successmessagetable smt,
     startmessagetable stmt
WHERE (smt.id1 = stmt.id1)
  AND (smt.somedate = stmt.somedate)
  AND (smt.jobid = stmt.jobid)
  AND (smt.timestamp_for_success_message >= stmt.timestamp_for_start_message)
GROUP BY stmt.id1,
         stmt.somedate,
         stmt.jobid,
         stmt.timestamp_for_start_message
ORDER BY stmt.id1,
         stmt.jobid,
         stmt.timestamp_for_start_message;
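One difference from the query in the question is worth noting: the join above is an inner join, so a start message with no later success message disappears from the result, whereas the scalar subqueries in the question return that row with NULL values. If those rows are wanted (an assumption, since the question does not say), a sketch using an outer join could look like this:

```sql
-- Sketch only: keeps start messages that have no matching success message.
-- MIN() over no matching rows yields NULL, like the original scalar subqueries.
SELECT stmt.id1,
       stmt.somedate,
       stmt.jobid,
       stmt.timestamp_for_start_message,
       MIN(smt.timestamp_for_success_message) AS timestamp_for_success_message,
       MIN(smt.seconds_for_success_message)   AS seconds_for_success_message
FROM   startmessagetable stmt
       LEFT JOIN successmessagetable smt
              ON  smt.id1      = stmt.id1
              AND smt.somedate = stmt.somedate
              AND smt.jobid    = stmt.jobid
              AND smt.timestamp_for_success_message >= stmt.timestamp_for_start_message
GROUP  BY stmt.id1,
          stmt.somedate,
          stmt.jobid,
          stmt.timestamp_for_start_message
ORDER  BY stmt.id1,
          stmt.jobid,
          stmt.timestamp_for_start_message;
```

The timestamp comparison has to stay in the ON clause; moving it to WHERE would filter out the NULL-extended rows and turn the outer join back into an inner join.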
Answer 1: (score: 0)
I think this is equivalent to what you asked for. I am not convinced that DISTINCT is still necessary.
SELECT DISTINCT
       M.ID1,
       M.Timestamp_for_start_message,
       MIN(S.Timestamp_for_success_message) Timestamp_for_success_message,
       MIN(S.Seconds_for_success_message) Seconds_for_success_message,
       M.SomeDate,
       M.JobID
FROM StartMessageTable M
JOIN SuccessMessageTable S
  ON S.ID1 = M.ID1
 AND S.SomeDate = M.SomeDate
 AND S.JobID = M.JobID
 AND S.Timestamp_for_success_message >= M.Timestamp_for_start_message
GROUP BY M.ID1, M.Timestamp_for_start_message, M.SomeDate, M.JobID
ORDER BY M.ID1, M.JobID, M.Timestamp_for_start_message;
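Note that it is not clear that the 'Seconds_for_success_message' value will come from the same row as the 'Timestamp_for_success_message' value. It probably will, but the structure of the query does not guarantee it; the same is true of the query in the question.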
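If the seconds value must come from the row with the earliest success timestamp, Oracle's FIRST aggregate (the KEEP (DENSE_RANK FIRST ...) clause) can express that directly. This is a sketch under the assumption that this is the intended meaning; the question does not confirm it.

```sql
-- Sketch only: ties the seconds value to the earliest success timestamp.
SELECT M.ID1,
       M.Timestamp_for_start_message,
       MIN(S.Timestamp_for_success_message) AS Timestamp_for_success_message,
       -- Restrict the aggregate to the row(s) ranked first by success timestamp;
       -- MIN only breaks ties when several rows share that earliest timestamp.
       MIN(S.Seconds_for_success_message)
           KEEP (DENSE_RANK FIRST ORDER BY S.Timestamp_for_success_message)
           AS Seconds_for_success_message,
       M.SomeDate,
       M.JobID
FROM   StartMessageTable M
       JOIN SuccessMessageTable S
         ON  S.ID1      = M.ID1
         AND S.SomeDate = M.SomeDate
         AND S.JobID    = M.JobID
         AND S.Timestamp_for_success_message >= M.Timestamp_for_start_message
GROUP  BY M.ID1, M.Timestamp_for_start_message, M.SomeDate, M.JobID
ORDER  BY M.ID1, M.JobID, M.Timestamp_for_start_message;
```

With the GROUP BY in place every output row is already unique per group, which is why DISTINCT is omitted here.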