我有一个对Oracle数据库的SQL查询 (查询通过与集群中每个元素的最大平均连接将元素链接到集群相关的集群。两个元素的关系由它们共有的链接数定义。我们计算给定元素的关系到集群的每个元素,然后将元素与集群的关系作为此元素与集群的所有元素的关系的平均值,然后选择我们的元素具有最大关系的集群并将该元素添加到集群)
INSERT INTO t_clusters (transaction_id,
CLUSTER_ID,
word_id,
connection)
SELECT 72,
CLUSTER_ID,
word_id,
avg_cnt
FROM ( SELECT word_id, CLUSTER_ID, AVG (cnt) avg_cnt
FROM ( SELECT a.word_id,
b.word_id bword,
c.CLUSTER_ID,
COUNT (*) cnt
FROM t_semantic a,
t_words a1,
t_semantic b,
t_clusters c
WHERE a.transaction_id = 72
AND a.word_id = a1.word_id
AND a1.checked = 0
AND CLUSTER_ID <> 0
AND a.link_id = b.link_id
AND b.word_id = c.word_id
GROUP BY a.word_id, b.word_id, c.CLUSTER_ID)
GROUP BY word_id, CLUSTER_ID
HAVING AVG (cnt) >= 3) zz
WHERE avg_cnt = (SELECT MAX (avg_cnt)
FROM ( SELECT AVG (cnt) avg_cnt, word_id
FROM ( SELECT a.word_id,
b.word_id bword,
c.CLUSTER_ID,
COUNT (*) cnt
FROM t_semantic a,
t_words a1,
t_semantic b,
t_clusters c
WHERE a.transaction_id = 72
AND a.word_id = a1.word_id
AND a1.checked = 0
AND CLUSTER_ID <> 0
AND a.link_id = B.link_id
AND b.word_id = c.word_id
GROUP BY a.word_id,
b.word_id,
c.CLUSTER_ID)
GROUP BY word_id, CLUSTER_ID
HAVING AVG (cnt) > 3) zzy
WHERE zzy.word_id = zz.word_id)
它已经执行了23个小时
关系表(t_semantic)中的记录数为532125 集群表(t_clusters)中已存储元素已连接到集群的记录数为23014 表中没有连接到簇的元素(t_words)的记录数是36111
在所有表中索引的字段word_id,cluster_id,link_id。 是否有任何方法可以改善此查询的性能以及为什么它如此慢?
Plan
INSERT STATEMENT ALL_ROWSCost: 9,234
28 FILTER
10 HASH GROUP BY Cost: 9,234 Bytes: 46,000 Cardinality: 2,000
9 VIEW Cost: 9,234 Bytes: 8,599,493 Cardinality: 373,891
8 SORT GROUP BY Cost: 9,234 Bytes: 15,329,531 Cardinality: 373,891
7 HASH JOIN Cost: 5,215 Bytes: 15,329,531 Cardinality: 373,891
1 TABLE ACCESS FULL TABLE DKIM.T_WORDS Cost: 86 Bytes: 286,568 Cardinality: 35,821
6 HASH JOIN Cost: 5,117 Bytes: 20,274,408 Cardinality: 614,376
4 HASH JOIN Cost: 2,245 Bytes: 4,108,540 Cardinality: 205,427
2 INDEX FAST FULL SCAN INDEX (UNIQUE) DKIM.UK_CLUSTER Cost: 333 Bytes: 229,140 Cardinality: 22,914
3 TABLE ACCESS FULL TABLE DKIM.T_SEMANTIC Cost: 1,903 Bytes: 5,277,070 Cardinality: 527,707
5 TABLE ACCESS FULL TABLE DKIM.T_SEMANTIC Cost: 1,909 Bytes: 6,859,632 Cardinality: 527,664
27 SORT AGGREGATE Bytes: 26 Cardinality: 1
26 VIEW Cost: 18 Bytes: 26 Cardinality: 1
25 FILTER
24 SORT GROUP BY Cost: 18 Bytes: 23 Cardinality: 1
23 VIEW Cost: 18 Bytes: 23 Cardinality: 1
22 SORT GROUP BY Cost: 18 Bytes: 41 Cardinality: 1
21 TABLE ACCESS BY INDEX ROWID TABLE DKIM.T_CLUSTERS Cost: 4 Bytes: 10 Cardinality: 1
20 NESTED LOOPS Cost: 17 Bytes: 41 Cardinality: 1
18 NESTED LOOPS Cost: 9 Bytes: 93 Cardinality: 3
15 NESTED LOOPS Cost: 5 Bytes: 21 Cardinality: 1
12 TABLE ACCESS BY INDEX ROWID TABLE DKIM.T_WORDS Cost: 2 Bytes: 8 Cardinality: 1
11 INDEX UNIQUE SCAN INDEX (UNIQUE) DKIM.T_WORDS_PK Cost: 1 Cardinality: 1
14 TABLE ACCESS BY INDEX ROWID TABLE DKIM.T_SEMANTIC Cost: 3 Bytes: 13 Cardinality: 1
13 INDEX RANGE SCAN INDEX DKIM.IND_SEMANTIC_WORD_ID Cost: 2 Cardinality: 1
17 TABLE ACCESS BY INDEX ROWID TABLE DKIM.T_SEMANTIC Cost: 4 Bytes: 30 Cardinality: 3
16 INDEX RANGE SCAN INDEX DKIM.IND_SEMANTIC_LINK_ID Cost: 2 Cardinality: 3
19 INDEX RANGE SCAN INDEX DKIM.CLUSTERS_WORD_ID Cost: 2 Cardinality: 1
答案 0 :(得分:1)
使用分析函数来确定max avg_cnt可能有助于加快速度,例如:
insert into t_clusters (transaction_id,
cluster_id,
word_id,
connection)
select 72,
cluster_id,
word_id,
avg_cnt
from (select word_id,
cluster_id,
avg(cnt) avg_cnt,
max(avg(cnt)) over (partition by word_id) max_avg_cnt
from (select a.word_id,
b.word_id bword,
c.cluster_id,
count (*) cnt
from t_semantic a,
t_words a1,
t_semantic b,
t_clusters c
where a.transaction_id = 72
and a.word_id = a1.word_id
and a1.checked = 0
and cluster_id <> 0
and a.link_id = b.link_id
and b.word_id = c.word_id
group by a.word_id, b.word_id, c.cluster_id)
group by word_id, cluster_id
having avg(cnt) >= 3)
where avg_cnt = max_avg_cnt;