Oracle performance: joining 3 tables with the SUM aggregate function

Time: 2014-11-19 15:23:52

Tags: oracle performance sum

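I have three tables:

TERMS (id, isn, SentenceID, Term_Root, sentence_length) contains the full corpus (more than 20 million records).

User_TERMS (id, isn, SentenceID, Term_Root, sentence_length) contains the user's document information (about 100,000 records).

CORRELATIONS (id, Term1, Term2, Correlation_Factor) contains static data describing the correlation and similarity between any two terms (about 500,000 records).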

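I want to find the similarity between the sentences of the user's document and all documents in the corpus. To do that, I join the user terms with the corpus terms, look up the correlation factor for each pair of terms, and sum the results per pair of sentences. I used this query: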

SELECT tt.SENTENCEID,
       tu.SENTENCEID,
       SUM(c.CORRELATION_FACTOR) / GREATEST(tu.sentence_length, tt.sentence_length),
       tt.isn,
       tu.isn
  FROM CORRELATIONS c,
       TERMS tt,
       User_TERMS tu
 WHERE tt.TERM_ROOT = c.TERM1
   AND tu.TERM_ROOT = c.TERM2
   AND tu.ISN = '22242'
 GROUP BY tt.SENTENCEID, tu.SENTENCEID, tt.isn, tu.isn, tu.sentence_length, tt.sentence_length
HAVING SUM(c.CORRELATION_FACTOR) / GREATEST(tu.sentence_length, tt.sentence_length) > 0.6;

This query takes more than 10 minutes. How can I rewrite it to run as fast as possible? Which indexes do I need?
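To make the intended calculation concrete, here is a minimal sketch on made-up toy data. It uses Python's built-in sqlite3 instead of Oracle, so SQLite's two-argument MAX() stands in for GREATEST(). The table layouts are simplified (no id column), and the sample terms, the 'D1' isn, and the helper names build_toy_db and sentence_similarities are assumptions for illustration only. It shows what the query computes, not how to make it faster.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with tiny, made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE terms (isn TEXT, sentenceid INTEGER,
                            term_root TEXT, sentence_length INTEGER);
        CREATE TABLE user_terms (isn TEXT, sentenceid INTEGER,
                                 term_root TEXT, sentence_length INTEGER);
        CREATE TABLE correlations (term1 TEXT, term2 TEXT,
                                   correlation_factor REAL);
    """)
    # Corpus: two sentences of two terms each (hypothetical data).
    conn.executemany("INSERT INTO terms VALUES (?, ?, ?, ?)", [
        ("D1", 1, "cat", 2), ("D1", 1, "dog", 2),
        ("D1", 2, "car", 2), ("D1", 2, "road", 2),
    ])
    # User document: one sentence of two terms.
    conn.executemany("INSERT INTO user_terms VALUES (?, ?, ?, ?)", [
        ("22242", 10, "cat", 2), ("22242", 10, "puppy", 2),
    ])
    # term1 matches the corpus term, term2 matches the user term.
    conn.executemany("INSERT INTO correlations VALUES (?, ?, ?)", [
        ("cat", "cat", 1.0), ("dog", "puppy", 0.8), ("car", "cat", 0.1),
    ])
    return conn


def sentence_similarities(conn, isn, threshold):
    """Return (corpus_sentence, user_sentence, score) rows above threshold."""
    sql = """
        SELECT tt.sentenceid, tu.sentenceid,
               SUM(c.correlation_factor)
                   / MAX(tu.sentence_length, tt.sentence_length)
          FROM user_terms tu
          JOIN correlations c ON c.term2 = tu.term_root
          JOIN terms tt       ON tt.term_root = c.term1
         WHERE tu.isn = ?
         GROUP BY tt.sentenceid, tu.sentenceid, tt.isn, tu.isn,
                  tu.sentence_length, tt.sentence_length
        HAVING SUM(c.correlation_factor)
                   / MAX(tu.sentence_length, tt.sentence_length) > ?
         ORDER BY tt.sentenceid, tu.sentenceid
    """
    return conn.execute(sql, (isn, threshold)).fetchall()


if __name__ == "__main__":
    for row in sentence_similarities(build_toy_db(), "22242", 0.6):
        print(row)
```

With this toy data, corpus sentence 1 matches user sentence 10 through two term pairs (1.0 + 0.8), and the sum divided by the longer sentence length (2) gives 0.9, which passes the 0.6 threshold. Corpus sentence 2 only scores 0.1 / 2 = 0.05 and is filtered out.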

Edit: here is my explain plan:

| Id  | Operation                 | Name                        | Rows  | Bytes | Cost (%CPU) | Time     |
|   0 | SELECT STATEMENT          |                             |  2670 |  219K |    3864 (1) | 00:00:47 |
|*  1 |  FILTER                   |                             |       |       |             |          |
|   2 |   HASH GROUP BY           |                             |  2670 |  219K |    3864 (1) | 00:00:47 |
|   3 |    NESTED LOOPS           |                             | 53383 | 4379K |    3862 (1) | 00:00:47 |
|*  4 |     HASH JOIN             |                             |  1414 | 86254 |    1032 (1) | 00:00:13 |
|*  5 |      INDEX RANGE SCAN     | INDEX5                      |  1425 | 31350 |       8 (0) | 00:00:01 |
|   6 |      INDEX FAST FULL SCAN | PLAG_TERM_CORRELATIONS4_UK1 |  563K |   20M |    1020 (1) | 00:00:13 |
|*  7 |     INDEX RANGE SCAN      | PLAG_TERMS_INDEX1           |    38 |   874 |       2 (0) | 00:00:01 |


Predicate Information (identified by operation id):

1 - filter(SUM("C"."CORRELATION_FACTOR")/GREATEST("TU"."SENTENCE_LENGTH","TT"."SENTENCE_LENGTH")>0.6)
4 - access("TU"."TERM_ROOT"="C"."TERM2")
5 - access("TU"."ISN"='22242')
7 - access("TT"."TERM_ROOT"="C"."TERM1")

0 answers:

No answers