Question

I have large tables containing millions of rows (far too large).

The tables are as follows:

Post:
post_id, user_id, description, creation_date, xyz, abc, etc.

Primary key for Post: post_id
Partition key for Post: creation_date
Index on Post: user_id

Comment:
commentid, post_id, comment_creation_date, comment_type, last_modified_date

Primary key of Comment: commentid
Indexed columns on Comment: commentid, post_id
Partition key for Comment: comment_creation_date

Note: I cannot build any new indexes or change the table schema in any way.

comment_type is a String.

Now, given a list of comment_type values and a comment_creation_date range, I need to find all posts that have a comment of one of those types created within that range.

A simple but very inefficient solution:

    select * from post p, comment c where c.post_id = p.post_id where c.comment_creation_date > ? and c.comment_creation_date < ?
    and p.posttype IN (some list)
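Note that a plain join returns one row per matching comment, so a post with several matching comments appears several times. Stated as in the prose (posts with at least one comment of a listed comment_type in the date range), the requirement is a semi-join, which EXISTS expresses directly. A minimal sketch, using Python's built-in sqlite3 as a stand-in for Oracle and made-up rows; it shows the result shape, not Oracle's execution plan:

```python
import sqlite3

# Toy schema using the question's column names; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post (post_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE comment (commentid INTEGER PRIMARY KEY, post_id INTEGER,
                      comment_creation_date TEXT, comment_type TEXT);
INSERT INTO post VALUES (1, 'post1'), (2, 'post2'), (3, 'post3');
INSERT INTO comment VALUES
  (10, 1, '2024-01-05', 'question'),
  (11, 1, '2024-01-06', 'question'),
  (12, 2, '2024-01-07', 'spam'),
  (13, 3, '2023-12-01', 'question');
""")

lo, hi = "2024-01-01", "2024-02-01"
types = ["question", "answer"]
marks = ",".join("?" * len(types))  # one bind placeholder per type: "?,?"

# Plain join: one row per matching comment, so post 1 appears twice.
naive = conn.execute(f"""
    SELECT p.post_id FROM post p JOIN comment c ON c.post_id = p.post_id
    WHERE c.comment_creation_date > ? AND c.comment_creation_date < ?
      AND c.comment_type IN ({marks})
    ORDER BY p.post_id""", [lo, hi, *types]).fetchall()

# Semi-join: each post at most once, however many of its comments match.
semi = conn.execute(f"""
    SELECT p.post_id FROM post p
    WHERE EXISTS (SELECT 1 FROM comment c
                  WHERE c.post_id = p.post_id
                    AND c.comment_creation_date > ? AND c.comment_creation_date < ?
                    AND c.comment_type IN ({marks}))
    ORDER BY p.post_id""", [lo, hi, *types]).fetchall()

print(naive)  # [(1,), (1,)]
print(semi)   # [(1,)]
```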

How can I optimize this query? And what if the same search were done on the comment's last_modified_date instead of comment_creation_date? Note:

last_modified_date is NOT indexed, and comment_creation_date is.
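One common workaround, assuming a comment is never modified before it is created (last_modified_date >= comment_creation_date on every row, an invariant the question does not state), is to add a redundant upper bound on the partition key: if last_modified_date < hi, then comment_creation_date < hi as well, so the extra predicate cannot drop any row but may let Oracle prune partitions above hi. No lower bound follows, because an old comment may have been edited recently (comment 2 below). A minimal sketch with Python's sqlite3 as a stand-in and made-up rows; it checks that results are unchanged, not that pruning happens:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comment (commentid INTEGER PRIMARY KEY, post_id INTEGER,
                      comment_creation_date TEXT, comment_type TEXT,
                      last_modified_date TEXT);
INSERT INTO comment VALUES
  (1, 1, '2024-01-05', 'question', '2024-01-20'),
  (2, 1, '2023-11-02', 'question', '2024-01-10'),
  (3, 2, '2024-03-01', 'question', '2024-03-02'),
  (4, 3, '2024-01-08', 'spam',     '2024-01-09');
""")

lo, hi = "2024-01-01", "2024-02-01"

# Filter on the unindexed column only.
plain = """SELECT commentid FROM comment
           WHERE last_modified_date > ? AND last_modified_date < ?
           ORDER BY commentid"""

# Same filter plus a redundant upper bound on the partition key.
# Valid ONLY IF last_modified_date >= comment_creation_date for every row.
pruned = """SELECT commentid FROM comment
            WHERE last_modified_date > ? AND last_modified_date < ?
              AND comment_creation_date < ?
            ORDER BY commentid"""

# Check the assumed invariant before relying on the rewrite.
violations = conn.execute(
    "SELECT COUNT(*) FROM comment WHERE last_modified_date < comment_creation_date"
).fetchone()[0]

a = conn.execute(plain, [lo, hi]).fetchall()
b = conn.execute(pruned, [lo, hi, hi]).fetchall()
print(violations, a, b)  # 0 [(1,), (2,), (4,)] [(1,), (2,), (4,)]
```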

Once the query works, I want to get all the comments of each post together, e.g. post1 with c1, c2, c3.
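Assuming "all comments" means every comment of a matching post (not only the matching ones), one approach is to select posts with EXISTS, join back to all of their comments, and order by post so each post's comments are adjacent; the client then folds consecutive rows together. A sketch with Python's sqlite3 as a stand-in and hypothetical rows:

```python
import sqlite3
from itertools import groupby

# Toy data (hypothetical): post 1 has 'question' comments, post 2 does not.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post (post_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE comment (commentid INTEGER PRIMARY KEY, post_id INTEGER,
                      comment_creation_date TEXT, comment_type TEXT);
INSERT INTO post VALUES (1, 'post1'), (2, 'post2');
INSERT INTO comment VALUES
  (1, 1, '2024-01-05', 'question'), (2, 1, '2024-01-06', 'spam'),
  (3, 1, '2024-01-07', 'question'), (4, 2, '2024-01-09', 'spam');
""")

# Every comment of every post that has at least one matching comment,
# sorted so that each post's comments are adjacent.
rows = conn.execute("""
    SELECT p.post_id, c.commentid
    FROM post p JOIN comment c ON c.post_id = p.post_id
    WHERE EXISTS (SELECT 1 FROM comment m
                  WHERE m.post_id = p.post_id
                    AND m.comment_creation_date > ? AND m.comment_creation_date < ?
                    AND m.comment_type IN (?))
    ORDER BY p.post_id, c.commentid""",
    ["2024-01-01", "2024-02-01", "question"]).fetchall()

# Collapse runs of equal post_id into one list of comment ids per post.
by_post = {pid: [cid for _, cid in grp]
           for pid, grp in groupby(rows, key=lambda r: r[0])}
print(by_post)  # {1: [1, 2, 3]}
```

In Oracle, LISTAGG(c.commentid, ',') WITHIN GROUP (ORDER BY c.commentid) with GROUP BY p.post_id would build the per-post list in SQL instead of in the client.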

PS: I'm not good at designing queries. I know IN is bad for performance.

Answer 1

I'm not sure whether this will save any time, but moving the comment part into a subquery might help:

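    SELECT *
    FROM Post p
    JOIN (SELECT *
          FROM Comment
          WHERE comment_creation_date > ? AND comment_creation_date < ?
            -- ? below = comma-separated list of types, e.g. 'typeA,typeB';
            -- the commas stop one type from matching inside another
            AND ',' || ? || ',' LIKE '%,' || comment_type || ',%'
         ) c
    ON c.post_id = p.post_id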
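The LIKE trick treats the type list as one string and tests whether comment_type occurs inside it. Without delimiters, a type that is a substring of a listed type also matches (for example 'spam' inside 'antispam'). A sketch using Python's sqlite3 as a stand-in, with a hypothetical comma-separated bind value, comparing the bare substring test with a comma-delimited one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comment (commentid INTEGER PRIMARY KEY, comment_type TEXT);
INSERT INTO comment VALUES (1, 'antispam'), (2, 'spam'), (3, 'question');
""")

type_list = "antispam,question"  # hypothetical comma-separated bind value

# Bare substring test: 'spam' occurs inside 'antispam', so comment 2 slips in.
bare = conn.execute("""
    SELECT commentid FROM comment
    WHERE ? LIKE '%' || comment_type || '%'
    ORDER BY commentid""", [type_list]).fetchall()

# Wrapping both sides in commas forces whole-item matches.
delimited = conn.execute("""
    SELECT commentid FROM comment
    WHERE ',' || ? || ',' LIKE '%,' || comment_type || ',%'
    ORDER BY commentid""", [type_list]).fetchall()

print(bare)       # [(1,), (2,), (3,)]
print(delimited)  # [(1,), (3,)]
```

This still assumes type values never contain a comma, and a % or _ inside a type value would act as a LIKE wildcard.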

Answer 2

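Your query is syntactically incorrect because it has two where clauses. Also, the text refers to comment_type but the code uses posttype; I'll assume the latter. You can rewrite it as: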

    select *
    from post p, comment c
    where c.post_id = p.post_id and
          c.comment_creation_date > ? and c.comment_creation_date < ? and
          p.posttype IN (some list)

Oracle has a good optimizer, so there is no reason to think this won't be optimized well.

Although it makes no difference to performance, ANSI-standard join syntax is the better way to write the query:

    select *
    from post p join
         comment c
         on c.post_id = p.post_id
    where c.comment_creation_date > ? and c.comment_creation_date < ? and
          p.posttype IN (some list)
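A quick sanity check, using Python's sqlite3 as a stand-in for Oracle with made-up rows and a two-value type list, that the comma-join and ANSI-join versions return identical results:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post (post_id INTEGER PRIMARY KEY, posttype TEXT);
CREATE TABLE comment (commentid INTEGER PRIMARY KEY, post_id INTEGER,
                      comment_creation_date TEXT);
INSERT INTO post VALUES (1, 'news'), (2, 'blog'), (3, 'news');
INSERT INTO comment VALUES (10, 1, '2024-01-05'), (11, 2, '2024-01-06'),
                           (12, 3, '2023-12-30'), (13, 1, '2024-01-20');
""")
args = ["2024-01-01", "2024-02-01", "news", "faq"]  # date range, then types

# Old-style comma join with the join condition in WHERE.
comma = conn.execute("""
    SELECT p.post_id, c.commentid FROM post p, comment c
    WHERE c.post_id = p.post_id
      AND c.comment_creation_date > ? AND c.comment_creation_date < ?
      AND p.posttype IN (?, ?)
    ORDER BY c.commentid""", args).fetchall()

# ANSI join with the join condition in ON.
ansi = conn.execute("""
    SELECT p.post_id, c.commentid FROM post p JOIN comment c
      ON c.post_id = p.post_id
    WHERE c.comment_creation_date > ? AND c.comment_creation_date < ?
      AND p.posttype IN (?, ?)
    ORDER BY c.commentid""", args).fetchall()

print(comma == ansi, ansi)  # True [(1, 10), (1, 13)]
```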

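The optimizer can decide when to apply the filters and how to perform the join. You can make either version more efficient with an index on comment(comment_creation_date, post_id) and an index on post(posttype) (whether the latter helps depends on how many distinct post types you have, i.e. how selective that index is).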
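To illustrate what the suggested composite index buys (setting aside that the question rules out new indexes), here is a sketch using Python's sqlite3 as a stand-in for Oracle; the index name comment_date_post_ix is hypothetical. Because both the date and post_id are stored in the index, a date-range query that only needs post_id can be answered from the index alone (a covering index), and SQLite's query plan should name the index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comment (commentid INTEGER PRIMARY KEY, post_id INTEGER,
                      comment_creation_date TEXT, comment_type TEXT);
-- Composite index: range-filter on the date, then read post_id from the index.
CREATE INDEX comment_date_post_ix ON comment (comment_creation_date, post_id);
""")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT post_id FROM comment
    WHERE comment_creation_date > ? AND comment_creation_date < ?""",
    ["2024-01-01", "2024-02-01"]).fetchall()

# The last column of each plan row is a human-readable description.
details = [row[-1] for row in plan]
print(details)
```

In Oracle the corresponding check is EXPLAIN PLAN FOR followed by SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY).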

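I'm not sure what you mean by "I know IN is bad for performance." That is not common knowledge; please share any reference you have for it. As far as I know, an in with a bunch of constants should be no worse than a bunch of expressions such as p.posttype = <value1> or p.posttype = <value2> . . .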
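A small sketch (Python's sqlite3 as a stand-in, made-up rows) showing that an IN list and the equivalent chain of ORs select the same rows; it says nothing about cost, which is up to the optimizer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post (post_id INTEGER PRIMARY KEY, posttype TEXT);
INSERT INTO post VALUES (1, 'news'), (2, 'blog'), (3, 'faq'), (4, 'news');
""")
types = ["news", "faq"]

# IN list with one bind placeholder per value.
in_sql = ("SELECT post_id FROM post WHERE posttype IN ("
          + ",".join("?" * len(types)) + ") ORDER BY post_id")
# Equivalent chain of ORs (no other predicates, so no parentheses needed).
or_sql = ("SELECT post_id FROM post WHERE "
          + " OR ".join(["posttype = ?"] * len(types)) + " ORDER BY post_id")

in_rows = conn.execute(in_sql, types).fetchall()
or_rows = conn.execute(or_sql, types).fetchall()
print(in_rows == or_rows, in_rows)  # True [(1,), (3,), (4,)]
```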

Very efficient join between two tables in Oracle
