Which of two PostgreSQL indexes is more efficient?

Posted: 2015-11-01 17:57:11

Tags: database performance postgresql indexing query-optimization

I have the following PostgreSQL schema:

-- USER is a reserved word in PostgreSQL, so the table name has to be quoted
CREATE TABLE "User" (
    ID INTEGER PRIMARY KEY
);

CREATE TABLE BOX (
    ID INTEGER PRIMARY KEY
);

CREATE SEQUENCE seq_item;

CREATE TABLE Item (
    ID INTEGER PRIMARY KEY DEFAULT nextval('seq_item'),
    SENDER INTEGER REFERENCES "User"(id),
    RECEIVER INTEGER REFERENCES "User"(id),
    INFO TEXT,
    BOX_ID INTEGER REFERENCES Box(id) NOT NULL,
    ARRIVAL TIMESTAMP
);

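The main use case is a typical producer/consumer scenario. Different users can insert items into the database, into a specific box for a specific user, and each user can retrieve the topmost (that is, the oldest) item from a box addressed to her or him. It more or less mimics queue functionality at the database level.

More precisely, the most common operations are the following: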

INSERT INTO ITEM(SENDER, RECEIVER, INFO, BOX_ID, ARRIVAL) 
VALUES (nsid, nrid, ncontent, nqid, ntime);

and retrievals based on a combination of RECEIVER+SENDER or RECEIVER+BOX_ID:

SELECT * INTO it FROM Item i WHERE (i.RECEIVER=? OR i.RECEIVER is NULL) AND 
(i.BOX_ID=?) ORDER BY ARRIVAL LIMIT 1;
DELETE FROM Item i WHERE i.id=it.id;

SELECT * INTO it FROM Item i WHERE (i.RECEIVER=? OR i.RECEIVER is NULL) AND 
(i.SENDER=?) ORDER BY ARRIVAL LIMIT 1;
DELETE FROM Item i WHERE i.id=it.id;

The last two snippets are wrapped in a stored procedure.

I am considering two different indexes.
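The insert plus the stored-procedure dequeue can be sketched as below. This is a minimal sketch that assumes an in-memory SQLite database as a stand-in for PostgreSQL, so the example runs with only the Python standard library. SQLite's planner and locking differ from PostgreSQL's, and the helper names `make_db`, `enqueue` and `dequeue_by_sender` are hypothetical. The SELECT and the DELETE run inside one explicit transaction, so the row that is returned is the row that is removed.

```python
import sqlite3
from typing import Optional, Tuple

def make_db() -> sqlite3.Connection:
    # isolation_level=None: no implicit transactions; BEGIN/COMMIT are issued explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    # Foreign keys are declared but, by SQLite's default, not enforced.
    conn.executescript("""
        CREATE TABLE "User" (id INTEGER PRIMARY KEY);
        CREATE TABLE box (id INTEGER PRIMARY KEY);
        CREATE TABLE item (
            id       INTEGER PRIMARY KEY,   -- SQLite assigns ids; stands in for seq_item
            sender   INTEGER REFERENCES "User"(id),
            receiver INTEGER REFERENCES "User"(id),
            info     TEXT,
            box_id   INTEGER NOT NULL REFERENCES box(id),
            arrival  TEXT                   -- ISO-8601 strings sort chronologically
        );
    """)
    return conn

def enqueue(conn: sqlite3.Connection, sender, receiver, info, box_id, arrival) -> None:
    # Same shape as the INSERT in the question.
    conn.execute(
        "INSERT INTO item(sender, receiver, info, box_id, arrival) VALUES (?, ?, ?, ?, ?)",
        (sender, receiver, info, box_id, arrival),
    )

def dequeue_by_sender(conn: sqlite3.Connection, receiver, sender) -> Optional[Tuple]:
    # Hypothetical equivalent of the stored procedure: fetch the oldest match, delete it.
    conn.execute("BEGIN IMMEDIATE")  # take SQLite's write lock before reading
    try:
        row = conn.execute(
            "SELECT id, sender, receiver, info, box_id, arrival FROM item "
            "WHERE (receiver = ? OR receiver IS NULL) AND sender = ? "
            "ORDER BY arrival LIMIT 1",
            (receiver, sender),
        ).fetchone()
        if row is not None:
            conn.execute("DELETE FROM item WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row
```

Items addressed to nobody (`receiver IS NULL`) compete with items addressed to the receiver, and whichever arrived first is dequeued first.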

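1. CREATE INDEX ind ON item(arrival);

The EXPLAIN plan for the SELECT above (the one filtering on SENDER) is:

Limit  (cost=0.29..2.07 rows=1 width=35)
  ->  Index Scan using ind on item i  (cost=0.29..3010.81 rows=1693 width=35)
        Filter: (((receiver = 2) OR (receiver IS NULL)) AND (sender = 2))

As far as I understand, the advantage of this approach is that I avoid sorting the data. However, as far as I know, I still have to scan the whole table, and the access will be random, which will slow down execution. What I am not sure about is whether, because of LIMIT 1, execution stops as soon as a match is found, or whether it always scans the whole table.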
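Whether `LIMIT 1` cuts the first plan short can be illustrated with a toy model in Python. This is not PostgreSQL code. The index on `arrival` is modelled as a list of `(arrival, sender, receiver)` tuples already in arrival order, and the Limit node is modelled as an early `return`; the function name is hypothetical. PostgreSQL's executor does pull rows on demand, so the Limit node stops the Index Scan once one row has passed the filter. How much work the scan does therefore depends on how early in arrival order the first matching row sits, and the planner can only estimate that.

```python
from typing import List, Optional, Tuple

# Toy index entry: (arrival, sender, receiver); receiver None stands for NULL.
Entry = Tuple[int, int, Optional[int]]

def first_match_arrival_index(index: List[Entry], receiver: int, sender: int):
    """Walk an index sorted by arrival, apply the Filter to each entry and stop
    at the first match (the effect of LIMIT 1). Returns (match, entries_visited)."""
    visited = 0
    for entry in index:               # entries are already in arrival order
        visited += 1
        _, s, r = entry
        if s == sender and (r == receiver or r is None):
            return entry, visited     # Limit satisfied: stop scanning
    return None, visited              # no match: every entry was visited
```

If the match sits behind 1,000 non-matching entries, the walk visits all 1,001 entries; if the match comes first, it visits one. In the real plan, each visited entry also costs a heap fetch, because `sender` and `receiver` are not stored in that index.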

2. CREATE INDEX ind ON item(receiver, sender);

EXPLAIN output:

Limit  (cost=512.23..512.23 rows=1 width=35)
  ->  Sort  (cost=512.23..516.46 rows=1693 width=35)
        Sort Key: arrival
        ->  Bitmap Heap Scan on message m  (cost=42.37..503.76 rows=1693 width=35)
              Recheck Cond: (((receiver = 2) AND (sender = 2)) OR ((receiver IS NULL) AND (sender = 2)))
              ->  BitmapOr  (cost=42.37..42.37 rows=1693 width=0)
                    ->  Bitmap Index Scan on ind  (cost=0.00..37.22 rows=1693 width=0)
                          Index Cond: ((receiver = 2) AND (sender = 2))
                    ->  Bitmap Index Scan on ind  (cost=0.00..4.30 rows=1 width=0)
                          Index Cond: ((receiver IS NULL) AND (sender = 2))

In this case I can efficiently find the matches for receiver and sender, but I have to sort the result, which may be slow.
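The second plan can be modelled the same way. In this toy sketch (plain Python, hypothetical names), the index on `(receiver, sender)` is a dictionary from key to rows. The two lookups stand for the two Bitmap Index Scans under BitmapOr, and every row they return has to be fetched and sorted before Limit can keep the first one. The work therefore grows with the number of matching rows, not with where they sit in time.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Toy row: (arrival, sender, receiver); receiver None stands for NULL.
Row = Tuple[int, int, Optional[int]]

def build_receiver_sender_index(rows: List[Row]) -> Dict[tuple, List[Row]]:
    # Toy index on (receiver, sender): key -> all rows with that key.
    index: Dict[tuple, List[Row]] = defaultdict(list)
    for row in rows:
        _, sender, receiver = row
        index[(receiver, sender)].append(row)
    return dict(index)

def first_match_bitmap_then_sort(index: Dict[tuple, List[Row]], receiver: int, sender: int):
    """BitmapOr of two index lookups, then Sort on arrival, then Limit 1.
    Returns (match, rows_sorted)."""
    matches = index.get((receiver, sender), []) + index.get((None, sender), [])
    matches = sorted(matches, key=lambda r: r[0])  # Sort Key: arrival
    return (matches[0] if matches else None), len(matches)
```

With 1,693 matching rows (the planner's estimate in the plan above) plus one row with a NULL receiver, all 1,694 candidates are sorted just to return one.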

Which of the two options is better, and why? The estimated cost of the first one is lower, but the second seems more "deterministic".

Answer (score: 1):

For this query:

SELECT * INTO it
FROM Item i
WHERE (i.RECEIVER = ? OR i.RECEIVER is NULL) AND 
      (i.SENDER = ?)
ORDER BY ARRIVAL
LIMIT 1;

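The best index is probably item(sender, arrival, receiver), with the columns in that order. The scan can then jump to the given sender, read that sender's rows already ordered by arrival (so no separate sort is needed), and filter on receiver until it finds the first match.

The fastest approach is probably: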
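The following is a toy Python model of how an index on `(sender, arrival, receiver)` serves this query. It makes the simplifying assumption that a sorted list plus `bisect` behaves like a B-tree range scan, and the function names are hypothetical. The scan jumps to the first entry for the sender, walks forward in arrival order, tests `receiver`, and stops at the first hit. No sort step is needed, because one sender's entries are already ordered by arrival.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

# Toy composite index entry: (sender, arrival, receiver); receiver None = NULL.
IndexEntry = Tuple[int, int, Optional[int]]

def make_sender_arrival_index(rows: List[IndexEntry]) -> List[IndexEntry]:
    # Ordered on the leading key columns; receiver is carried along.
    return sorted(rows, key=lambda e: (e[0], e[1]))

def first_match_sender_arrival(index: List[IndexEntry], receiver: int, sender: int):
    """Jump to `sender`, walk in arrival order, filter on receiver, stop at the
    first match. Returns (match, entries_visited)."""
    keys = [(s, a) for s, a, _ in index]  # rebuilt per call; fine for a toy
    pos = bisect_left(keys, (sender,))    # (sender,) sorts before any (sender, a)
    visited = 0
    while pos < len(index) and index[pos][0] == sender:
        visited += 1
        entry = index[pos]
        if entry[2] == receiver or entry[2] is None:
            return entry, visited
        pos += 1
    return None, visited
```

Entries for other senders are never touched. The walk is bounded by how many of this sender's items precede the first one addressed to this receiver or to no one.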

select *
from ((select i.*
       from item i
       where receiver = ? and sender = ?
       order by arrival
       limit 1
      ) union all
      (select i.*
       from item i
       where receiver is null and sender = ?
       order by arrival
       limit 1
      ) 
     ) i
order by arrival
limit 1;

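For this version, the best index is item(sender, receiver, arrival). Each subquery can use the index to fetch at most one row, and the final sort of at most two rows is negligible.

Of course, the same reasoning applies to the other query (the one on BOX_ID).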
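The union-all rewrite against an index on `(sender, receiver, arrival)` can be modelled in the same toy style (plain Python, hypothetical names). Each branch is an equality on `(sender, receiver)` or on `(sender, NULL)`, so a single `bisect` lands on the oldest entry of that range, and at most one entry per branch is read. NULL receivers are placed after non-NULL ones, which matches PostgreSQL's default NULLS LAST ordering for an ascending B-tree column.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

# Toy row: (sender, receiver, arrival); receiver None stands for NULL.
SRARow = Tuple[int, Optional[int], int]

def _key(sender, receiver, arrival):
    # NULL receivers sort after all non-NULL ones (NULLS LAST).
    return (sender, receiver is None, receiver or 0, arrival)

def make_sender_receiver_arrival_index(rows: List[SRARow]) -> List[SRARow]:
    return sorted(rows, key=lambda r: _key(*r))

def first_in_range(index, keys, sender, receiver) -> Optional[SRARow]:
    # One UNION ALL branch: the first entry at or after the start of the
    # (sender, receiver) range is its oldest row, if it belongs to the range.
    pos = bisect_left(keys, _key(sender, receiver, float("-inf")))
    if pos < len(index) and index[pos][0] == sender and index[pos][1] == receiver:
        return index[pos]
    return None

def first_match_union_all(index: List[SRARow], receiver: int, sender: int):
    """Returns (match, candidates_compared); at most two candidates."""
    keys = [_key(*r) for r in index]
    candidates = [c for c in (first_in_range(index, keys, sender, receiver),
                              first_in_range(index, keys, sender, None))
                  if c is not None]
    # Outer ORDER BY arrival LIMIT 1 over the (at most two) branch results.
    return min(candidates, key=lambda r: r[2], default=None), len(candidates)
```

However many items the sender has queued, the outer ORDER BY ... LIMIT 1 compares at most two candidates.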
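Applied to the BOX_ID query, the same rewrite might look like the SQL strings below. The index `item(box_id, receiver, arrival)` and its name `ind_box` are an extrapolation of the answer's reasoning, not something the answer states. The Python harness runs both versions on an in-memory SQLite database only to check that they return the same row; it says nothing about PostgreSQL's plans. The derived tables carry aliases and are written as `SELECT * FROM (...)`. Apart from the `:name` placeholders, PostgreSQL also accepts this query shape.

```python
import sqlite3

SETUP = """
CREATE TABLE item (
    id       INTEGER PRIMARY KEY,
    sender   INTEGER,
    receiver INTEGER,
    info     TEXT,
    box_id   INTEGER NOT NULL,
    arrival  TEXT
);
-- Hypothetical index, by analogy with item(sender, receiver, arrival).
CREATE INDEX ind_box ON item(box_id, receiver, arrival);
"""

ORIGINAL = """
SELECT id FROM item
WHERE (receiver = :receiver OR receiver IS NULL) AND box_id = :box_id
ORDER BY arrival LIMIT 1
"""

REWRITTEN = """
SELECT id FROM (
    SELECT * FROM (SELECT id, arrival FROM item
                   WHERE receiver = :receiver AND box_id = :box_id
                   ORDER BY arrival LIMIT 1) AS a
    UNION ALL
    SELECT * FROM (SELECT id, arrival FROM item
                   WHERE receiver IS NULL AND box_id = :box_id
                   ORDER BY arrival LIMIT 1) AS b
) AS u
ORDER BY arrival LIMIT 1
"""

def make_conn() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn

def oldest_id(conn: sqlite3.Connection, sql: str, receiver: int, box_id: int):
    # Returns the id of the oldest matching item, or None if there is none.
    row = conn.execute(sql, {"receiver": receiver, "box_id": box_id}).fetchone()
    return row[0] if row else None
```

With distinct arrival times the two queries pick the same row. When arrivals tie, either query may return any of the tied rows.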