Question

I have a number of associations that look like this:

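class Feed < ActiveRecord::Base
  # `through:` requires the intermediate association to be declared too
  has_many :cross_lists
  has_many :cross_listed_papers, through: :cross_lists, source: :paper
end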

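Frequently (i.e. on page load) I need to look up, sort, and paginate the cross-listed papers for a set of feeds within a given time period. However, this can be very slow: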

[107] pry(main)> Paper.includes(:cross_lists).where(cross_lists: { feed_id: [311, 312, 313, 314] }).where("pubdate >= ? and pubdate <= ?", Time.now - 300.days, Time.now).count

D, [2013-12-17T21:56:43.640283 #19404] DEBUG -- :    (228.9ms)  SELECT COUNT(DISTINCT "papers"."id") FROM "papers" LEFT OUTER JOIN "cross_lists" ON "cross_lists"."paper_id" = "papers"."id" WHERE "cross_lists"."feed_id" IN (311,312,313,314) AND (pubdate >= '2013-02-20 10:56:43.404072' and pubdate <= '2013-12-17 10:56:43.404234')
=> 2811

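228.9 ms is not ideal for something that may end up happening on every page load, especially since it balloons quickly if I try to join in more data (even with a narrower time range). Here is the EXPLAIN ANALYZE output: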

Aggregate  (cost=110771.10..110771.11 rows=1 width=4) (actual time=243.826..243.826 rows=1 loops=1)
  ->  Hash Join  (cost=95343.72..110749.09 rows=8807 width=4) (actual time=93.725..242.725 rows=2830 loops=1)
        Hash Cond: (cross_lists.paper_id = papers.id)
        ->  Bitmap Heap Scan on cross_lists  (cost=2876.53..15182.11 rows=158372 width=4) (actual time=15.496..90.232 rows=162981 loops=1)
              Recheck Cond: (feed_id = ANY ('{311,312,313,314}'::integer[]))
              ->  Bitmap Index Scan on index_cross_lists_on_feed_id_and_cross_list_date  (cost=0.00..2836.94 rows=158372 width=0) (actual time=14.383..14.383 rows=162981 loops=1)
                    Index Cond: (feed_id = ANY ('{311,312,313,314}'::integer[]))
        ->  Hash  (cost=91670.95..91670.95 rows=48499 width=4) (actual time=76.079..76.079 rows=48853 loops=1)
              Buckets: 4096  Batches: 2  Memory Usage: 861kB
              ->  Bitmap Heap Scan on papers  (cost=1033.46..91670.95 rows=48499 width=4) (actual time=6.495..61.230 rows=48853 loops=1)
                    Recheck Cond: ((pubdate >= '2013-02-20'::date) AND (pubdate <= '2013-12-17'::date))
                    ->  Bitmap Index Scan on index_papers_on_pubdate  (cost=0.00..1021.34 rows=48499 width=0) (actual time=5.437..5.437 rows=48855 loops=1)
                          Index Cond: ((pubdate >= '2013-02-20'::date) AND (pubdate <= '2013-12-17'::date))
Total runtime: 244.295 ms

Is there an index I can use to speed up this kind of query, or do I need to resort to denormalization?

Answer 1

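The query plan seems sensible. Judging by the choices the planner made, you are already using your indexes as well as you can, i.e. through bitmap index scans.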

The time is spent joining 49k rows with 163k rows. Short of pre-aggregating your criteria, there is very little you can do about that.
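Using the actual row counts from the EXPLAIN ANALYZE output above, a quick check of how much of each join input survives into the 2830-row result:

```latex
\frac{2830}{162981} \approx 0.017 \ (\text{cross\_lists rows kept}), \qquad
\frac{2830}{48853} \approx 0.058 \ (\text{papers rows kept})
```

Each index scan on its own returns far more rows than the result needs, so most of the work is the hash join throwing rows away.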

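Question whether you really need to run this query. I'm guessing it runs to compute the total number of pages? If so, can't you cache that number until it changes? (If not, perhaps post more details and explain in plain English what the query is trying to achieve.)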
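A minimal plain-Ruby sketch of the "cache the count until it changes" idea, assuming the page total is the only consumer of the COUNT. The class name, the `[feed_ids, from_date, to_date]` key shape, and the `invalidate` hook are all hypothetical; the block passed in stands for the real COUNT query.

```ruby
# Hypothetical cache for the expensive COUNT behind a page total.
# Keys look like [feed_ids, from_date, to_date]; all names are illustrative.
class PageCountCache
  def initialize(per_page:)
    @per_page = per_page
    @counts = {}
  end

  # Return the cached count for key; run the block (the real COUNT query)
  # only on a cache miss.
  def count(key)
    @counts.fetch(key) { @counts[key] = yield }
  end

  # Ceiling division: 2811 rows at 25 per page gives 113 pages.
  def total_pages(key, &query)
    (count(key, &query) + @per_page - 1) / @per_page
  end

  # Drop every cached count whose feed list includes feed_id, e.g. after a
  # paper is cross-listed to (or removed from) that feed.
  def invalidate(feed_id)
    @counts.delete_if { |(feed_ids, _from, _to), _count| feed_ids.include?(feed_id) }
  end
end
```

In a Rails app, `Rails.cache.fetch(key) { ... }` could play the same role, provided `Rails.cache.delete` (or a key that changes) is triggered whenever cross-listings change. Keying on dates rather than `Time.now` matters; otherwise every request gets a fresh key and never hits the cache.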

Answer 2

In the end I decided to denormalize very slightly. Moving pubdate onto the cross_lists table gave me a ~10x speedup:

[185] pry(main)> Paper.joins(:cross_lists).where("cross_lists.feed_id IN (?) AND cross_lists.cross_list_date >= ? AND cross_lists.cross_list_date <= ?", feed_ids, Time.now - 300.days, Time.now).count

   (22.4ms)  SELECT COUNT(*) FROM "papers" INNER JOIN "cross_lists" ON "cross_lists"."paper_id" = "papers"."id" WHERE (cross_lists.feed_id IN (311,312,313,314) AND cross_lists.cross_list_date >= '2013-02-20 14:33:29.034243' AND cross_lists.cross_list_date <= '2013-12-17 14:33:29.034443')
=> 2830

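I can then paginate the results of this query and join in the other data after applying the limit, which greatly reduces the cost of large result sets.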
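A plain-Ruby sketch of the "paginate first, join the rest afterwards" pattern, using in-memory structs as stand-ins for the two tables (the structs, helper names, and sample data are hypothetical). The filter, sort, and page steps touch only the denormalized cross_lists columns; paper data is fetched only for the ids on the current page. The `uniq` step reflects that a paper cross-listed to several of the feeds appears once per feed in an INNER JOIN, which may be why `COUNT(*)` here (2830) differs from the `COUNT(DISTINCT ...)` in the question (2811).

```ruby
require "date"

# Plain-Ruby stand-ins for table rows; the fields mirror the columns in the
# answer, but the structs themselves are hypothetical.
CrossList = Struct.new(:paper_id, :feed_id, :cross_list_date)
Paper     = Struct.new(:id, :title)

# Step 1: filter, sort (newest first, ties by paper_id), de-duplicate, and
# page using only cross_lists columns, so nothing from papers is read yet.
# uniq keeps one entry per paper even if it is cross-listed to several feeds.
def page_of_paper_ids(cross_lists, feed_ids:, from:, to:, page:, per_page:)
  cross_lists
    .select { |cl| feed_ids.include?(cl.feed_id) && cl.cross_list_date.between?(from, to) }
    .sort_by { |cl| [-cl.cross_list_date.jd, cl.paper_id] }
    .map(&:paper_id)
    .uniq
    .drop((page - 1) * per_page)
    .first(per_page)
end

# Step 2: fetch the wider paper data only for the ids on the current page.
def load_page(papers_by_id, ids)
  ids.map { |id| papers_by_id.fetch(id) }
end
```

In ActiveRecord terms this roughly corresponds to adding `.order`, `.limit(per_page)` and `.offset(...)` to the denormalized relation, then loading associations only for the returned ids; the exact relation would depend on the app.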

Optimizing a Rails/Postgres many-to-many query
