Question

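I have a large table with more than 500 million rows. I am trying to find the best indexing option to speed up the query. I suspect that sorting by timestamp slows the query down considerably. The table has 15 columns.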

MyTable has a @ManyToOne relationship with other_table. The user can also define the maximum number of results. The code is as follows:

// The real code uses a @NamedQuery defined in the entity class; the query text is inlined here for readability.
TypedQuery<MyTable> query = em.createQuery("SELECT m FROM my_table m WHERE m.other_table.id = :id AND m.city in :cities ORDER BY m.timestamp DESC", MyTable.class);
query.setParameter("id", id);
query.setParameter("cities", cities);
query.setMaxResults(number);
return query.getResultList();

What is the best alternative for this kind of query? A composite index? Which index type is most suitable in this case?

We already have an index like this, but as I said, the query still takes a long time.

CREATE INDEX my_table_idx ON my_schema.my_table USING btree (other_table_id, timestamp DESC NULLS LAST, city)

Edit:

Here is the execution plan:

Limit  (cost=2876886.98..2876887.03 rows=20 width=162) (actual time=101820.279..101820.284 rows=20 loops=1)
  Buffers: shared hit=8063 read=635649 written=12198
  ->  Sort  (cost=2876886.98..2879114.34 rows=890941 width=162) (actual time=101820.277..101820.278 rows=20 loops=1)
        Sort Key: timestamp DESC
        Sort Method: top-N heapsort  Memory: 35kB
        Buffers: shared hit=8063 read=635649 written=12198
        ->  Bitmap Heap Scan on my_table  (cost=31640.64..2853179.36 rows=890941 width=162) (actual time=199.824..101221.260 rows=711774 loops=1)
              Recheck Cond: ((m_other_table_id = '14b713d5-fb1a-4dbd-c013-fat4a7f6c8e3'::uuid) AND (m_city_id = 3))
              Rows Removed by Index Recheck: 28920837
              Heap Blocks: exact=23535 lossy=615808
              Buffers: shared hit=8060 read=635649 written=12198
              ->  Bitmap Index Scan on my_table_idx  (cost=0.00..31417.90 rows=890941 width=0) (actual time=189.011..189.012 rows=711777 loops=1)
                    Index Cond: ((m_other_table_id = '14b713d5-fb1a-4dbd-c013-fat4a7f6c8e3'::uuid) AND (m_city_id = 3))
                    Buffers: shared hit=90 read=4276
Planning time: 0.198 ms
Execution time: 101821.109 ms

These are the indexes we have.

CREATE INDEX my_table_idx ON my_schema.my_table USING btree (other_table_id, timestamp DESC NULLS LAST, city)
CREATE UNIQUE INDEX my_table_prev_id_idx ON my_schema.my_table USING btree (m_prev_id)
CREATE INDEX my_table_other_table_fk_idx ON my_schema.my_table USING btree (m_other_table_id)
CREATE UNIQUE INDEX my_table_pkey ON my_schema.my_table USING btree (m_id)
CREATE INDEX my_table_track_fk_idx ON my_schema.my_table USING btree (m_track_id)

Edit 2:

I wonder why parallel workers do not show up in my execution plan. I have configured these settings.

max_worker_processes = 6;
max_parallel_workers = 6;
max_parallel_workers_per_gather = 3;

Answer 1

Based on this line of the plan:

Cond: ((m_other_table_id = '14b713d5-fb1a-4dbd-c013-fat4a7f6c8e3'::uuid) AND (m_city_id = 3))

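Your ideal index should be on (m_other_table_id, m_city_id). Not other_table_id, not city. The plan you showed does not quite match the query, so it is hard to tell whether the typo is in the plan or in the query.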
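To settle which names are real, the table's actual columns and index definitions can be read from the catalog. A minimal sketch, assuming the schema and table names used in the question (my_schema, my_table):

```sql
-- List the table's real column names and types.
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'my_schema'   -- schema name as written in the question
  AND table_name   = 'my_table'    -- table name as written in the question
ORDER BY ordinal_position;

-- List the definitions of every index on the table.
SELECT indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'my_schema'
  AND tablename  = 'my_table';
```

Whatever names these return are the ones to use in the index definition.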

Also, since you order by timestamp, we can add that column to the index.

So I would try the following index:

CREATE INDEX idx ON my_schema.my_table USING btree 
    (m_other_table_id, m_city_id, timestamp DESC)

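The order of the columns matters here: the two columns filtered by equality in the plan come first and the sort column comes last, so the matching rows are already stored in timestamp order and the LIMIT can stop after reading the first few index entries.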
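One way to check whether the suggested index helps is to create it and rerun the query under EXPLAIN. The sketch below is only illustrative: it assumes the column names from the plan (m_other_table_id, m_city_id), a column literally named timestamp, a hypothetical index name, and placeholder parameter values. Adjust all of these to the real schema.

```sql
-- Hypothetical index name; column names are assumed from the plan.
-- CONCURRENTLY avoids blocking writes on a large table, but it cannot
-- run inside a transaction block.
CREATE INDEX CONCURRENTLY my_table_other_city_ts_idx
    ON my_schema.my_table USING btree
    (m_other_table_id, m_city_id, "timestamp" DESC);

-- Rerun the query with real timing and buffer statistics.
-- The UUID and city id below are placeholders, not values from the real data.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM my_schema.my_table
WHERE m_other_table_id = '00000000-0000-0000-0000-000000000000'::uuid
  AND m_city_id = 3
ORDER BY "timestamp" DESC
LIMIT 20;
```

If the index is used as hoped, the plan should show an Index Scan feeding the Limit directly, with no Sort node. In PostgreSQL, ORDER BY ... DESC implies NULLS FIRST, so an index declared with plain DESC matches the query's ordering, whereas the existing my_table_idx, declared DESC NULLS LAST, does not. With several cities in the IN list, rows are ordered by timestamp only within each city, so a Sort may still appear. Separately, the lossy heap blocks in the original plan suggest the bitmap did not fit in work_mem, which is another setting worth looking at.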

Answer 2

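Do you really need to retrieve half a billion tuples? I doubt it.

Perhaps the real question is: do you plan to process this result in your program to do something else? If so, you may be able to push that work to the DBMS so that you receive only the tuples you need.

About your query: I think the problem is that you join on one attribute and then sort by another.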

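Edit: I looked at your EXPLAIN output. The plan has a LIMIT. Your query does not.

So you are joining every single tuple just to find the ones with the largest timestamp?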

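Do the following:

Write the query with a subquery in which you retrieve the distinct timestamps, sort them, and keep the N most recent.

Then search for the joined tuples that have one of these timestamps. Like this: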

select * from a join b on (whatever join condition) where timestamp IN (select distinct timestamp from ... order by timestamp desc limit 100)

PostgreSQL query takes more than 5 minutes
