Postgres choosing a suboptimal query plan in production

Time: 2014-09-17 20:31:03

Tags: postgresql indexing

I have a query with ORDER BY and LIMIT to support a paginated interface:

SELECT segment_members.id AS t0_r0,
       segment_members.segment_id AS t0_r1,
       segment_members.account_id AS t0_r2,
       segment_members.score AS t0_r3,
       segment_members.created_at AS t0_r4,
       segment_members.updated_at AS t0_r5,
       segment_members.posts_count AS t0_r6,
       accounts.id AS t1_r0,
       accounts.platform AS t1_r1,
       accounts.username AS t1_r2,
       accounts.created_at AS t1_r3,
       accounts.updated_at AS t1_r4,
       accounts.remote_id AS t1_r5,
       accounts.name AS t1_r6,
       accounts.language AS t1_r7,
       accounts.description AS t1_r8,
       accounts.timezone AS t1_r9,
       accounts.profile_image_url AS t1_r10,
       accounts.post_count AS t1_r11,
       accounts.follower_count AS t1_r12,
       accounts.following_count AS t1_r13,
       accounts.uri AS t1_r14,
       accounts.location AS t1_r15,
       accounts.favorite_count AS t1_r16,
       accounts.raw AS t1_r17,
       accounts.followers_completed_at AS t1_r18,
       accounts.followings_completed_at AS t1_r19,
       accounts.followers_started_at AS t1_r20,
       accounts.followings_started_at AS t1_r21,
       accounts.profile_fetched_at AS t1_r22,
       accounts.managed_source_id AS t1_r23
FROM segment_members
INNER JOIN accounts ON accounts.id = segment_members.account_id
WHERE segment_members.segment_id = 1
ORDER BY accounts.follower_count ASC LIMIT 20
OFFSET 0;

Here are the indexes on the tables:

accounts
"accounts_pkey" PRIMARY KEY, btree (id)
"index_accounts_on_remote_id_and_platform" UNIQUE, btree (remote_id, platform)
"index_accounts_on_description" btree (description)
"index_accounts_on_favorite_count" btree (favorite_count)
"index_accounts_on_follower_count" btree (follower_count)
"index_accounts_on_following_count" btree (following_count)
"index_accounts_on_lower_username_and_platform" btree (lower(username::text), platform)
"index_accounts_on_post_count" btree (post_count)
"index_accounts_on_profile_fetched_at_and_platform" btree (profile_fetched_at, platform)
"index_accounts_on_username" btree (username)

segment_members
"segment_members_pkey" PRIMARY KEY, btree (id)
"index_segment_members_on_segment_id_and_account_id" UNIQUE, btree (segment_id, account_id)
"index_segment_members_on_account_id" btree (account_id)
"index_segment_members_on_segment_id" btree (segment_id)

In my development and staging databases, the query plan looks like this, and the query runs very fast:

 Limit  (cost=4802.15..4802.20 rows=20 width=2086)
   ->  Sort  (cost=4802.15..4803.20 rows=421 width=2086)
         Sort Key: accounts.follower_count
         ->  Nested Loop  (cost=20.12..4790.95 rows=421 width=2086)
               ->  Bitmap Heap Scan on segment_members  (cost=19.69..1244.24 rows=421 width=38)
                     Recheck Cond: (segment_id = 1)
                     ->  Bitmap Index Scan on index_segment_members_on_segment_id_and_account_id  (cost=0.00..19.58 rows=421 width=0)
                           Index Cond: (segment_id = 1)
               ->  Index Scan using accounts_pkey on accounts  (cost=0.43..8.41 rows=1 width=2048)
                     Index Cond: (id = segment_members.account_id)

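In production, however, the query plan is as follows, and the query takes forever (several minutes, until it hits the statement timeout):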

 Limit  (cost=0.86..25120.72 rows=20 width=2130)
   ->  Nested Loop  (cost=0.86..4614518.64 rows=3674 width=2130)
         ->  Index Scan using index_accounts_on_follower_count on accounts  (cost=0.43..2779897.53 rows=3434917 width=2092)
         ->  Index Scan using index_segment_members_on_segment_id_and_account_id on segment_members  (cost=0.43..0.52 rows=1 width=38)
               Index Cond: ((segment_id = 1) AND (account_id = accounts.id))
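The Limit node's cost of 25120.72 can be reproduced from the Nested Loop beneath it. PostgreSQL charges a LIMIT the child's startup cost plus only the fraction of the child's run cost needed to return 20 of the estimated 3674 rows. This implicitly assumes that matching rows are spread evenly along index_accounts_on_follower_count:

```latex
\mathrm{cost}_{\mathrm{Limit}}
  = c_{\mathrm{startup}} + \left(c_{\mathrm{total}} - c_{\mathrm{startup}}\right)\cdot\frac{20}{N_{\mathrm{est}}}
  = 0.86 + (4614518.64 - 0.86)\cdot\frac{20}{3674}
  \approx 25120.72
```

Only about 20/3674 of the index walk is costed. That is why this plan looks affordable even though the complete Nested Loop is estimated at over 4.6 million cost units.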

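accounts has about 6M rows in staging and 3M rows in production. segment_members has about 300K rows in staging and 4M in production. Is the difference in table sizes causing the difference in query plan selection? Is there any way to get Postgres to use the faster query plan in production?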

Update: here is the EXPLAIN ANALYZE from the slow production server:

 Limit  (cost=0.86..22525.66 rows=20 width=2127) (actual time=173.148..187568.247 rows=20 loops=1)
   ->  Nested Loop  (cost=0.86..4654749.92 rows=4133 width=2127) (actual time=173.141..187568.193 rows=20 loops=1)
         ->  Index Scan using index_accounts_on_follower_count on accounts  (cost=0.43..2839731.81 rows=3390197 width=2089) (actual time=0.110..180374.279 rows=1401278 loops=1)
         ->  Index Scan using index_segment_members_on_segment_id_and_account_id on segment_members  (cost=0.43..0.53 rows=1 width=38) (actual time=0.003..0.003 rows=0 loops=1401278)
               Index Cond: ((segment_id = 1) AND (account_id = accounts.id))
 Total runtime: 187568.318 ms
(6 rows)
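The same proportional assumption gives a rough sense of how far off this estimate was. The planner expected to read about 20/4133 of the 3390197 accounts rows before it had collected 20 matches. The actual outer Index Scan returned 1401278 rows, and the inner index scan ran once per row (loops=1401278):

```latex
N_{\mathrm{outer,\,expected}} \approx 3390197 \cdot \frac{20}{4133} \approx 16406
\qquad
N_{\mathrm{outer,\,actual}} = 1401278 \approx 85 \times N_{\mathrm{outer,\,expected}}
```

This is only a back-of-the-envelope reading. It does suggest that the accounts in segment 1 are not spread evenly over follower_count, so the early exit the planner counted on never came. About 180 of the 187.5 seconds are spent in that outer index scan.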

1 answer:

Answer 0 (score: 3)

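Your table statistics are not up to date, or the two queries you present are very different. The second one expects to retrieve 3.5M rows (rows=3434917). ORDER BY / LIMIT 20 is forced to sort all 3.5 million rows to find the top 20, which is very expensive, unless you have a matching index.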
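The first query plan expects to sort 421 rows. Not even close, so different query plans are no surprise.

It would be interesting to see the output of EXPLAIN ANALYZE, not just EXPLAIN. (Expensive for the second query!)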
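A sketch of collecting that output with per-node buffer statistics. The select list is abbreviated here to segment_members.*, accounts.* for readability. The WHERE, ORDER BY and LIMIT clauses, which drive the plan choice, are unchanged from the question:

```sql
-- EXPLAIN ANALYZE actually executes the statement, so on the slow server
-- this runs as long as the query itself (minutes).
-- BUFFERS adds shared-buffer hit/read counts to every plan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT segment_members.*, accounts.*
FROM   segment_members
JOIN   accounts ON accounts.id = segment_members.account_id
WHERE  segment_members.segment_id = 1
ORDER  BY accounts.follower_count ASC
LIMIT  20 OFFSET 0;
```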

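It depends very much on how many account_id values there are per segment_id. If segment_id is not selective, the query cannot be fast. Your only other option is a MATERIALIZED VIEW with the top n rows per segment_id and an appropriate regime to keep it up to date.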
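A minimal sketch of such a materialized view, under several assumptions. The view and index names (segment_top_members, segment_top_members_seg_rn_idx) and the cutoff n = 100 are hypothetical. The tie-breaker on accounts.id is added so that page boundaries are deterministic. Pages beyond the first n rows would still have to fall back to the original query:

```sql
-- Hypothetical: keep the 100 lowest-follower_count accounts per segment.
CREATE MATERIALIZED VIEW segment_top_members AS
SELECT segment_id, account_id, follower_count, rn
FROM  (
   SELECT sm.segment_id,
          sm.account_id,
          a.follower_count,
          row_number() OVER (PARTITION BY sm.segment_id
                             ORDER BY a.follower_count, a.id) AS rn
   FROM   segment_members sm
   JOIN   accounts a ON a.id = sm.account_id
   ) sub
WHERE  rn <= 100;  -- window results can only be filtered in an outer query

-- REFRESH ... CONCURRENTLY (PostgreSQL 9.4+) requires a unique index
-- on plain columns that covers all rows of the view.
CREATE UNIQUE INDEX segment_top_members_seg_rn_idx
    ON segment_top_members (segment_id, rn);

-- First page for segment 1, read from the view instead of the join:
SELECT account_id, follower_count
FROM   segment_top_members
WHERE  segment_id = 1
ORDER  BY rn
LIMIT  20 OFFSET 0;

-- Re-run periodically (e.g. from a scheduler) to keep the view current
-- without blocking readers of segment_top_members.
REFRESH MATERIALIZED VIEW CONCURRENTLY segment_top_members;
```

The view returns only account_id. The remaining account columns would be fetched by joining back to accounts on its primary key, which touches at most 20 rows per page.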

If your statistics are not up to date, just run ANALYZE on both tables and retry. Increasing the statistics target for selected columns might help:

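ALTER TABLE segment_members ALTER segment_id SET STATISTICS 1000;
ALTER TABLE segment_members ALTER account_id SET STATISTICS 1000;
ALTER TABLE accounts ALTER id SET STATISTICS 1000;
ALTER TABLE accounts ALTER follower_count SET STATISTICS 1000;

ANALYZE segment_members (segment_id, account_id);
ANALYZE accounts (id, follower_count);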
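To check whether statistics really are stale, you can look at when each table was last analyzed and what the planner currently believes about the relevant columns. This sketch uses the standard pg_stat_user_tables and pg_stats system views:

```sql
-- When were the two tables last analyzed, manually or by autovacuum?
SELECT relname, last_analyze, last_autoanalyze, n_live_tup
FROM   pg_stat_user_tables
WHERE  relname IN ('segment_members', 'accounts');

-- Current planner statistics for the filter, join and sort columns.
-- correlation close to 0 on follower_count means physical row order is
-- unrelated to the sort order.
SELECT tablename, attname, n_distinct, correlation
FROM   pg_stats
WHERE  (tablename = 'segment_members' AND attname IN ('segment_id', 'account_id'))
   OR  (tablename = 'accounts'        AND attname IN ('id', 'follower_count'));
```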


Better indexes

In addition to your existing UNIQUE constraint index_segment_members_on_segment_id_and_account_id on segment_members, I suggest a multicolumn index on accounts:

CREATE INDEX index_accounts_on_id_and_follower_count ON accounts (id, follower_count);

Again, run ANALYZE after creating the index.
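On a busy production table, a plain CREATE INDEX blocks writes to accounts while the index is built. Here is a sketch of the same step with CREATE INDEX CONCURRENTLY. The name index_accounts_on_id_and_follower_count is an assumption, chosen because index_accounts_on_follower_count already exists on this table:

```sql
-- Builds the index without a lock that blocks INSERT/UPDATE/DELETE.
-- Must run outside a transaction block; if the build fails, it leaves an
-- INVALID index behind that must be dropped before retrying.
CREATE INDEX CONCURRENTLY index_accounts_on_id_and_follower_count
    ON accounts (id, follower_count);

-- Refresh planner statistics for the table afterwards.
ANALYZE accounts;
```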

Some indexes useless?

All other indexes in your question are irrelevant for this query. They may or may not be useful for other purposes.

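This index is 100% dead freight; drop it. The UNIQUE index on (segment_id, account_id) already serves every lookup on its leading column, segment_id: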

"index_segment_members_on_segment_id" btree (segment_id)
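The fast development plan in the question already shows this redundancy. Its Bitmap Index Scan uses index_segment_members_on_segment_id_and_account_id with Index Cond (segment_id = 1). Here is a sketch of removing the single-column index index_segment_members_on_segment_id without blocking concurrent queries on segment_members:

```sql
-- DROP INDEX CONCURRENTLY (PostgreSQL 9.2+) waits for conflicting
-- transactions instead of taking an exclusive lock on the table.
-- Like CREATE INDEX CONCURRENTLY, it cannot run inside a transaction block.
DROP INDEX CONCURRENTLY index_segment_members_on_segment_id;
```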

This one may be useless:

"index_accounts_on_description" btree (description)

Since a "description" is typically free text, it is hardly ever used to sort rows or in a WHERE condition with a suitable operator. But that is just an educated guess.
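One way to turn that guess into data is to check how often each index has been scanned and how much space it occupies, using the standard pg_stat_user_indexes view. The counters cover only the server you query and only the period since statistics were last reset. A UNIQUE index enforces its constraint even when idx_scan is 0, so treat the result as a hint rather than proof:

```sql
-- Least-used (and largest) indexes on the two tables first.
SELECT relname      AS table_name,
       indexrelname AS index_name,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM   pg_stat_user_indexes
WHERE  relname IN ('accounts', 'segment_members')
ORDER  BY idx_scan, pg_relation_size(indexrelid) DESC;
```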