I have a query with an ORDER BY and a LIMIT to support a paginated interface:
SELECT segment_members.id AS t0_r0,
segment_members.segment_id AS t0_r1,
segment_members.account_id AS t0_r2,
segment_members.score AS t0_r3,
segment_members.created_at AS t0_r4,
segment_members.updated_at AS t0_r5,
segment_members.posts_count AS t0_r6,
accounts.id AS t1_r0,
accounts.platform AS t1_r1,
accounts.username AS t1_r2,
accounts.created_at AS t1_r3,
accounts.updated_at AS t1_r4,
accounts.remote_id AS t1_r5,
accounts.name AS t1_r6,
accounts.language AS t1_r7,
accounts.description AS t1_r8,
accounts.timezone AS t1_r9,
accounts.profile_image_url AS t1_r10,
accounts.post_count AS t1_r11,
accounts.follower_count AS t1_r12,
accounts.following_count AS t1_r13,
accounts.uri AS t1_r14,
accounts.location AS t1_r15,
accounts.favorite_count AS t1_r16,
accounts.raw AS t1_r17,
accounts.followers_completed_at AS t1_r18,
accounts.followings_completed_at AS t1_r19,
accounts.followers_started_at AS t1_r20,
accounts.followings_started_at AS t1_r21,
accounts.profile_fetched_at AS t1_r22,
accounts.managed_source_id AS t1_r23
FROM segment_members
INNER JOIN accounts ON accounts.id = segment_members.account_id
WHERE segment_members.segment_id = 1
ORDER BY accounts.follower_count ASC LIMIT 20
OFFSET 0;
Here are the indexes on the tables:
accounts
"accounts_pkey" PRIMARY KEY, btree (id)
"index_accounts_on_remote_id_and_platform" UNIQUE, btree (remote_id, platform)
"index_accounts_on_description" btree (description)
"index_accounts_on_favorite_count" btree (favorite_count)
"index_accounts_on_follower_count" btree (follower_count)
"index_accounts_on_following_count" btree (following_count)
"index_accounts_on_lower_username_and_platform" btree (lower(username::text), platform)
"index_accounts_on_post_count" btree (post_count)
"index_accounts_on_profile_fetched_at_and_platform" btree (profile_fetched_at, platform)
"index_accounts_on_username" btree (username)
segment_members
"segment_members_pkey" PRIMARY KEY, btree (id)
"index_segment_members_on_segment_id_and_account_id" UNIQUE, btree (segment_id, account_id)
"index_segment_members_on_account_id" btree (account_id)
"index_segment_members_on_segment_id" btree (segment_id)
In my development and staging databases, the query plan looks like this, and the query executes very quickly.
Limit (cost=4802.15..4802.20 rows=20 width=2086)
-> Sort (cost=4802.15..4803.20 rows=421 width=2086)
Sort Key: accounts.follower_count
-> Nested Loop (cost=20.12..4790.95 rows=421 width=2086)
-> Bitmap Heap Scan on segment_members (cost=19.69..1244.24 rows=421 width=38)
Recheck Cond: (segment_id = 1)
-> Bitmap Index Scan on index_segment_members_on_segment_id_and_account_id (cost=0.00..19.58 rows=421 width=0)
Index Cond: (segment_id = 1)
-> Index Scan using accounts_pkey on accounts (cost=0.43..8.41 rows=1 width=2048)
Index Cond: (id = segment_members.account_id)
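In production, however, the query plan looks like this, and the query takes forever (several minutes, until it hits the statement timeout).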
Limit (cost=0.86..25120.72 rows=20 width=2130)
-> Nested Loop (cost=0.86..4614518.64 rows=3674 width=2130)
-> Index Scan using index_accounts_on_follower_count on accounts (cost=0.43..2779897.53 rows=3434917 width=2092)
-> Index Scan using index_segment_members_on_segment_id_and_account_id on segment_members (cost=0.43..0.52 rows=1 width=38)
Index Cond: ((segment_id = 1) AND (account_id = accounts.id))
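accounts has about 6 million rows in staging and 3 million in production. segment_members has about 300,000 rows in staging and 4 million in production. Is the difference in table sizes causing the difference in query plan selection? Is there any way to get Postgres to use the faster query plan in production?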
Update: here is the EXPLAIN ANALYZE output from the slow production server:
Limit (cost=0.86..22525.66 rows=20 width=2127) (actual time=173.148..187568.247 rows=20 loops=1)
-> Nested Loop (cost=0.86..4654749.92 rows=4133 width=2127) (actual time=173.141..187568.193 rows=20 loops=1)
-> Index Scan using index_accounts_on_follower_count on accounts (cost=0.43..2839731.81 rows=3390197 width=2089) (actual time=0.110..180374.279 rows=1401278 loops=1)
-> Index Scan using index_segment_members_on_segment_id_and_account_id on segment_members (cost=0.43..0.53 rows=1 width=38) (actual time=0.003..0.003 rows=0 loops=1401278)
Index Cond: ((segment_id = 1) AND (account_id = accounts.id))
Total runtime: 187568.318 ms
(6 rows)
Answer 0 (score: 3)
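Either your table statistics are not up to date, or the two queries you present are very different. The second one estimates that it will retrieve 3.5M rows (rows=3434917). ORDER BY / LIMIT 20 is forced to sort all 3.5 million rows to find the top 20, which is going to be extremely expensive, unless you have a matching index.
The first query plan expects to sort 421 rows. Not even close. Different query plans are no surprise.
It would be interesting to see the output of EXPLAIN ANALYZE, not just EXPLAIN. (Expensive with the second query!)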
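As a rough check on why the planner prefers the second plan despite the 3.4M-row estimate: the cost printed on a Limit node is, as far as I know, interpolated linearly between the startup cost and the total cost of its child, by the fraction of rows the LIMIT needs. A sketch with the numbers copied from the production EXPLAIN output in the question (the column aliases are only illustrative):

```sql
-- Nested Loop below the Limit: startup cost 0.86, total cost 4614518.64,
-- 3674 estimated rows. LIMIT 20 needs only 20 of those 3674 rows.
SELECT round(0.86 + (4614518.64 - 0.86) * 20 / 3674, 2) AS limit_cost,
       -- accounts rows the planner expects to read before it has 20 matches,
       -- assuming matches are spread evenly along the follower_count index
       round(3434917 * 20.0 / 3674) AS expected_accounts_rows;
```

limit_cost comes out at 25120.72, the figure shown on the Limit node, and the second column is roughly 18,700 rows. The EXPLAIN ANALYZE output in the question shows 1,401,278 accounts rows actually read before 20 matches were found, so the even-spread assumption appears to be what fails here.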
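Much depends on how many account_id there are for each segment_id. If segment_id is not selective, the query cannot be fast. Your only other option then is a MATERIALIZED VIEW holding the top n rows per segment_id, with an appropriate regime to keep it up to date.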
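A minimal sketch of such a materialized view, under stated assumptions: the view name segment_top_accounts, the index name, the alias rn, and the cut-off of 100 rows per segment are all hypothetical, and only a few columns are carried over. MATERIALIZED VIEW needs PostgreSQL 9.3 or later, and REFRESH ... CONCURRENTLY needs 9.4 or later.

```sql
-- Hypothetical names and cut-off; adjust to the real pagination depth.
CREATE MATERIALIZED VIEW segment_top_accounts AS
SELECT segment_id, account_id, follower_count, rn
FROM  (
   SELECT sm.segment_id, sm.account_id, a.follower_count,
          row_number() OVER (PARTITION BY sm.segment_id
                             ORDER BY a.follower_count, a.id) AS rn
   FROM   segment_members sm
   JOIN   accounts a ON a.id = sm.account_id
   ) ranked
WHERE  rn <= 100;  -- keep only the first 100 rows per segment

-- A unique index is required for REFRESH ... CONCURRENTLY
-- and also serves the paginated read below.
CREATE UNIQUE INDEX segment_top_accounts_segment_id_rn_idx
   ON segment_top_accounts (segment_id, rn);

-- Re-run on whatever schedule keeps the data fresh enough.
REFRESH MATERIALIZED VIEW CONCURRENTLY segment_top_accounts;

-- First page for one segment:
SELECT account_id, follower_count
FROM   segment_top_accounts
WHERE  segment_id = 1
ORDER  BY rn
LIMIT  20 OFFSET 0;
```

The trade-off is staleness: the view only reflects the base tables as of the last refresh, and pages beyond the stored cut-off would still have to fall back to the original query.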
If your statistics are not up to date, just run ANALYZE on both tables and retry.
Increasing the statistics target for selected columns may also help:
ALTER TABLE segment_members ALTER segment_id SET STATISTICS 1000;
ALTER TABLE segment_members ALTER account_id SET STATISTICS 1000;
ALTER TABLE accounts ALTER id SET STATISTICS 1000;
ALTER TABLE accounts ALTER follower_count SET STATISTICS 1000;
ANALYZE segment_members(segment_id, account_id);
ANALYZE accounts (id, follower_count);
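To see whether the statistics are stale, and what the planner currently believes about segment_id, the system catalogs and statistics views can be queried directly. The following is a sketch rather than a complete diagnosis; it uses only the table and column names from the question.

```sql
-- When were the tables last analyzed, manually or by autovacuum?
SELECT relname, last_analyze, last_autoanalyze, n_live_tup
FROM   pg_stat_user_tables
WHERE  relname IN ('segment_members', 'accounts');

-- Per-column statistics targets. A value of -1 (or NULL, depending on
-- the PostgreSQL version) means default_statistics_target applies.
SELECT attrelid::regclass AS table_name, attname, attstattarget
FROM   pg_attribute
WHERE  attrelid IN ('segment_members'::regclass, 'accounts'::regclass)
AND    attname  IN ('segment_id', 'account_id', 'id', 'follower_count')
AND    attnum > 0;

-- What the planner thinks the distribution of segment_id looks like.
SELECT n_distinct, most_common_vals, most_common_freqs
FROM   pg_stats
WHERE  tablename = 'segment_members'
AND    attname   = 'segment_id';
```

If segment_id = 1 is missing from most_common_vals, or its listed frequency is far from the real share of rows, the row estimates in the plan will be off accordingly.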
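In addition to your existing UNIQUE constraint index_segment_members_on_segment_id_and_account_id on segment_members, I suggest a multicolumn index on accounts (named here so that it does not collide with the existing index_accounts_on_follower_count):
CREATE INDEX index_accounts_on_id_and_follower_count ON accounts (id, follower_count);
Again, run ANALYZE after creating the index.
All other indexes in your question are irrelevant to this query. They may or may not be useful for other purposes.
This index is 100% dead freight, since the unique index on (segment_id, account_id) already covers lookups on segment_id alone; drop it. (Detailed explanation here.)
"index_segment_members_on_segment_id" btree (segment_id)
This one may be useless:
"index_accounts_on_description" btree (description)
A "description" is typically free text and is hardly ever used to order rows or in a WHERE condition with a suitable operator. But that's just an educated guess.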
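Before dropping anything, the usage counters can back up (or contradict) the guesses above. This is a sketch only: idx_scan counts scans since the statistics were last reset, so a low number on a freshly reset or rarely used server proves little.

```sql
-- Scan counts and on-disk size for every index on the two tables.
SELECT relname, indexrelname, idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM   pg_stat_user_indexes
WHERE  relname IN ('segment_members', 'accounts')
ORDER  BY idx_scan, indexrelname;

-- The redundant single-column index named above can then be dropped.
DROP INDEX index_segment_members_on_segment_id;
```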