I have this query, and it is very slow:
SELECT a.id,
COALESCE(uas.read, CAST(0 AS BOOLEAN)) as read,
COALESCE(at.link, '') as thumbnail_link
FROM users_feeds uf INNER JOIN articles a
ON uf.feed_id = a.feed_id
LEFT OUTER JOIN users_articles_states uas
ON a.id = uas.article_id AND uf.user_login = uas.user_login
LEFT OUTER JOIN articles_thumbnails at
ON a.id = at.article_id
WHERE uf.user_login = 'test1'
ORDER BY uas.read, a.date DESC LIMIT 50 OFFSET 0;
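With my current data set it takes about 500 ms on average. The two largest tables are 'articles' and 'users_articles_states', each holding roughly 100,000 records.
If I drop 'uas.read' from the ORDER BY, the query takes about 2 ms. The 'read' and 'date' columns in those two tables are both indexed (which I suppose explains why it is so fast when sorting by date only).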
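To make the comparison concrete, the fast variant is presumably the same statement with 'uas.read' removed from the sort. This is a sketch reconstructed from the description above, not a statement copied from the original post:

```sql
-- Same query as above; only the ORDER BY differs (uas.read dropped).
SELECT a.id,
       COALESCE(uas.read, CAST(0 AS BOOLEAN)) AS read,
       COALESCE(at.link, '') AS thumbnail_link
FROM users_feeds uf
INNER JOIN articles a
        ON uf.feed_id = a.feed_id
LEFT OUTER JOIN users_articles_states uas
        ON a.id = uas.article_id AND uf.user_login = uas.user_login
LEFT OUTER JOIN articles_thumbnails at
        ON a.id = at.article_id
WHERE uf.user_login = 'test1'
ORDER BY a.date DESC   -- sorting on a.date alone can follow articles_date_idx
LIMIT 50 OFFSET 0;
```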
The query plan for the slow execution is as follows (after vacuuming):
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=27944.57..27944.69 rows=50 width=104) (actual time=321.465..321.471 rows=50 loops=1)
  ->  Sort (cost=27944.57..28218.93 rows=109747 width=104) (actual time=321.464..321.465 rows=50 loops=1)
        Sort Key: uas.read, a.date
        Sort Method: top-N heapsort Memory: 34kB
        ->  Hash Left Join (cost=3863.32..24298.85 rows=109747 width=104) (actual time=45.736..292.656 rows=92297 loops=1)
              Hash Cond: (a.id = at.article_id)
              ->  Hash Left Join (cost=3668.47..23088.83 rows=109747 width=17) (actual time=44.205..235.573 rows=92297 loops=1)
                    Hash Cond: ((uf.user_login = uas.user_login) AND (a.id = uas.article_id))
                    ->  Hash Join (cost=1.57..14331.50 rows=109747 width=24) (actual time=0.019..73.701 rows=92297 loops=1)
                          Hash Cond: (a.feed_id = uf.feed_id)
                          ->  Seq Scan on articles a (cost=0.00..12757.64 rows=94964 width=20) (actual time=0.003..34.462 rows=93916 loops=1)
                          ->  Hash (cost=1.31..1.31 rows=21 width=12) (actual time=0.011..0.011 rows=21 loops=1)
                                Buckets: 1024 Batches: 1 Memory Usage: 1kB
                                ->  Seq Scan on users_feeds uf (cost=0.00..1.31 rows=21 width=12) (actual time=0.003..0.009 rows=21 loops=1)
                                      Filter: (user_login = 'test1'::text)
                                      Rows Removed by Filter: 4
                    ->  Hash (cost=1741.65..1741.65 rows=92283 width=17) (actual time=44.170..44.170 rows=92282 loops=1)
                          Buckets: 2048 Batches: 8 Memory Usage: 639kB
                          ->  Seq Scan on users_articles_states uas (cost=0.00..1741.65 rows=92283 width=17) (actual time=0.005..24.293 rows=92282 loops=1)
                                Filter: (user_login = 'test1'::text)
                                Rows Removed by Filter: 10
              ->  Hash (cost=135.49..135.49 rows=4749 width=95) (actual time=1.520..1.520 rows=4733 loops=1)
                    Buckets: 1024 Batches: 1 Memory Usage: 606kB
                    ->  Seq Scan on articles_thumbnails at (cost=0.00..135.49 rows=4749 width=95) (actual time=0.004..0.765 rows=4733 loops=1)
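The index on 'read' is: "users_articles_states_read_idx" btree (read)
I am guessing that PostgreSQL is unable to use this index. Is there another index I could create so that the rows are fetched relatively quickly, or is there some other way I can change the query itself to appease the database?
Edit 1: I had mistakenly posted the wrong original query and plan (with an INNER JOIN on the 'uas' table).
Table definitions: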
readeef=> \d users_articles_states
Table "public.users_articles_states"
Column | Type | Modifiers
------------+---------+---------------
user_login | text | not null
article_id | bigint | not null
read | boolean | default false
favorite | boolean | default false
Indexes:
"users_articles_states_pkey" PRIMARY KEY, btree (user_login, article_id)
"users_articles_states_read_idx" btree (read)
Foreign-key constraints:
"users_articles_states_article_id_fkey" FOREIGN KEY (article_id) REFERENCES articles(id) ON DELETE CASCADE
"users_articles_states_user_login_fkey" FOREIGN KEY (user_login) REFERENCES users(login) ON DELETE CASCADE
readeef=> \d articles
Table "public.articles"
Column | Type | Modifiers
-------------+--------------------------+-------------------------------------------------------
id | bigint | not null default nextval('articles_id_seq'::regclass)
feed_id | integer |
link | text |
title | text |
description | text |
date | timestamp with time zone |
guid | text |
Indexes:
"articles_pkey" PRIMARY KEY, btree (id)
"articles_feed_id_guid_key" UNIQUE CONSTRAINT, btree (feed_id, guid)
"articles_feed_id_link_key" UNIQUE CONSTRAINT, btree (feed_id, link)
"articles_date_idx" btree (date)
Foreign-key constraints:
"articles_feed_id_fkey" FOREIGN KEY (feed_id) REFERENCES feeds(id) ON DELETE CASCADE
Referenced by:
TABLE "articles_extracts" CONSTRAINT "articles_extracts_article_id_fkey" FOREIGN KEY (article_id) REFERENCES articles(id) ON DELETE CASCADE
TABLE "articles_scores" CONSTRAINT "articles_scores_article_id_fkey" FOREIGN KEY (article_id) REFERENCES articles(id) ON DELETE CASCADE
TABLE "articles_thumbnails" CONSTRAINT "articles_thumbnails_article_id_fkey" FOREIGN KEY (article_id) REFERENCES articles(id) ON DELETE CASCADE
TABLE "users_articles_states" CONSTRAINT "users_articles_states_article_id_fkey" FOREIGN KEY (article_id) REFERENCES articles(id) ON DELETE CASCADE
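Edit 2: Adding an index on 'user_login' does not get rid of the Seq Scan, probably because there are only a few users in the database.
Edit 3: I forgot to mention that the PostgreSQL version is 9.3.9.
Edit 4: I tried a few different things. I removed 'uas.read' from the ORDER BY clause and added "AND uas.read = 't'" to the WHERE clause. According to the planner, the execution time was 0.4 ms. After changing the latter to "AND uas.read = 'f'", the execution time jumped to 622 ms. There is almost no difference between the two execution plans, apart from the costs, the filter (NOT read in one, read in the other), and the 'Rows Removed by Join Filter' count: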
QUERY PLAN (slow query)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.99..2374.08 rows=50 width=103) (actual time=0.064..671.332 rows=2 loops=1)
  ->  Nested Loop Left Join (cost=0.99..50263.02 rows=1059 width=103) (actual time=0.063..671.330 rows=2 loops=1)
        ->  Nested Loop (cost=0.71..49930.39 rows=1059 width=17) (actual time=0.057..671.319 rows=2 loops=1)
              ->  Nested Loop (cost=0.29..47995.18 rows=3744 width=48) (actual time=0.039..363.354 rows=92052 loops=1)
                    Join Filter: (uf.feed_id = a.feed_id)
                    Rows Removed by Join Filter: 1873485 (1207 in fast one)
                    ->  Index Scan Backward using articles_date_idx on articles a (cost=0.29..46589.91 rows=93597 width=20) (actual time=0.011..58.529 rows=93597 loops=1) (rows=60 in fast one)
                    ->  Materialize (cost=0.00..1.32 rows=1 width=36) (actual time=0.000..0.001 rows=21 loops=93597) (loops=60 in fast one)
                          ->  Seq Scan on users_feeds uf (cost=0.00..1.31 rows=1 width=36) (actual time=0.006..0.013 rows=21 loops=1)
                                Filter: (user_login = 'test1'::text)
                                Rows Removed by Filter: 4
              ->  Index Scan using users_articles_states_pkey on users_articles_states uas (cost=0.42..0.51 rows=1 width=17) (actual time=0.003..0.003 rows=0 loops=92052)
                    Index Cond: ((user_login = 'test1'::text) AND (article_id = a.id))
                    Filter: (NOT read) (read in fast one)
                    Rows Removed by Filter: 1
        ->  Index Scan using articles_thumbnails_pkey on articles_thumbnails at (cost=0.28..0.30 rows=1 width=94) (actual time=0.002..0.004 rows=1 loops=2)
              Index Cond: (a.id = article_id)
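For reference, the Edit 4 statement behind this plan is assumed to be the original query with 'uas.read' moved out of the ORDER BY and into the WHERE clause. The post only describes the change, so the exact SQL below is a reconstruction rather than the statement that was actually run:

```sql
-- Slow variant from Edit 4 (uas.read = 'f'); using 't' instead gives the fast one.
SELECT a.id,
       COALESCE(uas.read, CAST(0 AS BOOLEAN)) AS read,
       COALESCE(at.link, '') AS thumbnail_link
FROM users_feeds uf
INNER JOIN articles a
        ON uf.feed_id = a.feed_id
LEFT OUTER JOIN users_articles_states uas
        ON a.id = uas.article_id AND uf.user_login = uas.user_login
LEFT OUTER JOIN articles_thumbnails at
        ON a.id = at.article_id
WHERE uf.user_login = 'test1'
  AND uas.read = 'f'   -- also discards articles that have no state row (NULL)
ORDER BY a.date DESC
LIMIT 50 OFFSET 0;
```

Because the WHERE condition rejects NULLs from 'uas', the left join effectively becomes an inner join, which matches the plain Nested Loop on 'users_articles_states' in the plan. With only 2 matching unread rows, the LIMIT of 50 is never reached, so the backward scan over 'articles_date_idx' has to walk all 93597 articles.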
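After testing with the same data and a similar schema in sqlite3, it is also slow when sorting by 'uas.read', but it has no trouble filtering on it in the WHERE clause. Its execution time is the same ~0.5 ms regardless of whether the condition is 'AND NOT uas.read' or 'AND uas.read'.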