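I have the following two queries. Query 1 is fast because it uses indexes (its plan shows index scans feeding a merge join), while query 2 uses a hash join and is slower.
Query 1 sorts by a column of table 1 (users), and query 2 sorts by a column of table 2 (access_logs).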
Query 1
learning=# explain analyze
select *
from users left join
access_logs
on users.userid = access_logs.userid
order by users.userid
limit 10 offset 90;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=14.00..15.46 rows=10 width=104) (actual time=1.330..1.504 rows=10 loops=1)
  -> Merge Left Join (cost=0.85..291532.97 rows=1995958 width=104) (actual time=0.037..1.482 rows=100 loops=1)
        Merge Cond: (users.userid = access_logs.userid)
        -> Index Scan using users_pkey on users (cost=0.43..151132.75 rows=1995958 width=76) (actual time=0.018..1.135 rows=100 loops=1)
        -> Index Scan using access_logs_userid_idx on access_logs (cost=0.43..110471.45 rows=1995958 width=28) (actual time=0.012..0.198 rows=100 loops=1)
Planning time: 0.469 ms
Execution time: 1.569 ms
Query 2
learning=# explain analyze
select *
from users left join
access_logs
on users.userid = access_logs.userid
order by access_logs.userid
limit 10 offset 90;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=293584.20..293584.23 rows=10 width=104) (actual time=3821.432..3821.439 rows=10 loops=1)
  -> Sort (cost=293583.98..298573.87 rows=1995958 width=104) (actual time=3821.391..3821.415 rows=100 loops=1)
        Sort Key: access_logs.userid
        Sort Method: top-N heapsort Memory: 51kB
        -> Hash Left Join (cost=73231.06..217299.90 rows=1995958 width=104) (actual time=539.859..3168.754 rows=1995958 loops=1)
              Hash Cond: (users.userid = access_logs.userid)
              -> Seq Scan on users (cost=0.00..44814.58 rows=1995958 width=76) (actual time=0.009..443.260 rows=1995958 loops=1)
              -> Hash (cost=34636.58..34636.58 rows=1995958 width=28) (actual time=539.112..539.112 rows=1995958 loops=1)
                    Buckets: 262144 Batches: 2 Memory Usage: 58532kB
                    -> Seq Scan on access_logs (cost=0.00..34636.58 rows=1995958 width=28) (actual time=0.006..170.061 rows=1995958 loops=1)
Planning time: 0.480 ms
Execution time: 3832.245 ms
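Question: sorting access_logs on its own by userid does use the index access_logs_userid_idx, so why does query 2 not use it?
Query: explain analyze select * from access_logs order by userid limit 10 offset 90;
Plan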
Limit (cost=5.41..5.96 rows=10 width=28) (actual time=0.199..0.218 rows=10 loops=1)
  -> Index Scan using access_logs_userid_idx on access_logs (cost=0.43..110471.45 rows=1995958 width=28) (actual time=0.029..0.201 rows=100 loops=1)
Planning time: 0.120 ms
Execution time: 0.252 ms
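One way to read the gap between these plans is to count how many rows reach the Limit node. In query 1 and in the single-table query, the index scans stop after offset + limit = 100 rows (actual rows=100), because the input already arrives in key order. In query 2 the Sort sits on top of a Hash Left Join that emits all 1,995,958 joined rows, and the top-N heapsort has to consume every one of them before it knows which 100 are smallest. The Python sketch below models only that difference; it is not PostgreSQL's executor, and `CountingSource`, both helper functions and the table size are hypothetical.

```python
import heapq
import itertools
import random


class CountingSource:
    """Wraps a list of sort keys and counts how many rows a consumer pulls."""

    def __init__(self, rows):
        self.rows = rows
        self.pulled = 0

    def __iter__(self):
        for row in self.rows:
            self.pulled += 1
            yield row


def limit_offset_presorted(source, limit, offset):
    # Models Limit over an Index Scan: rows already arrive in key order,
    # so reading stops after offset + limit rows.
    return list(itertools.islice(source, offset, offset + limit))


def limit_offset_top_n(source, limit, offset):
    # Models Limit over a top-N heapsort: rows arrive in arbitrary order,
    # so all of them must be read while a bounded heap keeps the smallest
    # offset + limit keys.
    smallest = heapq.nsmallest(offset + limit, source)
    return smallest[offset:offset + limit]


if __name__ == "__main__":
    n = 100_000  # hypothetical size, smaller than the ~2M rows in the plans
    keys = list(range(1, n + 1))

    indexed = CountingSource(keys)  # sorted, like an index scan
    shuffled_keys = keys[:]
    random.Random(0).shuffle(shuffled_keys)
    unsorted = CountingSource(shuffled_keys)  # arbitrary order, like hash join output

    a = limit_offset_presorted(indexed, limit=10, offset=90)
    b = limit_offset_top_n(unsorted, limit=10, offset=90)
    print(a == b, indexed.pulled, unsorted.pulled)  # True 100 100000
```

Both paths return the same ten keys. Only the presorted path gets to stop early, which matches the rows=100 versus rows=1995958 counts in the plans.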
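Edit 1:
My goal is not to compare the two queries. I actually want the results of query 2; I included query 1 only as a point of comparison, to help understand the difference.
The sort order is not limited to the join column: the end user can also sort by other columns of table 2 (access_logs), in which case the plan is as follows.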
learning=# explain analyze select * from users left join access_logs on users.userid=access_logs.userid order by access_logs.last_login limit 10;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=260431.83..260431.86 rows=10 width=104) (actual time=3846.625..3846.627 rows=10 loops=1)
  -> Sort (cost=260431.83..265421.73 rows=1995958 width=104) (actual time=3846.623..3846.623 rows=10 loops=1)
        Sort Key: access_logs.last_login
        Sort Method: top-N heapsort Memory: 27kB
        -> Hash Left Join (cost=73231.06..217299.90 rows=1995958 width=104) (actual time=567.104..3174.818 rows=1995958 loops=1)
              Hash Cond: (users.userid = access_logs.userid)
              -> Seq Scan on users (cost=0.00..44814.58 rows=1995958 width=76) (actual time=0.007..443.364 rows=1995958 loops=1)
              -> Hash (cost=34636.58..34636.58 rows=1995958 width=28) (actual time=566.814..566.814 rows=1995958 loops=1)
                    Buckets: 262144 Batches: 2 Memory Usage: 58532kB
                    -> Seq Scan on access_logs (cost=0.00..34636.58 rows=1995958 width=28) (actual time=0.004..169.137 rows=1995958 loops=1)
Planning time: 0.490 ms
Execution time: 3857.171 ms
Answer
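The sort in the second query will not use the index, because the index is not guaranteed to contain every value being sorted. If some rows in users have no match in access_logs, the Left Join produces null values for access_logs.userid. Those values are referenced in the query as the sort key, but they never actually appear in access_logs, so they are not covered by the index.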
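The effect of the LEFT JOIN described above can be shown on a tiny data set. This sketch uses SQLite through Python's standard sqlite3 module as a stand-in for PostgreSQL. The three-user schema and its data are hypothetical, and the example deliberately does not sort on the nullable column, because SQLite and PostgreSQL place NULLs differently by default.

```python
import sqlite3

# Hypothetical miniature schema; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE access_logs (userid INTEGER, last_login TEXT);
    CREATE INDEX access_logs_userid_idx ON access_logs (userid);
    INSERT INTO users VALUES (1, 'a'), (2, 'b'), (3, 'c');
    -- user 2 deliberately has no access_logs row
    INSERT INTO access_logs VALUES (1, '2020-01-01'), (3, '2020-01-03');
""")

# Every users row survives the LEFT JOIN; unmatched ones get NULL for access_logs.userid.
rows = conn.execute("""
    SELECT users.userid, access_logs.userid
    FROM users LEFT JOIN access_logs ON users.userid = access_logs.userid
    ORDER BY users.userid
""").fetchall()

# The userid values that actually exist in access_logs (all an index on that column covers).
indexed_values = [r[0] for r in conn.execute(
    "SELECT userid FROM access_logs ORDER BY userid")]

null_rows = [r for r in rows if r[1] is None]

print(rows)            # [(1, 1), (2, None), (3, 3)]
print(indexed_values)  # [1, 3]
print(len(null_rows))  # 1
```

The join yields three values of access_logs.userid, but only two of them exist in access_logs. An index on that column therefore cannot deliver the complete ordered sequence the query needs.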
The workaround is to create a default initial record in access_logs for every user and to use an Inner Join instead.
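The workaround can be sketched as a one-off backfill followed by an INNER JOIN. The demo again uses SQLite via Python's sqlite3 module as a stand-in, with the same hypothetical three-user schema. The `INSERT ... SELECT ... WHERE NOT EXISTS` statement is plain SQL that PostgreSQL also accepts. Leaving last_login NULL in the default row is an assumption, since the answer does not say what a default record should contain. The sketch only checks the data-level property (no join row lacks a match). Whether PostgreSQL then chooses an index-driven plan is up to its planner.

```python
import sqlite3

# Hypothetical miniature schema; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE access_logs (userid INTEGER, last_login TEXT);
    CREATE INDEX access_logs_userid_idx ON access_logs (userid);
    INSERT INTO users VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO access_logs VALUES (1, '2020-01-01'), (3, '2020-01-03');
""")

# Backfill: give every user without a log row a default initial record.
# Assumption: the default record leaves last_login NULL.
conn.execute("""
    INSERT INTO access_logs (userid)
    SELECT u.userid FROM users AS u
    WHERE NOT EXISTS (SELECT 1 FROM access_logs AS a WHERE a.userid = u.userid)
""")
conn.commit()

# With a match guaranteed for every user, an INNER JOIN loses no rows,
# and access_logs.userid is never NULL in the result.
inner = conn.execute("""
    SELECT users.userid, access_logs.userid
    FROM users INNER JOIN access_logs ON users.userid = access_logs.userid
    ORDER BY access_logs.userid
""").fetchall()

print(inner)  # [(1, 1), (2, 2), (3, 3)]
```

New users would also need a default row at creation time (for example in application code or a trigger), or the INNER JOIN would silently drop them.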