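How do I optimize a query whose WHERE clause checks user_id = X OR user_id IN (a subquery that might return no rows)?
In my example below, queries 1 and 2 are both very fast (<1 ms), but query 3, which is simply the OR of the conditions from queries 1 and 2, is much slower (50 ms).
Can someone explain why query 3 is so slow, and what query optimization strategies I should generally use to avoid this problem? I realize the subquery in my example could easily be eliminated, but in real life a subquery sometimes seems like the easiest way to get the data I want.
Relevant code and data:
Posts data: https://dl.dropbox.com/u/4597000/StackOverflow/sanitized_posts.csv
Users data: https://dl.dropbox.com/u/4597000/StackOverflow/sanitized_users.csv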
# from the shell:
# > createdb test
CREATE TABLE posts (
    id integer PRIMARY KEY NOT NULL,
    created_by_id integer NOT NULL,
    created_at integer NOT NULL
);
CREATE INDEX index_posts ON posts (created_by_id, created_at);
CREATE INDEX index_posts_2 ON posts (created_at);
CREATE TABLE users (
    id integer PRIMARY KEY NOT NULL,
    login varchar(50) NOT NULL
);
CREATE INDEX index_users ON users (login);
COPY posts FROM '/path/to/sanitized_posts.csv' DELIMITERS ',' CSV;
COPY users FROM '/path/to/sanitized_users.csv' DELIMITERS ',' CSV;
-- queries:
-- query 1, fast:
EXPLAIN ANALYZE SELECT * FROM posts WHERE created_by_id = 123 LIMIT 100;
-- query 2, fast:
EXPLAIN ANALYZE SELECT * FROM posts WHERE created_by_id IN (SELECT id FROM users WHERE login = 'nobodyhasthislogin') LIMIT 100;
-- query 3, slow:
EXPLAIN ANALYZE SELECT * FROM posts WHERE created_by_id = 123 OR created_by_id IN (SELECT id FROM users WHERE login = 'nobodyhasthislogin') LIMIT 100;
Answer 0 (score: 1)
Split the query (edited):
SELECT * FROM (
    SELECT * FROM posts p WHERE p.created_by_id = 123
    UNION
    SELECT * FROM posts p
    WHERE EXISTS (
        SELECT TRUE FROM users
        WHERE id = p.created_by_id AND login = 'nobodyhasthislogin'
    )
) p
LIMIT 100;
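Answer 1 (score: 0)
Most of the time in this particular query is spent on the index scan. Here is a query that approaches the problem from a different angle to avoid that, but should return the same results.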
SELECT posts.*
FROM users
JOIN posts ON posts.created_by_id = users.id
WHERE users.id = 123 OR login = 'nobodyhasthislogin';
This selects from the users table, applies the filter once, and then joins posts to that result.
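The "filter users first, then join" idea can be spelled out with a derived table, as in the sketch below. Two caveats are worth stating: the question's queries end with LIMIT 100, which is added back here, and the schema declares no foreign key from posts.created_by_id to users.id, so the join matches query 3 only if a users row with id 123 actually exists. The aliases u and p are introduced for illustration.

```sql
-- Sketch only: assumes every created_by_id of interest has a row in users.
SELECT p.*
FROM (
    -- Filter users once; id is the primary key, so this yields distinct ids.
    SELECT id
    FROM users
    WHERE id = 123 OR login = 'nobodyhasthislogin'
) u
JOIN posts p ON p.created_by_id = u.id
LIMIT 100;
```

Since users.id is unique, the join cannot return the same post twice.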
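I realize the question asks for general optimization tips rather than help with this specific query. To answer that, my advice is to run EXPLAIN ANALYZE
and read the resulting plan; this answer helped me.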
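As a concrete starting point, the slow query from the question can be run with buffer statistics enabled (the parenthesized option list is available in PostgreSQL 9.0 and later). In the output, it may help to check which scan node reads posts, whether the OR condition appears as a Filter applied to every row instead of as an index condition, and how far the estimated row counts are from the actual ones. What the plan shows on a given installation is not something this sketch can predict.

```sql
-- Query 3 from the question, with buffer usage added to the plan output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM posts
WHERE created_by_id = 123
   OR created_by_id IN (SELECT id FROM users WHERE login = 'nobodyhasthislogin')
LIMIT 100;
```

Running the same command on queries 1 and 2 gives a baseline to compare against.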
Answer 2 (score: 0)
How about:
EXPLAIN ANALYZE
SELECT *
FROM posts
WHERE created_by_id IN (
    SELECT 123
    UNION ALL
    SELECT id FROM users WHERE login = 'nobodyhasthislogin'
)
LIMIT 100;