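We have 3 tables.
The first has 10,000 rows, the second 80,000 rows, and the third 400 rows.
The code used to run fine, but recently we ran into a performance problem.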
EXPLAIN ANALYZE SELECT "users_users"."id", "users_users"."email"
FROM "users_users" WHERE (NOT ("users_users"."email" IN
(SELECT U0."email" FROM "users_blacklist" U0))
AND NOT ("users_users"."id" IN (SELECT U0."user_id"
FROM "games_user2game" U0))) ORDER BY "users_users"."id" DESC;
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan Backward using users_user_pkey on users_users  (cost=9.25..12534132.45 rows=2558 width=26) (actual time=46.101..77158.318 rows=2510 loops=1)
   Filter: ((NOT (hashed SubPlan 1)) AND (NOT (SubPlan 2)))
   Rows Removed by Filter: 7723
   SubPlan 1
     ->  Seq Scan on users_blacklist u0  (cost=0.00..8.20 rows=420 width=22) (actual time=0.032..0.318 rows=420 loops=1)
   SubPlan 2
     ->  Materialize  (cost=0.00..2256.20 rows=77213 width=4) (actual time=0.003..4.042 rows=35774 loops=9946)
           ->  Seq Scan on games_user2game u0  (cost=0.00..1568.13 rows=77213 width=4) (actual time=0.011..17.159 rows=77213 loops=1)
 Total runtime: 77159.689 ms
(9 rows)
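Main question: is it normal to run into performance problems when joining 2 tables with fewer than 100,000 rows?
Where should we dig? Should we change the query, or dig into the database settings?
UPD: A temporary solution is to eliminate the subqueries by prefetching their results in code.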
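The prefetch workaround mentioned in the update could look roughly like the sketch below. It is only an illustration: it uses Python's built-in sqlite3 module with a tiny in-memory data set as a stand-in for the real Postgres database, and the helper names `build_demo_db` and `users_without_games_prefetch` are hypothetical. Each subquery is fetched once into a set, and the filtering happens in application code.

```python
import sqlite3


def build_demo_db():
    # Hypothetical miniature copy of the three tables (assumption: only
    # the columns used by the query in the question are modelled).
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users_users (id INTEGER PRIMARY KEY, email TEXT);
        CREATE TABLE users_blacklist (email TEXT);
        CREATE TABLE games_user2game (id INTEGER PRIMARY KEY, user_id INTEGER);
        INSERT INTO users_users VALUES
            (1, 'a@x.com'), (2, 'b@x.com'), (3, 'c@x.com'), (4, 'd@x.com');
        INSERT INTO users_blacklist VALUES ('b@x.com');
        INSERT INTO games_user2game (user_id) VALUES (3), (3);
    """)
    return conn


def users_without_games_prefetch(conn):
    # Run each former subquery exactly once and keep the result in a set,
    # so every membership test below is a constant-time lookup.
    blacklisted = {row[0] for row in conn.execute(
        "SELECT email FROM users_blacklist")}
    players = {row[0] for row in conn.execute(
        "SELECT DISTINCT user_id FROM games_user2game")}
    rows = conn.execute("SELECT id, email FROM users_users ORDER BY id DESC")
    return [(uid, email) for uid, email in rows
            if email not in blacklisted and uid not in players]


conn = build_demo_db()
result = users_without_games_prefetch(conn)
print(result)
```

In the demo data, user 2 is blacklisted and user 3 has played games, so only users 4 and 1 remain. The trade-off is that all three tables are transferred to the application, which is acceptable at these sizes but would not scale to much larger tables.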
Answer 0 (score: 1)
I don't know the Postgres dialect of SQL, but it may be worth trying outer joins. In many other DBMSs they can perform better than subselects.
Something like:
SELECT us."id", us."email"
FROM "users_users" us
LEFT JOIN "users_blacklist" uo ON uo.email = us.email
LEFT JOIN "games_user2game" ug ON ug.user_id = us.id
WHERE uo.email IS NULL
AND ug.user_id IS NULL
ORDER BY us."id" DESC;
I think this does the same thing as your original query, but you will have to test it to be sure.
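One way to run that test is to execute both forms against the same small data set and compare the results. The sketch below does this with Python's sqlite3 module as a stand-in for Postgres; the sample rows are made up, and the join query uses the alias `us` consistently and checks `ug.user_id` for the anti-join. It only checks that the two queries return the same rows, and says nothing about how Postgres would plan them.

```python
import sqlite3

# Hypothetical miniature data set; user 3 has two game rows so that a
# plain join would produce duplicates if the anti-join were wrong.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users_users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE users_blacklist (email TEXT);
    CREATE TABLE games_user2game (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users_users VALUES
        (1, 'a@x.com'), (2, 'b@x.com'), (3, 'c@x.com'), (4, 'd@x.com');
    INSERT INTO users_blacklist VALUES ('b@x.com');
    INSERT INTO games_user2game (user_id) VALUES (3), (3);
""")

# The original form from the question, without the quoting noise.
NOT_IN_QUERY = """
    SELECT id, email FROM users_users
    WHERE email NOT IN (SELECT email FROM users_blacklist)
      AND id NOT IN (SELECT user_id FROM games_user2game)
    ORDER BY id DESC
"""

# The outer-join form: keep only rows with no match on either side.
LEFT_JOIN_QUERY = """
    SELECT us.id, us.email
    FROM users_users us
    LEFT JOIN users_blacklist uo ON uo.email = us.email
    LEFT JOIN games_user2game ug ON ug.user_id = us.id
    WHERE uo.email IS NULL AND ug.user_id IS NULL
    ORDER BY us.id DESC
"""

not_in_rows = conn.execute(NOT_IN_QUERY).fetchall()
left_join_rows = conn.execute(LEFT_JOIN_QUERY).fetchall()
print(not_in_rows == left_join_rows)
```

Because only unmatched rows survive the `IS NULL` filters, users with several rows in `games_user2game` do not cause duplicates in the output.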
Answer 1 (score: 1)
I had a similar problem on SQL Server and rewrote the query using EXISTS, as @Scotch suggested, which worked well.
SELECT
"users_users"."id",
"users_users"."email"
FROM "users_users"
WHERE
NOT EXISTS
(
SELECT NULL FROM "users_blacklist" WHERE "users_blacklist"."email" = "users_users"."email"
)
AND NOT EXISTS
(
SELECT NULL FROM "games_user2game" WHERE "games_user2game"."user_id" = "users_users"."id"
)
ORDER BY "users_users"."id" DESC;
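This query gives you all users who are not blacklisted and have not played any game. It may be faster than the outer join option, depending on how Postgres plans the query.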
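One caveat when swapping NOT IN for NOT EXISTS: the two differ if the subquery column can contain NULL. The thread does not say whether `user_id` or `email` is nullable, so the sketch below is an assumption-based illustration only, again using Python's sqlite3 module with made-up rows rather than the real Postgres tables. With a NULL in the subquery result, NOT IN matches no rows at all, while NOT EXISTS still returns the users without games.

```python
import sqlite3

# Hypothetical data: one game row belongs to user 2, one has a NULL
# user_id (assumption: the column is nullable).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users_users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE games_user2game (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users_users VALUES (1, 'a@x.com'), (2, 'b@x.com');
    INSERT INTO games_user2game (user_id) VALUES (2), (NULL);
""")

# "1 NOT IN (2, NULL)" evaluates to NULL, not TRUE, so user 1 is dropped.
not_in_rows = conn.execute("""
    SELECT id FROM users_users
    WHERE id NOT IN (SELECT user_id FROM games_user2game)
    ORDER BY id DESC
""").fetchall()

# NOT EXISTS only asks whether a matching row exists; the NULL row
# never matches, so user 1 is kept.
not_exists_rows = conn.execute("""
    SELECT id FROM users_users
    WHERE NOT EXISTS (
        SELECT NULL FROM games_user2game
        WHERE games_user2game.user_id = users_users.id
    )
    ORDER BY id DESC
""").fetchall()

print(not_in_rows, not_exists_rows)
```

If both columns are declared NOT NULL, the two forms return the same rows and the choice comes down to how the planner handles each.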