Question

SELECT DISTINCT "myapp_profile"."user_id", "myapp_profile"."name", 
  "myapp_profile"."age", "auth_user"."id", "auth_user"."username", 
  "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", 
  "auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active", 
  "auth_user"."is_superuser", "auth_user"."last_login", "auth_user"."date_joined" 
FROM "myapp_profile" 
INNER JOIN "auth_user" ON ("myapp_profile"."user_id" = "auth_user"."id") 
LEFT OUTER JOIN "myapp_siterel" ON ("myapp_profile"."user_id" = "myapp_siterel"."profile_id") 
LEFT OUTER JOIN "django_site" ON ("myapp_siterel"."site_id" = "django_site"."id") 
WHERE ("auth_user"."is_superuser" = false 
AND "auth_user"."is_staff" = false 
AND ("django_site"."id" IS NULL OR "django_site"."id" IN (15, 16))) 
ORDER BY "myapp_profile"."user_id" 
DESC LIMIT 100

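The query above takes about 100 seconds to run against 2 million users/profiles. I'm not a DBA, and our DBAs are looking into what can be done, but since I will probably never see what gets changed (assuming it happens at the database level), I'm curious how you would optimize this query. It obviously needs to run much faster than it does, on the order of 5 seconds or less. If there is no way to optimize the SQL, is there an index or set of indexes that could be added or changed to make the query faster, or is there something else I'm overlooking?

Postgres 9 is the database, and Django's ORM is where this query came from.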

Query Plan

Limit  (cost=1374.35..1383.10 rows=100 width=106)
  ->  Unique  (cost=1374.35..1391.24 rows=193 width=106)
        ->  Sort  (cost=1374.35..1374.83 rows=193 width=106)
              Sort Key: myapp_profile.user_id, myapp_profile.name, myapp_profile.age, auth_user.username, auth_user.first_name, auth_user.last_name, auth_user.email, auth_user.password, auth_user.is_staff, auth_user.is_active, auth_user.is_superuser, auth_user.last_login, auth_user.date_joined
              ->  Nested Loop  (cost=453.99..1367.02 rows=193 width=106)
                    ->  Hash Left Join  (cost=453.99..1302.53 rows=193 width=49)
                          Hash Cond: (myapp_siterel.site_id = django_site.id)
                          Filter: ((django_site.id IS NULL) OR (django_site.id = ANY ('{10080,10053}'::integer[])))
                          ->  Hash Left Join  (cost=448.50..1053.27 rows=15001 width=53)
                                Hash Cond: (myapp_profile.user_id = myapp_siterel.profile_id)
                                ->  Seq Scan on myapp_profile  (cost=0.00..286.01 rows=15001 width=49)
                                ->  Hash  (cost=261.00..261.00 rows=15000 width=8)
                                      ->  Seq Scan on myapp_siterel  (cost=0.00..261.00 rows=15000 width=8)
                          ->  Hash  (cost=3.55..3.55 rows=155 width=4)
                                ->  Seq Scan on django_site  (cost=0.00..3.55 rows=155 width=4)
                    ->  Index Scan using auth_user_pkey on auth_user  (cost=0.00..0.32 rows=1 width=57)
                          Index Cond: (auth_user.id = myapp_profile.user_id)
                          Filter: ((NOT auth_user.is_superuser) AND (NOT auth_user.is_staff))

Thanks

Answer 1

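I'm not very familiar with Postgres, so I'm not sure how good its query optimizer is, but it looks like everything you have in the WHERE clause could be moved into the join conditions instead. I would hope Postgres is smart enough to work that out for itself, but if it isn't, it will fetch the related records from the other 3 tables for all 2 million users and only then filter them with your WHERE clause.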

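The indexes already mentioned should also help if they don't exist yet. Again, I'm more of an MSSQL person, but doesn't Postgres have statistics or a query plan you can look at?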

Something along these lines:

SELECT DISTINCT
    "myapp_profile"."user_id",
    "myapp_profile"."name", 
    "myapp_profile"."age",
    "auth_user"."id",
    "auth_user"."username", 
    "auth_user"."first_name",
    "auth_user"."last_name",
    "auth_user"."email", 
    "auth_user"."password",
    "auth_user"."is_staff",
    "auth_user"."is_active", 
    "auth_user"."is_superuser",
    "auth_user"."last_login",
    "auth_user"."date_joined" 
FROM "myapp_profile" 
    INNER JOIN "auth_user"
        ON ("myapp_profile"."user_id" = "auth_user"."id") 
        AND "auth_user"."is_superuser" = false
        AND "auth_user"."is_staff" = false 
    LEFT OUTER JOIN "myapp_siterel"
        ON ("myapp_profile"."user_id" = "myapp_siterel"."profile_id") 
    LEFT OUTER JOIN "django_site"
        ON ("myapp_siterel"."site_id" = "django_site"."id") 
        AND ("django_site"."id" IS NULL OR "django_site"."id" IN (15, 16))
ORDER BY "myapp_profile"."user_id" DESC
LIMIT 100
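
One caveat worth checking before relying on the rewrite above: moving a filter from WHERE into the ON clause of an INNER JOIN keeps the same result, but doing the same on a LEFT OUTER JOIN does not, because unmatched left-hand rows are kept rather than removed. The sketch below illustrates this with Python's sqlite3 module as a stand-in for Postgres (the join semantics shown are standard SQL). The shortened table names and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for Postgres.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profile (user_id INTEGER PRIMARY KEY);
    CREATE TABLE siterel (profile_id INTEGER, site_id INTEGER);
    CREATE TABLE site (id INTEGER PRIMARY KEY);
    INSERT INTO profile VALUES (1), (2), (3);
    INSERT INTO site VALUES (15), (99);
    -- profile 1 is on site 15, profile 2 is on site 99, profile 3 has no site
    INSERT INTO siterel VALUES (1, 15), (2, 99);
""")

# Site filter in WHERE, as in the original query.
where_version = """
    SELECT DISTINCT p.user_id
    FROM profile p
    LEFT OUTER JOIN siterel r ON p.user_id = r.profile_id
    LEFT OUTER JOIN site s ON r.site_id = s.id
    WHERE s.id IS NULL OR s.id IN (15, 16)
    ORDER BY p.user_id
"""

# Site filter moved into the ON clause of the LEFT OUTER JOIN.
on_version = """
    SELECT DISTINCT p.user_id
    FROM profile p
    LEFT OUTER JOIN siterel r ON p.user_id = r.profile_id
    LEFT OUTER JOIN site s ON r.site_id = s.id
        AND (s.id IS NULL OR s.id IN (15, 16))
    ORDER BY p.user_id
"""

where_ids = [row[0] for row in conn.execute(where_version)]
on_ids = [row[0] for row in conn.execute(on_version)]

print(where_ids)  # profile 2 (only on site 99) is filtered out
print(on_ids)     # profile 2 is kept, with a NULL site
```

So the is_superuser and is_staff conditions can safely move into the INNER JOIN, while the django_site condition probably needs to stay in WHERE to return the same rows as the original.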

Also, do you need the DISTINCT? That will slow it down as well.
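
If the DISTINCT is only there to collapse duplicates produced by the myapp_siterel join (a profile attached to several matching sites), one option is to test membership with EXISTS instead of joining. This is a minimal sketch rather than a drop-in replacement: it uses Python's sqlite3 module as a stand-in for Postgres, the shortened table names and sample rows are made up, and it assumes every siterel row points at an existing site, so that "no site" simply means "no siterel row".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profile (user_id INTEGER PRIMARY KEY);
    CREATE TABLE site (id INTEGER PRIMARY KEY);
    CREATE TABLE siterel (
        profile_id INTEGER NOT NULL REFERENCES profile (user_id),
        site_id INTEGER NOT NULL REFERENCES site (id)
    );
    INSERT INTO profile VALUES (1), (2), (3), (4);
    INSERT INTO site VALUES (15), (16), (99);
    -- profile 1: sites 15 and 16; profile 2: site 99;
    -- profile 3: no site; profile 4: sites 99 and 15
    INSERT INTO siterel VALUES (1, 15), (1, 16), (2, 99), (4, 99), (4, 15);
""")

# Join-based form, mirroring the shape of the original query.
join_sql = """
    SELECT {distinct} p.user_id
    FROM profile p
    LEFT OUTER JOIN siterel r ON p.user_id = r.profile_id
    LEFT OUTER JOIN site s ON r.site_id = s.id
    WHERE s.id IS NULL OR s.id IN (15, 16)
    ORDER BY p.user_id DESC
    LIMIT 100
"""

# EXISTS-based form: each profile appears at most once, so no DISTINCT.
exists_sql = """
    SELECT p.user_id
    FROM profile p
    WHERE NOT EXISTS (SELECT 1 FROM siterel r WHERE r.profile_id = p.user_id)
       OR EXISTS (SELECT 1 FROM siterel r
                  WHERE r.profile_id = p.user_id AND r.site_id IN (15, 16))
    ORDER BY p.user_id DESC
    LIMIT 100
"""

with_distinct = [r[0] for r in conn.execute(join_sql.format(distinct="DISTINCT"))]
without_distinct = [r[0] for r in conn.execute(join_sql.format(distinct=""))]
exists_ids = [r[0] for r in conn.execute(exists_sql)]

print(with_distinct)     # one row per profile
print(without_distinct)  # profile 1 appears twice without DISTINCT
print(exists_ids)        # same profiles as the DISTINCT form
```

Whether this is actually faster on Postgres would need to be confirmed with EXPLAIN ANALYZE against the real data.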

Answer 2

The basics:

Make sure all of the user ID fields are indexed.

It looks like you could also use indexes on is_superuser and is_staff.
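
As a rough sketch of what such index definitions could look like: the example below uses Python's sqlite3 module as a stand-in for Postgres, with cut-down versions of the tables, and the index names are hypothetical. Primary keys (such as auth_user.id, which the plan already reads through auth_user_pkey) are indexed automatically in Postgres, so the candidates are the myapp_siterel join columns and, possibly, a partial index covering only non-staff, non-superuser rows. Whether the second one helps depends on how many users are staff, so treat it as something to measure, not a recommendation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Cut-down versions of the tables from the question.
    CREATE TABLE auth_user (
        id INTEGER PRIMARY KEY,
        is_staff BOOLEAN NOT NULL,
        is_superuser BOOLEAN NOT NULL
    );
    CREATE TABLE myapp_siterel (
        profile_id INTEGER NOT NULL,
        site_id INTEGER NOT NULL
    );

    -- Covers the join from myapp_profile to myapp_siterel and the site lookup.
    CREATE INDEX siterel_profile_site_idx
        ON myapp_siterel (profile_id, site_id);

    -- Partial index: only rows for regular users are indexed.
    -- (In Postgres the predicate could be written
    --  WHERE NOT is_staff AND NOT is_superuser.)
    CREATE INDEX auth_user_regular_idx
        ON auth_user (id)
        WHERE is_staff = 0 AND is_superuser = 0;
""")

# List the indexes that were explicitly created.
index_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
print(index_names)
```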

Answer 3

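There is never a straightforward silver-bullet solution for query optimization, but the obvious first step is to index the columns you are searching on, which in your case are: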

"auth_user"."is_superuser"
"auth_user"."is_staff"
"django_site"."id"
"myapp_profile"."user_id"
