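I am trying to run a query using the Django ORM's aggregation features against an MSSQL 2008R2 database, but I keep getting a timeout error. The failing query (generated by Django) is below. I tried running it directly in SQL Server Management Studio and it works, but it takes 3.5 minutes.
It does look like it is grouping by a bunch of fields it doesn't need to, but I wouldn't have thought that should make it take that long. The database isn't big either: auth_user has 9 records, tickets_ticket has 1210, and tickets_ticket_watchers has 1876. Is there something I'm missing?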
SELECT
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined],
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined]
HAVING
(COUNT([tickets_ticket].[id]) > 0 OR COUNT(T3.[id]) > 0 )
Edit:
Here are the relevant indexes (excluding those not used by the query):
auth_user.id (PK)
auth_user.username (Unique)
tickets_ticket.id (PK)
tickets_ticket.capturer_id
tickets_ticket.responsible_id
tickets_ticket_watchers.id (PK)
tickets_ticket_watchers.user_id
tickets_ticket_watchers.ticket_id
Edit 2:
After some experimenting, I found that the following is the smallest query that still runs slowly:
SELECT
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id]
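Strangely, if I comment out any two lines in the above, it runs in under 1 s, and it doesn't seem to matter which lines I remove (although obviously I can't remove a join without also removing the corresponding SELECT line).
Edit 3:
The Python code that generated it is: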
User.objects.annotate(
Count('tickets_captured'),
Count('assigned_tickets'),
Count('tickets_watched')
)
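The execution plan shows that SQL Server first does a cross join across all the tables, producing about 280 million rows and 6 GB of data. I assume this is where the problem lies, but why is it happening?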
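Answer 0: (score: 1)
SQL Server is doing exactly what it was asked to do. Unfortunately, Django is not generating the right query for what you want. It looks like you need distinct counts rather than plain counts (in Django, Count(..., distinct=True)): Django annotate() multiple times causes wrong answers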
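To make the difference between a plain count and a distinct count concrete, here is a minimal sketch that runs the question's reduced query against an in-memory SQLite database instead of SQL Server. The schema is cut down to the join columns and the rows are invented for illustration; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical, cut-down schema: only the columns the joins touch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE auth_user (id INTEGER PRIMARY KEY);
CREATE TABLE tickets_ticket (
    id INTEGER PRIMARY KEY, capturer_id INTEGER, responsible_id INTEGER);
CREATE TABLE tickets_ticket_watchers (
    id INTEGER PRIMARY KEY, user_id INTEGER, ticket_id INTEGER);
INSERT INTO auth_user (id) VALUES (1), (2);
-- Invented data: user 1 captured 2 tickets and is responsible for 3.
INSERT INTO tickets_ticket (id, capturer_id, responsible_id) VALUES
    (1, 1, 2), (2, 1, 2), (3, 2, 1), (4, 2, 1), (5, 2, 1);
-- User 1 watches 4 tickets, user 2 watches none.
INSERT INTO tickets_ticket_watchers (user_id, ticket_id) VALUES
    (1, 1), (1, 2), (1, 3), (1, 4);
""")

# Same shape as the minimal slow query; {d} is either "" or "DISTINCT".
QUERY = """
SELECT auth_user.id,
       COUNT({d} tickets_ticket.id),
       COUNT({d} T3.id),
       COUNT({d} tickets_ticket_watchers.ticket_id)
FROM auth_user
LEFT OUTER JOIN tickets_ticket
    ON auth_user.id = tickets_ticket.capturer_id
LEFT OUTER JOIN tickets_ticket T3
    ON auth_user.id = T3.responsible_id
LEFT OUTER JOIN tickets_ticket_watchers
    ON auth_user.id = tickets_ticket_watchers.user_id
GROUP BY auth_user.id
ORDER BY auth_user.id
"""

plain = conn.execute(QUERY.format(d="")).fetchall()
distinct = conn.execute(QUERY.format(d="DISTINCT")).fetchall()

print(plain)     # every non-empty count is inflated to the joined row count
print(distinct)  # the per-relation counts the annotate() call was meant to give
```

With this data the plain counts for user 1 all come out as the size of the joined row set, while the DISTINCT variant gives 2, 3 and 4. Note that DISTINCT only corrects the numbers; the join still produces every combination of rows before they are counted.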
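Why the query behaves this way: it joins four table references together (auth_user, tickets_ticket twice, and tickets_ticket_watchers). So if a user has 2 captured tickets, 3 assigned tickets, and 4 watched tickets, the join returns 2 * 3 * 4 = 24 rows for that user, one per combination of tickets. Counting with DISTINCT removes all the duplicates.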
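The multiplication can be sketched as simple arithmetic. The helper below (joined_rows is a name made up for this example) computes how many rows one user contributes to the joined result. The question does not say how tickets are distributed across users, so the second call is only a hypothetical worst case in which a single user owns every row of each table.

```python
from math import prod


def joined_rows(*related_counts):
    """Rows one user contributes after chained LEFT OUTER JOINs.

    A relation with no matching rows still yields one NULL-padded row,
    so every factor is at least 1.
    """
    return prod(max(n, 1) for n in related_counts)


# The 2 * 3 * 4 case described in the answer.
example = joined_rows(2, 3, 4)

# Hypothetical upper bound using the table sizes from the question:
# tickets_ticket is joined twice (1210 rows each), the watchers table once (1876).
worst_case = joined_rows(1210, 1210, 1876)

print(example)
print(worst_case)
```

The roughly 280 million rows reported in the execution plan sit well under that upper bound, which is consistent with a few users holding most of the tickets, although the real distribution is not known from the question.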
Answer 1: (score: 0)
How about this?
SELECT auth_user.*,
C1.tickets_captured__count,
C2.assigned_tickets__count,
C3.tickets_watched__count
FROM
auth_user
LEFT JOIN
( SELECT capturer_id, COUNT(*) AS tickets_captured__count
FROM tickets_ticket GROUP BY capturer_id ) AS C1 ON auth_user.id = C1.capturer_id
LEFT JOIN
( SELECT responsible_id, COUNT(*) AS assigned_tickets__count
FROM tickets_ticket GROUP BY responsible_id ) AS C2 ON auth_user.id = C2.responsible_id
LEFT JOIN
( SELECT user_id, COUNT(*) AS tickets_watched__count
FROM tickets_ticket_watchers GROUP BY user_id ) AS C3 ON auth_user.id = C3.user_id
WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
--WHERE C1.tickets_captured__count IS NOT NULL OR C2.assigned_tickets__count IS NOT NULL -- also works (I think with better performance)
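As a rough check of the derived-table query above, the sketch below runs the same shape of query against an in-memory SQLite database. The schema is reduced to the join columns and the rows are invented for illustration; this is not the asker's data, and SQLite is standing in for SQL Server.

```python
import sqlite3

# Hypothetical, cut-down schema and invented rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE auth_user (id INTEGER PRIMARY KEY);
CREATE TABLE tickets_ticket (
    id INTEGER PRIMARY KEY, capturer_id INTEGER, responsible_id INTEGER);
CREATE TABLE tickets_ticket_watchers (
    id INTEGER PRIMARY KEY, user_id INTEGER, ticket_id INTEGER);
-- User 3 has no tickets at all and should be filtered out.
INSERT INTO auth_user (id) VALUES (1), (2), (3);
INSERT INTO tickets_ticket (id, capturer_id, responsible_id) VALUES
    (1, 1, 2), (2, 1, 2), (3, 2, 1), (4, 2, 1), (5, 2, 1);
INSERT INTO tickets_ticket_watchers (user_id, ticket_id) VALUES
    (1, 1), (1, 2), (1, 3), (1, 4);
""")

# Each relation is aggregated on its own first, so every derived table
# has at most one row per user and the outer joins cannot multiply rows.
rows = conn.execute("""
SELECT auth_user.id,
       C1.tickets_captured__count,
       C2.assigned_tickets__count,
       C3.tickets_watched__count
FROM auth_user
LEFT JOIN (SELECT capturer_id, COUNT(*) AS tickets_captured__count
           FROM tickets_ticket GROUP BY capturer_id) AS C1
       ON auth_user.id = C1.capturer_id
LEFT JOIN (SELECT responsible_id, COUNT(*) AS assigned_tickets__count
           FROM tickets_ticket GROUP BY responsible_id) AS C2
       ON auth_user.id = C2.responsible_id
LEFT JOIN (SELECT user_id, COUNT(*) AS tickets_watched__count
           FROM tickets_ticket_watchers GROUP BY user_id) AS C3
       ON auth_user.id = C3.user_id
WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
ORDER BY auth_user.id
""").fetchall()

print(rows)
```

One difference from the Django-generated query is worth keeping in mind: a user with no rows in a relation gets NULL (None in Python) rather than 0 for that count, so wrapping the columns in COALESCE(..., 0) may be needed if zeros are expected.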