Timeout running SQL query

Time: 2013-06-28 06:16:20

Tags: sql sql-server django sql-server-2008

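I'm trying to run a query on an MSSQL 2008 R2 database using the aggregation features of the Django ORM, but I keep getting a timeout error. The failing query (generated by Django) is below. I've tried running it directly in SQL Server Management Studio, and it works, but it takes 3.5 minutes.

It does look like it's grouping on a bunch of fields it doesn't need to, but I wouldn't have thought that should make it take that long. The database isn't very big either: auth_user has 9 records, ticket_ticket has 1210, and ticket_watchers has 1876. Is there something I'm missing?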

SELECT 
    [auth_user].[id], 
    [auth_user].[password], 
    [auth_user].[last_login], 
    [auth_user].[is_superuser], 
    [auth_user].[username], 
    [auth_user].[first_name], 
    [auth_user].[last_name], 
    [auth_user].[email], 
    [auth_user].[is_staff], 
    [auth_user].[is_active], 
    [auth_user].[date_joined], 
    COUNT([tickets_ticket].[id]) AS [tickets_captured__count], 
    COUNT(T3.[id]) AS [assigned_tickets__count], 
    COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count] 
FROM 
    [auth_user] 
    LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id]) 
    LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id]) 
    LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id]) 
GROUP BY 
    [auth_user].[id], 
    [auth_user].[password], 
    [auth_user].[last_login], 
    [auth_user].[is_superuser], 
    [auth_user].[username], 
    [auth_user].[first_name], 
    [auth_user].[last_name], 
    [auth_user].[email], 
    [auth_user].[is_staff], 
    [auth_user].[is_active], 
    [auth_user].[date_joined] 
HAVING 
    (COUNT([tickets_ticket].[id]) > 0  OR COUNT(T3.[id]) > 0 )

Edit:

Here are the relevant indexes (excluding the ones not used in the query):

auth_user.id                       (PK)
auth_user.username                 (Unique)
tickets_ticket.id                  (PK)
tickets_ticket.capturer_id
tickets_ticket.responsible_id
tickets_ticket_watchers.id         (PK)
tickets_ticket_watchers.user_id
tickets_ticket_watchers.ticket_id

Edit 2:

After some experimenting, I found that the following is the smallest query that still runs slowly:

SELECT 
    COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
    COUNT(T3.[id]) AS [assigned_tickets__count],
    COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM 
    [auth_user] 
    LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id]) 
    LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id]) 
    LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id]) 
GROUP BY 
    [auth_user].[id]

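Oddly, if I comment out any two of the lines above, it runs in under 1s, and it doesn't seem to matter which ones I remove (although obviously I can't remove a join without also removing the related SELECT line).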
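A rough way to see why keeping only one of the three joins is fast: each LEFT OUTER JOIN multiplies the rows a user contributes by the number of matching rows in the joined table (or keeps a single NULL-padded row when nothing matches). The Python sketch below only does that arithmetic; the function name `joined_rows` and all counts passed to it are hypothetical, not the asker's data.

```python
def joined_rows(*related_counts):
    """Rows one user contributes after a chain of LEFT OUTER JOINs.

    related_counts: number of matching rows in each joined table
    (hypothetical inputs). A join with no match still yields one
    NULL-padded row, hence max(n, 1).
    """
    rows = 1
    for n in related_counts:
        rows *= max(n, 1)
    return rows


# Hypothetical user: 2 captured, 3 assigned, 4 watched tickets.
print(joined_rows(2, 3, 4))  # 24: all three joins kept
print(joined_rows(4))        # 4: only the watchers join kept

# Hypothetical heavier user: a few hundred rows per relation.
print(joined_rows(400, 400, 600))  # 96000000
# Counting each relation separately instead would touch only 1400 rows.
print(joined_rows(400) + joined_rows(400) + joined_rows(600))  # 1400
```

The growth is multiplicative across joins, so a handful of users with a few hundred related rows each is enough to produce an intermediate result in the tens of millions of rows, while any single join stays linear.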

Edit 3:

The Python code that generates it is:

from django.contrib.auth.models import User
from django.db.models import Count

User.objects.annotate(
    Count('tickets_captured'), 
    Count('assigned_tickets'), 
    Count('tickets_watched')
)

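Looking at the execution plan shows that SQL Server first does a cross join across all the tables, producing about 280 million rows and 6GB of data. I assume that's the problem, but why is it happening?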

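2 answers:

Answer 0 (score: 1)

SQL Server is doing what it's been asked to do. Unfortunately, Django isn't generating the right query for what you want. It looks like you need COUNT(DISTINCT ...) rather than a plain COUNT: Django annotate() multiple times causes wrong answers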

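Why the query behaves this way: it joins four tables together (auth_user, tickets_ticket twice, and tickets_ticket_watchers). So if a user has 2 captured tickets, 3 assigned tickets and 4 watched tickets, the join returns 2 * 3 * 4 = 24 rows, one for each combination, and each COUNT counts all 24 of them. The DISTINCT removes those duplicates.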
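To make the 2 * 3 * 4 example concrete, here is a small sketch using Python's built-in sqlite3 module, with SQLite standing in for SQL Server. The tables keep only the columns the query uses, and the rows are made up. It shows the joined row count, the inflated plain COUNTs, and what COUNT(DISTINCT ...) returns. (In the Django ORM, `Count('tickets_captured', distinct=True)` is how to request COUNT(DISTINCT ...).) DISTINCT fixes the numbers, but the database typically still has to build the multiplied intermediate rows first, so on its own it does not address the slowness.

```python
import sqlite3

# Minimal in-memory copies of the three tables (only the columns the query uses).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY);
    CREATE TABLE tickets_ticket (
        id INTEGER PRIMARY KEY, capturer_id INTEGER, responsible_id INTEGER);
    CREATE TABLE tickets_ticket_watchers (
        id INTEGER PRIMARY KEY, ticket_id INTEGER, user_id INTEGER);
    INSERT INTO auth_user VALUES (1), (2);
    -- User 1 captured tickets 1-2 and is responsible for tickets 3-5.
    INSERT INTO tickets_ticket VALUES
        (1, 1, 2), (2, 1, 2), (3, 2, 1), (4, 2, 1), (5, 2, 1);
    -- User 1 watches tickets 1-4.
    INSERT INTO tickets_ticket_watchers VALUES
        (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 4, 1);
""")

# Same join shape as the Django-generated query, restricted to user 1.
JOINS = """
    FROM auth_user
    LEFT OUTER JOIN tickets_ticket ON auth_user.id = tickets_ticket.capturer_id
    LEFT OUTER JOIN tickets_ticket T3 ON auth_user.id = T3.responsible_id
    LEFT OUTER JOIN tickets_ticket_watchers
        ON auth_user.id = tickets_ticket_watchers.user_id
    WHERE auth_user.id = 1
"""

# 2 captured * 3 assigned * 4 watched = 24 joined rows before aggregation.
joined_rows = conn.execute("SELECT COUNT(*)" + JOINS).fetchone()[0]

# Plain COUNT counts every joined row, so all three counts come out as 24.
plain = conn.execute(
    "SELECT COUNT(tickets_ticket.id), COUNT(T3.id),"
    " COUNT(tickets_ticket_watchers.ticket_id)" + JOINS
).fetchone()

# COUNT(DISTINCT ...) collapses the duplicates back to 2, 3 and 4.
distinct = conn.execute(
    "SELECT COUNT(DISTINCT tickets_ticket.id), COUNT(DISTINCT T3.id),"
    " COUNT(DISTINCT tickets_ticket_watchers.ticket_id)" + JOINS
).fetchone()

print(joined_rows, plain, distinct)  # 24 (24, 24, 24) (2, 3, 4)
```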

Answer 1 (score: 0)

How about this?

SELECT auth_user.*, 
   C1.tickets_captured__count,
   C2.assigned_tickets__count,
   C3.tickets_watched__count

FROM 
auth_user
LEFT JOIN
( SELECT  capturer_id, COUNT(*) AS tickets_captured__count 
  FROM tickets_ticket GROUP BY capturer_id ) AS C1 ON auth_user.id = C1.capturer_id
LEFT JOIN
( SELECT  responsible_id, COUNT(*) AS assigned_tickets__count 
  FROM tickets_ticket GROUP BY responsible_id ) AS C2 ON auth_user.id = C2.responsible_id
LEFT JOIN
( SELECT  user_id, COUNT(*) AS tickets_watched__count 
  FROM tickets_ticket_watchers GROUP BY user_id ) AS C3 ON auth_user.id = C3.user_id

WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
--WHERE C1.tickets_captured__count is not null OR C2.assigned_tickets__count is not null   -- also works (I think with better performance)
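A quick way to check this approach: the sketch runs the query above (with commas between the selected columns and auth_user.* narrowed to auth_user.id) against SQLite through Python's built-in sqlite3 module, on a few made-up rows. SQLite only stands in for SQL Server here, so this shows the results, not the SQL Server plan or timing. Each derived table is grouped down to at most one row per user before it is joined, which is why the joins can no longer multiply rows.

```python
import sqlite3

# In-memory stand-ins for the three tables (only the columns used here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY);
    CREATE TABLE tickets_ticket (
        id INTEGER PRIMARY KEY, capturer_id INTEGER, responsible_id INTEGER);
    CREATE TABLE tickets_ticket_watchers (
        id INTEGER PRIMARY KEY, ticket_id INTEGER, user_id INTEGER);
    INSERT INTO auth_user VALUES (1), (2), (3);
    -- (id, capturer_id, responsible_id)
    INSERT INTO tickets_ticket VALUES
        (1, 1, 2), (2, 1, 2), (3, 2, 1), (4, 2, 1), (5, 2, 1);
    -- (id, ticket_id, user_id): user 1 watches tickets 1-4.
    INSERT INTO tickets_ticket_watchers VALUES
        (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 4, 1);
""")

# Each derived table has at most one row per user, so the LEFT JOINs
# cannot multiply rows the way the Django-generated query does.
rows = conn.execute("""
    SELECT auth_user.id,
           C1.tickets_captured__count,
           C2.assigned_tickets__count,
           C3.tickets_watched__count
    FROM auth_user
    LEFT JOIN (SELECT capturer_id, COUNT(*) AS tickets_captured__count
               FROM tickets_ticket GROUP BY capturer_id) AS C1
           ON auth_user.id = C1.capturer_id
    LEFT JOIN (SELECT responsible_id, COUNT(*) AS assigned_tickets__count
               FROM tickets_ticket GROUP BY responsible_id) AS C2
           ON auth_user.id = C2.responsible_id
    LEFT JOIN (SELECT user_id, COUNT(*) AS tickets_watched__count
               FROM tickets_ticket_watchers GROUP BY user_id) AS C3
           ON auth_user.id = C3.user_id
    WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
    ORDER BY auth_user.id
""").fetchall()

# User 3 has no captured or assigned tickets, so the WHERE drops it;
# user 2 watches nothing, so its watched count is NULL (None).
print(rows)  # [(1, 2, 3, 4), (2, 3, 2, None)]
```

Wrapping each count in COALESCE(..., 0) would return 0 instead of NULL for users with no matching rows in a given table.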