在where子句中查询查询性能问题

时间:2012-03-26 12:24:06

标签: mysql sql performance database-performance

由于sql子句中的内部查询,我跟踪了一些复杂的where查询,该查询具有可怕的性能,“当然”。在某些情况下,它需要一分钟。有人知道如何重写这个查询以获得更好的性能吗?
查询:

SELECT DISTINCT t.id as taskId, t.name as taskName, 
  t.startdate as taskStartDate, t.enddate as taskEndDate, 
  t.proj_id as taskProjectId 
FROM PROJECT p, EMPL_PROJ ep, TASK t, TIMERECORD tr 
WHERE 
  ep.empl_id = ? AND 
  ep.proj_id = p.id AND 
  ep.proj_id = t.proj_id AND 
  ((p.startdate IS NULL AND p.enddate IS NULL) OR 
   (p.startdate IS NULL AND p.enddate >= ?) OR 
   (p.enddate IS NULL AND p.startdate <= ? + INTERVAL 6 DAY) OR 
   (p.startdate <= ? + INTERVAL 6 DAY AND p.enddate >= ?) ) AND 
  ((t.startdate IS NULL AND t.enddate IS NULL) OR 
   (t.startdate IS NULL AND t.enddate >= ?) OR 
   (t.enddate IS NULL AND t.startdate <= ? + INTERVAL 6 DAY) OR 
  (t.startdate <= ? + INTERVAL 6 DAY AND t.enddate >= ?)) AND 
  (
   (ep.empl_id = tr.empl_id AND 
    ep.proj_id = tr.proj_id AND 
    t.id = tr.task_id AND tr.day <= ? + INTERVAL 7 DAY AND 
    tr.day >= ? + INTERVAL -14 DAY
   ) OR 
   (
    (SELECT count(*) 
     FROM TIMERECORD tr2 
     WHERE 
     tr2.empl_id=ep.empl_id AND 
     tr2.proj_id=p.id AND tr2.day <= ? + INTERVAL 7 DAY AND 
     tr2.day >= ? + INTERVAL -14 DAY) <= 0
    )
  ) 

我正在使用mysql服务器5.1.40。

编辑(2): 随着评论和回答,我来到这个查询,在一秒钟内执行(差不多一分钟就来了!)

SELECT DISTINCT t.id as taskId, t.name as taskName, 
  t.startdate as taskStartDate, t.enddate as taskEndDate, 
  t.proj_id as taskProjectId 
FROM (PROJECT p INNER JOIN EMPL_PROJ ep ON  ep.proj_id = p.id)  
  INNER JOIN TASK t ON p.id=t.proj_id 
  INNER JOIN TIMERECORD tr ON tr.empl_id=ep.empl_id AND tr.proj_id=ep.proj_id 
    AND tr.task_id=t.id
WHERE 
  ep.empl_id = ? AND 
  ((p.startdate IS NULL AND p.enddate IS NULL) OR 
   (p.startdate IS NULL AND p.enddate >= ?) OR 
   (p.enddate IS NULL AND p.startdate <= ? + INTERVAL 6 DAY) OR 
   (p.startdate <= ? + INTERVAL 6 DAY AND p.enddate >= ?) ) AND 
  ((t.startdate IS NULL AND t.enddate IS NULL) OR 
   (t.startdate IS NULL AND t.enddate >= ?) OR 
   (t.enddate IS NULL AND t.startdate <= ? + INTERVAL 6 DAY) OR 
   (t.startdate <= ? + INTERVAL 6 DAY AND t.enddate >= ?)) AND 
  (
   (
    tr.day <= ? + INTERVAL 7 DAY AND 
    tr.day >= ? + INTERVAL -14 DAY
   ) OR 
   (
    NOT EXISTS(SELECT *
      FROM TIMERECORD tr2 INNER JOIN EMPL_PROJ ON tr2.empl_id=EMPL_PROJ.empl_id 
        INNER JOIN PROJECT ON PROJECT.id=tr2.proj_id
      WHERE 
       tr2.day BETWEEN ? + INTERVAL -14 DAY AND ? + INTERVAL 7 DAY)
   )
  ) 
  ORDER BY p.id, t.id

最大的贡献是建议NOT EXISTS方法(我标记为正确)的答案以及不混合explicitimplicit JOIN的评论。

谢谢大家!

2 个答案:

答案 0 :(得分:2)

当你似乎只需要一个NOT EXISTS ......

时,你正在使用COUNT(*)
(
(SELECT count(*) 
 FROM TIMERECORD tr2 
 WHERE 
 tr2.empl_id=ep.empl_id AND 
 tr2.proj_id=p.id AND tr2.day <= ? + INTERVAL 7 DAY AND 
 tr2.day >= ? + INTERVAL -14 DAY) <= 0
)

替换为

(
NOT EXISTS(SELECT * 
 FROM TIMERECORD tr2 
 WHERE 
 tr2.empl_id=ep.empl_id AND 
 tr2.proj_id=p.id AND tr2.day <= ? + INTERVAL 7 DAY AND 
 tr2.day >= ? + INTERVAL -14 DAY)
)

现在,如果TIMERECORD确实存在,那么where子句的一部分将短路为FALSE(NOT TRUE),而不必计算每个TIMERECORD。

答案 1 :(得分:0)

摆脱子查询。

(1)。使用所有empl_ids&amp;的empl_id,proj_id,count(*)列分别计算子查询。 proj_ids,其中一天落在所需范围内。这是一个简单的查询组。

select empl_id,proj_id,count(*) as ct from TIMERECORD 
where day between (? + INTERVAL -14 DAY) and (? + INTERVAL 7 DAY)
group by empl_id,proj_id;

调用此结果集B

(2)。正如您现在所做的那样计算剩余的查询。调用此结果集A

(3)。使用在A&amp; A中常见的列empl_id,proj_id进行左外连接B.乙

然后在where子句中可以检查B.ct列中的值,对于给定时间范围内未在TIMERECORD表中找到任何条目的所有empl_id,proj_id组合,它将为null。

实际上你甚至不需要数(*),因为你不会对实际数量感到困扰。但是,让我不要过于复杂化它。