How to improve the performance of a NOT IN query

Time: 2015-05-28 15:45:44

Tags: sql sql-server performance tsql notin

I have the following SQL query.

SELECT em.employeeid, tsi.timestamp
FROM timesheet_temp_import tsi
JOIN employee emp ON emp.employeeid = tsi.credentialnumber
WHERE
tsi.masterentity = 'MASTER' AND
tsi.timestamp NOT IN
(
    SELECT ea.timestamp 
    FROM employee_attendance ea 
    WHERE 
    ea.employeeid = em.employeeid
    AND ea.timestamp =  tsi.timestamp
    AND ea.ismanual = 0
)
GROUP BY em.employeeid, tsi.timestamp

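This query compares an import table (which holds employees and their time-attendance timestamps) against the attendance records.

Sometimes timesheet_temp_import has more than 95,000 rows, and my query must show only the timestamps that are new for the employee. If a timestamp already exists for the employee, I must exclude it.

The query works, but it takes more than 4 minutes, so I would like to know whether I can improve the NOT IN statement to reduce that time.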

5 answers:

Answer 0 (score: 6)

Using NOT EXISTS may help you.

SELECT 
    emp.employeeid,
    tsi.timestamp
    FROM timesheet_temp_import tsi
    join employee emp ON emp.employeeid = tsi.credentialnumber
    WHERE
    tsi.masterentity = 'MASTER' AND
    NOT EXISTS 
    (
        -- the select list is irrelevant to EXISTS; only row existence is tested
        SELECT NULL  
        FROM employee_attendance ea 
        WHERE 
        ea.employeeid = emp.employeeid
        AND ea.timestamp =  tsi.timestamp
        AND ea.ismanual = 0
    )
    GROUP BY 
    emp.employeeid,
    tsi.timestamp
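
As a side note on why NOT EXISTS is generally the safer form: when a NOT IN subquery is not correlated and returns a NULL, the predicate can never be true, so no rows come back. The sketch below illustrates this with Python's sqlite3 module standing in for SQL Server (an assumption for illustration; the sample rows are made up, and the employee join and masterentity filter are left out for brevity). The query in the question is correlated on timestamp, so it should not hit this case itself.

```python
import sqlite3

# In-memory SQLite is a stand-in for SQL Server (assumption for illustration).
# The NULL behaviour shown is standard SQL three-valued logic.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE timesheet_temp_import (credentialnumber INTEGER, timestamp TEXT);
    CREATE TABLE employee_attendance (employeeid INTEGER, timestamp TEXT);
    INSERT INTO timesheet_temp_import VALUES
        (1, '2015-05-28 08:00'),
        (1, '2015-05-28 17:00');
    INSERT INTO employee_attendance VALUES
        (1, '2015-05-28 08:00'),
        (1, NULL);
""")

# Uncorrelated NOT IN: the NULL in the subquery makes the test UNKNOWN
# for every row that has no exact match, so nothing is returned.
not_in_rows = conn.execute("""
    SELECT tsi.timestamp
    FROM timesheet_temp_import tsi
    WHERE tsi.timestamp NOT IN (SELECT ea.timestamp FROM employee_attendance ea)
""").fetchall()

# NOT EXISTS only asks whether a matching row exists, so the NULL is harmless.
not_exists_rows = conn.execute("""
    SELECT tsi.timestamp
    FROM timesheet_temp_import tsi
    WHERE NOT EXISTS (
        SELECT NULL
        FROM employee_attendance ea
        WHERE ea.employeeid = tsi.credentialnumber
          AND ea.timestamp = tsi.timestamp
    )
""").fetchall()

print(not_in_rows)      # []
print(not_exists_rows)  # [('2015-05-28 17:00',)]
```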

Answer 1 (score: 3)

You have this query:

SELECT em.employeeid, tsi.timestamp
FROM timesheet_temp_import tsi JOIN
     employee emp
     ON emp.employeeid = tsi.credentialnumber
WHERE tsi.masterentity = 'MASTER' AND
      tsi.timestamp NOT IN (SELECT ea.timestamp 
                            FROM employee_attendance ea 
                            WHERE ea.employeeid = em.employeeid AND
                                  ea.timestamp =  tsi.timestamp AND
                                  ea.ismanual = 0
                           )
GROUP BY em.employeeid, tsi.timestamp;

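Before rewriting the query (as opposed to just reformatting it), I would check the indexes and the logic. First, is the GROUP BY necessary? That is, does the outer query produce duplicates? My guess is no, but I don't know your data.

Second, you want indexes. I would suggest the following: timesheet_temp_import(masterentity, credentialnumber, timestamp), employee(employeeid), and employee_attendance(employeeid, timestamp, ismanual).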
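
For reference, the three suggested indexes could be created roughly as follows. This is only a sketch: the index names are hypothetical, the column types are guesses, and Python's sqlite3 module is used as a stand-in for SQL Server so the statements can be run as-is. The `CREATE INDEX name ON table (columns)` form is the same in T-SQL. In a real schema employee(employeeid) is probably already covered by the primary key.

```python
import sqlite3

# In-memory SQLite as a stand-in for SQL Server (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Column types are guesses; only the column names come from the question.
    CREATE TABLE timesheet_temp_import (masterentity TEXT, credentialnumber INTEGER, timestamp TEXT);
    CREATE TABLE employee (employeeid INTEGER);
    CREATE TABLE employee_attendance (employeeid INTEGER, timestamp TEXT, ismanual INTEGER);

    -- Index names are hypothetical; the column lists are the ones suggested above.
    CREATE INDEX ix_tsi_master_cred_ts
        ON timesheet_temp_import (masterentity, credentialnumber, timestamp);
    CREATE INDEX ix_employee_id
        ON employee (employeeid);
    CREATE INDEX ix_ea_emp_ts_manual
        ON employee_attendance (employeeid, timestamp, ismanual);
""")

# List the indexes that now exist, to confirm the statements ran.
index_names = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
print(index_names)
```

Whether the optimizer actually uses them depends on the data and the plan, so it is worth comparing the execution plan before and after.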

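Third, I would ask whether you have timesheets for non-employees. If not, I think you can get rid of the join in the outer query. So this may be the query you want: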

SELECT tsi.credentialnumber as employeeid, tsi.timestamp
FROM timesheet_temp_import tsi
WHERE tsi.masterentity = 'MASTER' AND
      tsi.timestamp NOT IN (SELECT ea.timestamp 
                            FROM employee_attendance ea 
                            WHERE ea.employeeid = tsi.credentialnumber AND
                                  ea.timestamp =  tsi.timestamp AND
                                  ea.ismanual = 0
                           );

You may also get a marginal improvement by replacing NOT IN with NOT EXISTS.

Answer 2 (score: 2)

Another approach is to use EXCEPT:

select whatever
from wherever
where somefield in 
(select all potential values of that field
except
select the values you want to exclude)

This is logically equivalent to NOT IN as long as the compared values contain no NULLs, and it can be faster.
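
Applied to the tables in the question, the EXCEPT idea might look like the sketch below. It assumes every credentialnumber is a valid employeeid (so the join to employee is dropped), uses made-up sample rows, and runs on Python's sqlite3 module as a stand-in for SQL Server. Note that EXCEPT also removes duplicates, which takes over the role of the GROUP BY.

```python
import sqlite3

# In-memory SQLite as a stand-in for SQL Server (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE timesheet_temp_import (masterentity TEXT, credentialnumber INTEGER, timestamp TEXT);
    CREATE TABLE employee_attendance (employeeid INTEGER, timestamp TEXT, ismanual INTEGER);
    INSERT INTO timesheet_temp_import VALUES
        ('MASTER', 1, '2015-05-28 08:00'),
        ('MASTER', 1, '2015-05-28 17:00'),
        ('MASTER', 1, '2015-05-28 17:00'),
        ('MASTER', 2, '2015-05-28 08:00'),
        ('OTHER',  3, '2015-05-28 08:00');
    INSERT INTO employee_attendance VALUES
        (1, '2015-05-28 08:00', 0),
        (2, '2015-05-28 08:00', 1);
""")

# Imported (employee, timestamp) pairs minus the pairs already recorded
# as non-manual attendance. EXCEPT de-duplicates the result.
new_rows = conn.execute("""
    SELECT tsi.credentialnumber AS employeeid, tsi.timestamp
    FROM timesheet_temp_import tsi
    WHERE tsi.masterentity = 'MASTER'
    EXCEPT
    SELECT ea.employeeid, ea.timestamp
    FROM employee_attendance ea
    WHERE ea.ismanual = 0
    ORDER BY 1, 2
""").fetchall()

print(new_rows)  # [(1, '2015-05-28 17:00'), (2, '2015-05-28 08:00')]
```

Employee 2's 08:00 row is kept because the existing attendance record is manual (ismanual = 1), and the duplicated 17:00 row for employee 1 appears only once.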

Answer 3 (score: 2)

Try this. I think you meant emp (the query uses the alias em, which is never defined):

SELECT distinct tsi.credentialnumber, tsi.timestamp
  FROM timesheet_temp_import tsi
  JOIN employee emp 
    ON emp.employeeid = tsi.credentialnumber
   and tsi.masterentity = 'MASTER' 
  left join employee_attendance ea 
    on ea.employeeid = emp.employeeid
   AND ea.timestamp = tsi.timestamp
   AND ea.ismanual = 0
 where ea.employeeid is null

Depending on the indexes, this may be faster:

SELECT distinct tsi.credentialnumber, tsi.timestamp
  FROM timesheet_temp_import tsi
  JOIN employee emp 
    ON emp.employeeid = tsi.credentialnumber
   and tsi.masterentity = 'MASTER' 
  left join employee_attendance ea 
    on ea.employeeid = tsi.credentialnumber
   AND ea.timestamp = tsi.timestamp
   AND ea.ismanual = 0
 where ea.employeeid is null
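
To show why the DISTINCT matters in this LEFT JOIN ... IS NULL form, here is a small sketch on made-up rows, again using Python's sqlite3 module as a stand-in for SQL Server (an assumption for illustration). The anti-join returns one row per unmatched import row, so a timestamp that was imported twice comes back twice unless it is de-duplicated.

```python
import sqlite3

# In-memory SQLite as a stand-in for SQL Server (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE timesheet_temp_import (masterentity TEXT, credentialnumber INTEGER, timestamp TEXT);
    CREATE TABLE employee (employeeid INTEGER PRIMARY KEY);
    CREATE TABLE employee_attendance (employeeid INTEGER, timestamp TEXT, ismanual INTEGER);
    INSERT INTO employee VALUES (1), (2);
    INSERT INTO timesheet_temp_import VALUES
        ('MASTER', 1, '2015-05-28 08:00'),
        ('MASTER', 1, '2015-05-28 17:00'),
        ('MASTER', 1, '2015-05-28 17:00'),
        ('MASTER', 9, '2015-05-28 08:00');
    INSERT INTO employee_attendance VALUES (1, '2015-05-28 08:00', 0);
""")

# Same shape as the second query above; {distinct} is filled in below.
ANTI_JOIN = """
    SELECT {distinct} tsi.credentialnumber, tsi.timestamp
    FROM timesheet_temp_import tsi
    JOIN employee emp
      ON emp.employeeid = tsi.credentialnumber
     AND tsi.masterentity = 'MASTER'
    LEFT JOIN employee_attendance ea
      ON ea.employeeid = tsi.credentialnumber
     AND ea.timestamp = tsi.timestamp
     AND ea.ismanual = 0
    WHERE ea.employeeid IS NULL
    ORDER BY tsi.credentialnumber, tsi.timestamp
"""

with_distinct = conn.execute(ANTI_JOIN.format(distinct="DISTINCT")).fetchall()
without_distinct = conn.execute(ANTI_JOIN.format(distinct="")).fetchall()

print(with_distinct)     # [(1, '2015-05-28 17:00')]
print(without_distinct)  # [(1, '2015-05-28 17:00'), (1, '2015-05-28 17:00')]
```

Credential 9 never appears because it has no row in employee, so the inner join drops it.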

Answer 4 (score: 1)

Use a LEFT JOIN and filter in the WHERE clause instead of NOT IN:

SELECT 
    emp.employeeid,
    tsi.timestamp
    FROM timesheet_temp_import tsi
    join employee emp ON emp.employeeid = tsi.credentialnumber
    -- match conditions go in the ON clause; a derived table in a JOIN
    -- cannot reference the outer aliases emp and tsi
    left join employee_attendance ea
        ON ea.employeeid = emp.employeeid
        AND ea.timestamp = tsi.timestamp
        AND ea.ismanual = 0
    WHERE
    tsi.masterentity = 'MASTER' AND
    -- no matching attendance row was found
    ea.timestamp is null
    GROUP BY 
    emp.employeeid,
    tsi.timestamp