Fast SQL query for joining sets

Time: 2017-01-25 23:39:49

Tags: sql sql-server tsql

Suppose we have the following tables

Table A

WorkId       DateA
-----        -------
1           01/01/2017

Table B

WorkId       DateB        Flag    User
-----        -------      ----    -----
1           01/12/2016     N       u1
1           03/12/2016     N       u2
1           01/01/2017     Y       u2
1           02/01/2017     Y       u3
1           02/01/2017     Y       u3
1           05/01/2017     N       u4 
1           05/01/2017     N       u5 
1           06/01/2017     N       u5
1           10/01/2017     Y       u5 
1           12/01/2017     Y       u6
1           12/01/2017     N       u7

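Each record in Table A should be joined to a record in Table B on TableA.WorkId = TableB.WorkId and TableA.DateA = TableB.DateB (the Table B row matched by this join always has Flag = 'Y'). From this join I should get WorkId, TableA.DateA, and TableB.User (User1 in the result below). For example, the Table A record above joins to the third row of Table B.

Then I need the first record from Table B that has Flag = 'N' and the smallest date after DateA. In this example that is the sixth row of Table B. I then need to add this user (User2) and date (DateB) to the result:

Result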

WorkId    DateA        DateB         User1     User2
-----     -------      ------        -----     -----
1         01/01/2017   05/01/2017    u2         u4
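
The sample tables above can be reproduced with a short setup script, which makes it easier to try the queries on this page. This is a minimal sketch: the table names TableA and TableB come from the query below, but the column types are assumptions, since the question does not state them. The dates in the tables are read as dd/mm/yyyy and written here as yyyymmdd literals.

```sql
-- Column types are assumptions; the question does not specify them.
CREATE TABLE TableA (
    WorkId int  NOT NULL,
    DateA  date NOT NULL
);

CREATE TABLE TableB (
    WorkId int         NOT NULL,
    DateB  date        NOT NULL,
    Flag   char(1)     NOT NULL,
    [User] varchar(10) NOT NULL   -- USER is a reserved word in T-SQL, hence the brackets
);

INSERT INTO TableA (WorkId, DateA) VALUES (1, '20170101');

INSERT INTO TableB (WorkId, DateB, Flag, [User]) VALUES
(1, '20161201', 'N', 'u1'),
(1, '20161203', 'N', 'u2'),
(1, '20170101', 'Y', 'u2'),
(1, '20170102', 'Y', 'u3'),
(1, '20170102', 'Y', 'u3'),
(1, '20170105', 'N', 'u4'),
(1, '20170105', 'N', 'u5'),
(1, '20170106', 'N', 'u5'),
(1, '20170110', 'Y', 'u5'),
(1, '20170112', 'Y', 'u6'),
(1, '20170112', 'N', 'u7');
```

Note that two rows share the date 05/01/2017 with Flag = 'N' (u4 and u5), so any query that orders only by DateB may return either of them as User2.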

I used the following query

WITH c AS (
SELECT a.WorkId, a.DateA, b.[User] AS User1
FROM TableA a
INNER JOIN TableB b
ON a.WorkId = b.WorkId AND a.DateA = b.DateB
),

c1 AS (
SELECT c.*, b.DateB, b.[User] AS User2
, ROW_NUMBER() OVER (PARTITION BY c.WorkId, c.DateA ORDER BY b.DateB) AS rn
FROM c
LEFT OUTER JOIN TableB b
ON c.WorkId = b.WorkId AND b.Flag = 'N' AND b.DateB > c.DateA
)

SELECT *
FROM c1
WHERE rn = 1

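I have two indexes on each table: one on WorkId + Date and one on Date.

The problem is that the query is slow, and it becomes much slower when the tables are very large. Do you know of faster code? Thanks.

1 answer:

Answer 0: (score: 1)

Here is one way to formulate the query: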

select a.*, bnext.DateB, b.[User] as User1, bnext.[User] as User2
from TableA a join
     TableB b
     on a.WorkId = b.WorkId and a.DateA = b.DateB outer apply
     -- earliest Flag = 'N' row strictly after DateA, so sort ascending
     (select top 1 b2.*
      from TableB b2
      where b2.WorkId = a.WorkId and b2.DateB > a.DateA and b2.Flag = 'N'
      order by b2.DateB asc
     ) bnext;

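For the join, you want an index on TableB(WorkId, DateB); the keys can be in either order. For the subquery, you want an index on TableB(WorkId, DateB, Flag, User). This one query might actually be all you need.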
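
The index advice above could be written out roughly as follows. This is a sketch only: the index names are hypothetical, and the column names are taken from the question's tables. The last statement is an alternative that is not part of the answer; it puts the equality column Flag ahead of the range column DateB, which may let the subquery seek directly to the first qualifying row.

```sql
-- Index names are hypothetical; adjust to your own naming scheme.

-- Supports the join on (WorkId, DateB).
CREATE INDEX IX_TableB_WorkId_DateB
    ON TableB (WorkId, DateB);

-- Covering index for the OUTER APPLY subquery, in the key order given in the answer.
CREATE INDEX IX_TableB_WorkId_DateB_Flag_User
    ON TableB (WorkId, DateB, Flag, [User]);

-- Alternative (an assumption, not from the answer): equality keys first,
-- then the range key, with [User] carried as an included column.
CREATE INDEX IX_TableB_WorkId_Flag_DateB
    ON TableB (WorkId, Flag, DateB)
    INCLUDE ([User]);
```

Because the second index starts with the same two keys as the first, it can also serve the join, so in practice one of the two may be enough. Comparing the execution plans on real data is the only reliable way to choose.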

Hmm. There is another approach that might be faster:

select WorkId, DateA, DateB, User1, [User] as User2
from (select ab.*,
             min(DateB) over (partition by WorkId, grp) as DateA,
             max(MatchUser) over (partition by WorkId, grp) as User1,
             row_number() over (partition by WorkId, grp, Flag order by DateB) as seqnum
      from (select b.*,
                   -- running count of Table A matches; each match starts a new group
                   sum(case when a.WorkId is not null then 1 else 0 end) over (partition by b.WorkId order by b.DateB) as grp,
                   case when a.WorkId is not null then b.[User] end as MatchUser
            from TableB b left join
                 TableA a
                 on a.WorkId = b.WorkId and a.DateA = b.DateB
           ) ab
     ) ab
-- grp = 0 marks rows before the first match, which belong to no Table A row
where seqnum = 1 and Flag = 'N' and grp > 0;

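This is considerably more complex, and it relies on the rows in A not overlapping one another in their matches on B. The idea is that it finds the matches in B and then uses window functions to find the first row where the flag is N.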
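
To see how the grouping in the second approach works, it can help to look at the running count on its own. The query below is an illustrative sketch that uses the table and column names from the question; it returns every Table B row together with the group number that the window-function query partitions by.

```sql
-- Shows the running count (grp) that the window-function approach builds on.
-- Each Table B row that matches a Table A row increments the count,
-- so all later rows fall into the group started by that match.
SELECT b.WorkId, b.DateB, b.Flag, b.[User],
       SUM(CASE WHEN a.WorkId IS NOT NULL THEN 1 ELSE 0 END)
           OVER (PARTITION BY b.WorkId ORDER BY b.DateB) AS grp
FROM TableB b
LEFT JOIN TableA a
  ON a.WorkId = b.WorkId AND a.DateA = b.DateB
ORDER BY b.WorkId, b.DateB;
```

With the sample data from the question, the two December 2016 rows get grp = 0 and every row from 01/01/2017 onward gets grp = 1, because Table A contains a single matching date. The first Flag = 'N' row inside group 1 is then the row the question asks for.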