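SQL with multiple ROW_NUMBER or RANK

Posted: 2013-03-12 01:22:36

Tags: sql sql-server-2008

I need to join the same table several times, for example Person and PersonEvents. Each person has multiple events (zero or more). I need to create a VIEW that selects each person along with specific columns from their most recent event, plus a column from the event before that.

Person data: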

Id    Name
1     Iain
2     Fred
3     Mary
4     Foo
5     Bar

PersonEvents data:

PersonId    DateStarted                ReasonForLeaving
1           2011-03-12 00:00:00.000    sick
1           2013-02-12 00:00:00.000    NULL
1           2012-04-12 00:00:00.000    holiday
2           2011-05-12 00:00:00.000    new baby
2           2013-06-12 00:00:00.000    NULL
2           2012-07-12 00:00:00.000    had enough
3           2011-08-12 00:00:00.000    pregnant
3           2013-09-12 00:00:00.000    NULL
4           2012-10-12 00:00:00.000    NULL

The sample output would be:

Id   Name    MemberSince                ReasonForChange
1    Iain    2013-02-12 00:00:00.000    holiday
4    Foo     2012-10-12 00:00:00.000    NULL
...

The "old way" uses TOP 1 joins or sub-select statements:

SELECT p.*,
    (
        SELECT TOP 1 DateStarted
        FROM PersonEvents e
        WHERE e.PersonId = p.Id
        ORDER BY DateStarted DESC
    ) As MemberSince
FROM Person p
....

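However, if you need several columns from this join (for example a date, a comment, and some other ID), you have to write one sub-select statement per column, which is expensive.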
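To make that cost concrete, here is a minimal sketch of what the sub-select approach turns into once a second column is needed. It assumes the Person and PersonEvents tables shown above; the LatestReason alias is only an illustrative name. Each extra column repeats the same correlated lookup against PersonEvents.

```sql
-- Sketch only: one correlated TOP 1 sub-select per column pulled from the latest event.
-- LatestReason is a hypothetical alias used for illustration.
SELECT p.Id, p.Name,
    (
        SELECT TOP 1 e.DateStarted
        FROM PersonEvents e
        WHERE e.PersonId = p.Id
        ORDER BY e.DateStarted DESC
    ) AS MemberSince,
    (
        SELECT TOP 1 e.ReasonForLeaving
        FROM PersonEvents e
        WHERE e.PersonId = p.Id
        ORDER BY e.DateStarted DESC
    ) AS LatestReason
FROM Person p
```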

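So the question is: how do I use row numbers for the most recent and previous events to get multiple columns from the join?

2 answers:

Answer 0 (score: 4)

The most straightforward (that is, readable SQL) answer I came up with uses WITH and ROW_NUMBER.

First, write a ROW_NUMBER query that sorts the events and gives each event a number that is unique within its PersonId: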

SELECT *,
    ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY DateStarted DESC) AS EventOrder
FROM PersonEvents

Result:

PersonId    DateStarted              ReasonForLeaving    EventOrder
1           2013-02-12 00:00:00.000  NULL                1
1           2012-04-12 00:00:00.000  holiday             2
1           2011-03-12 00:00:00.000  sick                3
2           2013-06-12 00:00:00.000  NULL                1
2           2012-07-12 00:00:00.000  had enough          2
2           2011-05-12 00:00:00.000  new baby            3
3           2013-09-12 00:00:00.000  NULL                1
3           2011-08-12 00:00:00.000  pregnant            2
4           2012-10-12 00:00:00.000  NULL                1
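Since the title also mentions RANK, the sketch below puts ROW_NUMBER, RANK and DENSE_RANK side by side over the same partition. It assumes the PersonEvents table from the question; the RowNum, Rnk and DenseRnk aliases are illustrative names. The sample data has no two events with the same DateStarted for one person, so all three columns should come out identical here. They only differ when dates tie: ROW_NUMBER still hands out distinct numbers (in an arbitrary order among the tied rows), while RANK and DENSE_RANK give tied rows the same number, so a filter such as EventOrder = 1 could then match more than one row per person.

```sql
-- Sketch only: compare the three ranking functions on the same ordering.
-- RowNum, Rnk and DenseRnk are hypothetical aliases used for illustration.
SELECT PersonId, DateStarted, ReasonForLeaving,
    ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY DateStarted DESC) AS RowNum,
    RANK()       OVER (PARTITION BY PersonId ORDER BY DateStarted DESC) AS Rnk,
    DENSE_RANK() OVER (PARTITION BY PersonId ORDER BY DateStarted DESC) AS DenseRnk
FROM PersonEvents
```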

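Now each person's "first" event (in my case the most recent) holds the date of the change (real-life example: this is student enrollment history data across multiple schools, containing school IDs and plenty of other guff). Each person's "second" event is the previous event and holds the reason for leaving. To join them together: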

WITH SortedEvents AS (
     SELECT *,
         ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY DateStarted DESC) AS EventOrder
     FROM PersonEvents
)
SELECT p.*, MostRecent.DateStarted AS MemberSince, NextRecent.ReasonForLeaving AS ReasonForChange
FROM Person p
     LEFT OUTER JOIN SortedEvents AS MostRecent ON p.Id = MostRecent.PersonId AND MostRecent.EventOrder = 1
     LEFT OUTER JOIN SortedEvents AS NextRecent ON p.Id = NextRecent.PersonId AND NextRecent.EventOrder = 2

This gives nicely formatted output:

Id          Name   MemberSince              ReasonForChange
1           Iain   2013-02-12 00:00:00.000  holiday
2           Fred   2013-06-12 00:00:00.000  had enough
3           Mary   2013-09-12 00:00:00.000  pregnant
4           Foo    2012-10-12 00:00:00.000  NULL
5           Bar    NULL                     NULL
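As a possible variation, not part of the original answer, the two self-joins can be collapsed into a single join with conditional aggregation. This is a sketch under the assumption that only event numbers 1 and 2 are needed. It uses the same Person and PersonEvents tables and should return the same five rows as above.

```sql
-- Sketch only: a single join to the numbered events, pivoted with CASE inside MAX.
-- Each CASE matches at most one row per person, so MAX simply picks that value.
WITH SortedEvents AS (
    SELECT PersonId, DateStarted, ReasonForLeaving,
        ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY DateStarted DESC) AS EventOrder
    FROM PersonEvents
)
SELECT p.Id, p.Name,
    MAX(CASE WHEN e.EventOrder = 1 THEN e.DateStarted END)      AS MemberSince,
    MAX(CASE WHEN e.EventOrder = 2 THEN e.ReasonForLeaving END) AS ReasonForChange
FROM Person p
    LEFT OUTER JOIN SortedEvents AS e ON e.PersonId = p.Id AND e.EventOrder <= 2
GROUP BY p.Id, p.Name
ORDER BY p.Id
```

Whether this is faster than the two-join version depends on the data and indexes, so it is worth comparing the execution plans before adopting it.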

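In practice you can select multiple columns from any row number. The real-life example (again, student enrollment history) selects:

  1. From the student master table:
    • Student ID
    • Name
    • DOB, etc.
  2. From the enrollment history table, for the current enrollment:
    • School ID
    • Various enrollment status details
    • Date started
  3. From the enrollment history table, for the previous enrollment:
    • Reason for leaving

This approach works well with around 150,000 students and their respective histories.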
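The same pattern, applied to the Person and PersonEvents test tables rather than the real student tables (which are not shown here), might look like the sketch below. The CurrentReason and PreviousStarted aliases are illustrative names; the point is only that each joined alias can contribute as many columns as needed.

```sql
-- Sketch only: several columns taken from event 1 and from event 2 in one query.
-- CurrentReason and PreviousStarted are hypothetical aliases used for illustration.
WITH SortedEvents AS (
    SELECT PersonId, DateStarted, ReasonForLeaving,
        ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY DateStarted DESC) AS EventOrder
    FROM PersonEvents
)
SELECT p.Id, p.Name,
    MostRecent.DateStarted      AS MemberSince,
    MostRecent.ReasonForLeaving AS CurrentReason,
    NextRecent.DateStarted      AS PreviousStarted,
    NextRecent.ReasonForLeaving AS ReasonForChange
FROM Person p
    LEFT OUTER JOIN SortedEvents AS MostRecent ON p.Id = MostRecent.PersonId AND MostRecent.EventOrder = 1
    LEFT OUTER JOIN SortedEvents AS NextRecent ON p.Id = NextRecent.PersonId AND NextRecent.EventOrder = 2
```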

    Complete SQL for my tests:

    CREATE TABLE Person
    (
         Id INT NOT NULL,
         Name VARCHAR(50)
    )
    GO
    CREATE TABLE PersonEvents
    (
         PersonId INT NOT NULL,
         DateStarted DATETIME NOT NULL,
         ReasonForLeaving VARCHAR(50)
    )
    GO
    INSERT INTO Person
         SELECT 1, 'Iain' UNION ALL
         SELECT 2, 'Fred' UNION ALL
         SELECT 3, 'Mary' UNION ALL
         SELECT 4, 'Foo'  UNION ALL
         SELECT 5, 'Bar'
    GO
    INSERT INTO PersonEvents
         SELECT 1, '20110312', 'sick'       UNION ALL
         SELECT 1, '20130212', NULL         UNION ALL
         SELECT 1, '20120412', 'holiday'    UNION ALL
         SELECT 2, '20110512', 'new baby'   UNION ALL
         SELECT 2, '20130612', NULL         UNION ALL
         SELECT 2, '20120712', 'had enough' UNION ALL
         SELECT 3, '20110812', 'pregnant'   UNION ALL
         SELECT 3, '20130912', NULL         UNION ALL
         SELECT 4, '20121012', NULL
    GO
    
    --SELECT *
    --FROM Person
    --SELECT *
    --FROM PersonEvents
    --GO
    WITH SortedEvents AS (
        SELECT *,
            ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY DateStarted DESC) AS EventOrder
        FROM PersonEvents
    )
    SELECT p.*, MostRecent.DateStarted AS MemberSince, NextRecent.ReasonForLeaving AS ReasonForChange
    FROM Person p
        LEFT OUTER JOIN SortedEvents AS MostRecent ON p.Id = MostRecent.PersonId AND MostRecent.EventOrder = 1
        LEFT OUTER JOIN SortedEvents AS NextRecent ON p.Id = NextRecent.PersonId AND NextRecent.EventOrder = 2
    GO
    
    SELECT p.*,
        (
            SELECT TOP 1 DateStarted
            FROM PersonEvents pe
            WHERE pe.PersonId = p.Id
            ORDER BY DateStarted DESC
        ) AS MemberSince,
        'unknown' AS ReasonForChange
    FROM Person p
    GO
    
    DROP TABLE Person
    DROP TABLE PersonEvents
    GO
    

Answer 1 (score: 0)

For the most recent event and the previous event's date:

SELECT ID,NAME,NextToMostEventDate,ReasonForLeaving
FROM PersonEvents pe
INNER JOIN(
    SELECT pe1.PersonId,TheMostEventDate,NextToMostEventDate=MAX(pe1.DateStarted)
    FROM PersonEvents pe1
    INNER JOIN(
        SELECT PersonId,TheMostEventDate=MAX(DateStarted)
        FROM PersonEvents
        GROUP BY PersonId 
    ) pe2 
    ON pe2.PersonId=pe1.PersonId
    WHERE DateStarted<TheMostEventDate
    GROUP BY pe1.PersonId,TheMostEventDate
) pe12 ON pe12.PersonId=pe.PersonId
INNER JOIN Person ON Id=pe.PersonId
WHERE pe.DateStarted=TheMostEventDate

For the most recent event's date and the previous event:

SELECT ID,NAME,TheMostEventDate,ReasonForLeaving
FROM PersonEvents pe
INNER JOIN(
    SELECT pe1.PersonId,TheMostEventDate,NextToMostEventDate=MAX(pe1.DateStarted)
    FROM PersonEvents pe1
    INNER JOIN(
        SELECT PersonId,TheMostEventDate=MAX(DateStarted)
        FROM PersonEvents
        GROUP BY PersonId 
    ) pe2 
    ON pe2.PersonId=pe1.PersonId
    WHERE DateStarted<TheMostEventDate
    GROUP BY pe1.PersonId,TheMostEventDate
) pe12 ON pe12.PersonId=pe.PersonId
INNER JOIN Person ON Id=pe.PersonId
WHERE pe.DateStarted=NextToMostEventDate