Get only the latest row, grouped by a column

Time: 2010-05-18 16:08:07

Tags: sql

I have a large dataset of emails sent, along with their status codes.

ID Recipient           Date       Status
 1 someone@example.com 01/01/2010      1
 2 someone@example.com 02/01/2010      1
 3 them@example.com    01/01/2010      1
 4 them@example.com    02/01/2010      2
 5 them@example.com    03/01/2010      1
 6 others@example.com  01/01/2010      1
 7 others@example.com  02/01/2010      2

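In this example:

  • all emails sent to someone have a status of 1
  • the middle email (by date) sent to them has a status of 2, but the latest one is 1
  • the last email sent to others has a status of 2

What I need to retrieve is a count of all emails sent to each recipient, plus the status code of the latest one.

The first part is fairly simple: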

SELECT Recipient, Count(*) EmailCount
FROM Messages
GROUP BY Recipient
ORDER BY Recipient

This gives me:

Recipient           EmailCount
someone@example.com 2
them@example.com    3
others@example.com  2

How can I also get the latest status code?

The end result should be:

Recipient           EmailCount LastStatus
someone@example.com          2          1
them@example.com             3          1
others@example.com           2          2

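Thanks.

(The server is Microsoft SQL Server 2008; the query is run through an OleDbConnection in .Net.)

5 answers:

Answer 0: (score: 4)

This is an example of a "greatest-n-per-group" query. I think it is easiest to understand if you split it into two subqueries and then join the results.

The first subquery is the one you already have.

The second subquery uses the window function ROW_NUMBER to number the emails for each recipient, starting at 1 for the most recent, then 2, 3, and so on.

The results of the first query are then joined to the rows of the second query that have row number 1, i.e. the most recent ones. Doing it this way guarantees that you get only one row per recipient, even when there are ties.

Here is the query: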

SELECT T1.Recipient, T1.EmailCount, T2.Status FROM
(
    SELECT Recipient, COUNT(*) AS EmailCount
    FROM Messages
    GROUP BY Recipient
) T1
JOIN
(
    SELECT
        Recipient,
        Status,
        ROW_NUMBER() OVER (PARTITION BY Recipient ORDER BY Date Desc) AS rn
    FROM Messages
) T2
ON T1.Recipient = T2.Recipient AND T2.rn = 1

This gives the following results:

Recipient            EmailCount  Status  
others@example.com   2           2       
someone@example.com  2           1       
them@example.com     3           1       
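
To see what the ROW_NUMBER subquery does on its own, the following sketch loads the sample rows from the question into a table variable and lists every email with its per-recipient row number. The table variable @Messages and the column types are assumptions made for illustration, and the dates are read as dd/mm/yyyy (the ordering is the same under either reading).

```sql
-- Hypothetical stand-in for the Messages table, filled with the question's sample rows.
DECLARE @Messages TABLE (
    ID        int          NOT NULL,
    Recipient varchar(100) NOT NULL,
    [Date]    date         NOT NULL,
    Status    int          NOT NULL
);

INSERT INTO @Messages (ID, Recipient, [Date], Status) VALUES
    (1, 'someone@example.com', '20100101', 1),
    (2, 'someone@example.com', '20100201', 1),
    (3, 'them@example.com',    '20100101', 1),
    (4, 'them@example.com',    '20100201', 2),
    (5, 'them@example.com',    '20100301', 1),
    (6, 'others@example.com',  '20100101', 1),
    (7, 'others@example.com',  '20100201', 2);

-- Number each recipient's emails from newest (rn = 1) to oldest.
SELECT
    ID,
    Recipient,
    Status,
    ROW_NUMBER() OVER (PARTITION BY Recipient ORDER BY [Date] DESC) AS rn
FROM @Messages
ORDER BY Recipient, rn;
```

For them@example.com the rows should come back as ID 5 (rn 1), ID 4 (rn 2) and ID 3 (rn 3), so filtering on rn = 1 keeps only the status of the newest email.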

Answer 1: (score: 2)

It's not very pretty, but I would probably just use a couple of subselects:

SELECT Recipient,
    COUNT(*) EmailCount,
    (SELECT M2.Status
     FROM Messages M2
     WHERE M2.Recipient = M.Recipient
         AND M2.Date = (SELECT MAX(M3.Date)
                        FROM Messages M3
                        WHERE M3.Recipient = M2.Recipient)) AS LastStatus
FROM Messages M
GROUP BY Recipient
ORDER BY Recipient
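
One caveat with the scalar subquery approach: if a recipient has two emails sharing the same latest date, the subquery returns more than one row and SQL Server raises an error. A possible workaround, sketched below, is TOP 1 with an explicit tie-breaker. The table variable @Messages, its three rows, and the choice of ID as tie-breaker are assumptions made for illustration, not part of the original answer.

```sql
-- Hypothetical data with a tie: IDs 2 and 3 share the latest date.
DECLARE @Messages TABLE (
    ID        int          NOT NULL,
    Recipient varchar(100) NOT NULL,
    [Date]    date         NOT NULL,
    Status    int          NOT NULL
);

INSERT INTO @Messages (ID, Recipient, [Date], Status) VALUES
    (1, 'them@example.com', '20100101', 1),
    (2, 'them@example.com', '20100102', 2),
    (3, 'them@example.com', '20100102', 1);

SELECT
    M.Recipient,
    COUNT(*) AS EmailCount,
    -- TOP 1 guarantees a single value; ID DESC breaks ties on [Date].
    (SELECT TOP 1 M2.Status
     FROM @Messages M2
     WHERE M2.Recipient = M.Recipient
     ORDER BY M2.[Date] DESC, M2.ID DESC) AS LastStatus
FROM @Messages M
GROUP BY M.Recipient
ORDER BY M.Recipient;
```

With this data the query returns a single row for them@example.com with EmailCount 3 and LastStatus 1, taken from ID 3.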

Answer 2: (score: 2)

SELECT
    M.Recipient,
    C.EmailCount,
    M.Status
FROM
    (
    SELECT Recipient, Count(*) EmailCount
    FROM Messages
    GROUP BY Recipient
    ) C
    JOIN
    (
    SELECT Recipient, MAX(Date) AS LastDate
    FROM Messages
    GROUP BY Recipient
    ) MD ON C.Recipient = MD.Recipient
    JOIN
    Messages M ON MD.Recipient = M.Recipient AND MD.LastDate = M.Date
ORDER BY
    Recipient

I find that aggregates mostly scale better than ranking functions.
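
Whether that holds depends on the data volume and the available indexes, so it may be worth measuring both variants on your own table. A minimal sketch, assuming the Messages table from the question exists in the current database, is to run them with SQL Server's I/O and timing statistics switched on and compare the output in the Messages tab:

```sql
-- Assumes the Messages table from the question exists in the current database.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Aggregate-based variant (MAX + join back to the table).
SELECT M.Recipient, C.EmailCount, M.Status
FROM (SELECT Recipient, COUNT(*) AS EmailCount
      FROM Messages
      GROUP BY Recipient) C
JOIN (SELECT Recipient, MAX([Date]) AS LastDate
      FROM Messages
      GROUP BY Recipient) MD
    ON C.Recipient = MD.Recipient
JOIN Messages M
    ON MD.Recipient = M.Recipient AND MD.LastDate = M.[Date];

-- ROW_NUMBER-based variant.
SELECT T1.Recipient, T1.EmailCount, T2.Status
FROM (SELECT Recipient, COUNT(*) AS EmailCount
      FROM Messages
      GROUP BY Recipient) T1
JOIN (SELECT Recipient, Status,
             ROW_NUMBER() OVER (PARTITION BY Recipient ORDER BY [Date] DESC) AS rn
      FROM Messages) T2
    ON T1.Recipient = T2.Recipient AND T2.rn = 1;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Note that the two variants are not strictly equivalent: the aggregate version returns several rows for a recipient whose latest date is shared by more than one email, while the ROW_NUMBER version always returns exactly one.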

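Answer 3: (score: 1)

You can't easily do this in a single query, because count(*) is a group function whereas the latest status comes from a specific row. Here is a query that gets the latest status for each user: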

SELECT M.Recipient, M.Status FROM Messages M
WHERE M.Date = (SELECT MAX(SUB.Date) FROM MESSAGES SUB
    WHERE SUB.Recipient = M.Recipient)
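
The query above returns only the status. One possible way to get the count in the same pass, offered here as a sketch rather than as part of the original answer, is to compute COUNT(*) as a window aggregate alongside ROW_NUMBER, both of which SQL Server 2008 supports. The table variable @Messages, the column types, the dd/mm/yyyy reading of the dates, and the ID tie-breaker are assumptions.

```sql
-- Hypothetical stand-in for the Messages table, filled with the question's sample rows.
DECLARE @Messages TABLE (
    ID        int          NOT NULL,
    Recipient varchar(100) NOT NULL,
    [Date]    date         NOT NULL,
    Status    int          NOT NULL
);

INSERT INTO @Messages (ID, Recipient, [Date], Status) VALUES
    (1, 'someone@example.com', '20100101', 1),
    (2, 'someone@example.com', '20100201', 1),
    (3, 'them@example.com',    '20100101', 1),
    (4, 'them@example.com',    '20100201', 2),
    (5, 'them@example.com',    '20100301', 1),
    (6, 'others@example.com',  '20100101', 1),
    (7, 'others@example.com',  '20100201', 2);

SELECT Recipient, EmailCount, Status AS LastStatus
FROM (
    SELECT
        Recipient,
        Status,
        -- Per-recipient total, repeated on every row of the partition.
        COUNT(*) OVER (PARTITION BY Recipient) AS EmailCount,
        -- Newest email gets rn = 1; ID breaks ties on [Date].
        ROW_NUMBER() OVER (PARTITION BY Recipient
                           ORDER BY [Date] DESC, ID DESC) AS rn
    FROM @Messages
) X
WHERE rn = 1
ORDER BY Recipient;
```

On the sample data this should give (2, 2) for others@example.com, (2, 1) for someone@example.com and (3, 1) for them@example.com as (EmailCount, LastStatus), matching the result requested in the question.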

Answer 4: (score: 0)

You can use the ranking functions for this. Something like the following (untested):

WITH MyResults AS
(
   SELECT Recipient, Status, ROW_NUMBER() OVER (PARTITION BY Recipient ORDER BY [date] DESC) AS [row_number]
   FROM Messages
)
SELECT MyResults.Recipient, MyCounts.EmailCount, MyResults.Status
FROM (
    SELECT Recipient, Count(*) EmailCount
    FROM Messages
    GROUP BY Recipient
) MyCounts
INNER JOIN MyResults
ON MyCounts.Recipient = MyResults.Recipient
WHERE MyResults.[row_number] = 1