Grouping data based on conditions

Time: 2013-10-09 01:24:27

Tags: sql-server-2008 tsql group-by

I need to fetch some data that has to be filtered according to certain conditions. Sample data:

Cust_ID Date    Result
1   2013-08-15  On hold
2   2013-08-16  NULL
3   2013-08-18  WIP
1   2013-08-20  Completed
3   2013-08-25  NULL
4   2013-08-28  NULL
4   2013-08-29  NULL

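Conditions:

  1. Use the latest date (i.e. Max(Date)).
  2. Return distinct Cust_ID values.
  3. If the Result on the latest date is NULL, return the latest record that has any Result other than NULL.
  4. If all records with the same Cust_ID have a NULL Result, select the latest record by date.

    The required output should be: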

    Cust_ID Date    Result
    1   2013-08-20  Completed
    2   2013-08-16  NULL    
    3   2013-08-18  WIP
    4   2013-08-29  NULL
    

    Please advise.

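2 answers:

Answer 0 (score: 0)

You can do this easily with CTEs. Note that the CTEs are not required (you could use subqueries instead), but I think they make it clearer what you are doing.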

WITH NonNull AS
(
   -- latest date per customer among rows with a non-NULL Result
   SELECT CustID, MAX(Date) as Date
   FROM tablename
   WHERE Result is not null
   GROUP BY CustID
), Others AS
(
   -- customers with only NULL results: take their latest date
   SELECT CustID, MAX(Date) as Date
   FROM tablename
   WHERE CustID NOT IN (SELECT CustID FROM NonNull)
   GROUP BY CustID
), AlltogetherNow AS -- not really needed but clearer
(
   SELECT CustID, Date
   FROM NonNull
   UNION ALL
   SELECT CustID, Date
   FROM Others
)
-- join back to the table to pick up the Result for the chosen date
SELECT A.CustID, A.Date, J.Result
FROM AlltogetherNow A
JOIN tablename J ON A.CustID = J.CustID AND A.Date = J.Date
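
Since the answer mentions that subqueries would work as well, here is a minimal sketch of that variation. It is not the answerer's query: it assumes the same tablename, CustID, Date and Result names and folds both cases into one aggregate, taking the latest date with a non-NULL Result and falling back to the latest date overall when there is none. The alias Picked is made up for illustration.

```sql
-- Sketch: one derived table instead of three CTEs.
-- MAX(CASE ...) ignores rows where Result IS NULL; COALESCE falls back
-- to the overall latest date for customers with only NULL results.
SELECT J.CustID, J.[Date], J.Result
FROM tablename AS J
JOIN (
    SELECT CustID,
           COALESCE(MAX(CASE WHEN Result IS NOT NULL THEN [Date] END),
                    MAX([Date])) AS [Date]
    FROM tablename
    GROUP BY CustID
) AS Picked
  ON Picked.CustID = J.CustID AND Picked.[Date] = J.[Date];
```

Both this sketch and the CTE version join back on (CustID, Date), so a customer with two rows on the same chosen date would be returned twice.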

Answer 1 (score: 0)

First, you need an IS NULL indicator for each row:

SQL Fiddle

MS SQL Server 2008 Schema Setup:

CREATE TABLE dbo.Results
    ([CustID] int, [Date] datetime, [Result] varchar(9))
GO

INSERT INTO dbo.Results
    ([CustID], [Date], [Result])
VALUES
    (1, '2013-08-15 00:00:00', 'On Hold'),
    (2, '2013-08-16 00:00:00', NULL),
    (3, '2013-08-18 00:00:00', 'WIP'),
    (1, '2013-08-20 00:00:00', 'Completed'),
    (3, '2013-08-25 00:00:00', NULL),
    (4, '2013-08-28 00:00:00', NULL),
    (4, '2013-08-29 00:00:00', NULL)
GO

Query 1:

SELECT *,CASE WHEN Result IS NULL THEN 0 ELSE 1 END IsNotNull
  FROM dbo.Results

Results:

| CUSTID |                          DATE |    RESULT | ISNOTNULL |
|--------|-------------------------------|-----------|-----------|
|      1 | August, 15 2013 00:00:00+0000 |   On Hold |         1 |
|      2 | August, 16 2013 00:00:00+0000 |    (null) |         0 |
|      3 | August, 18 2013 00:00:00+0000 |       WIP |         1 |
|      1 | August, 20 2013 00:00:00+0000 | Completed |         1 |
|      3 | August, 25 2013 00:00:00+0000 |    (null) |         0 |
|      4 | August, 28 2013 00:00:00+0000 |    (null) |         0 |
|      4 | August, 29 2013 00:00:00+0000 |    (null) |         0 |

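Then you need to determine the first NULL row and the first NOT NULL row for each customer. You can use the ROW_NUMBER() function for that. You also need to know whether each customer has any NOT NULL rows:

Query 2: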

SELECT *,
       ROW_NUMBER()OVER(PARTITION BY CustID,IsNotNull ORDER BY [Date] DESC) _rn,
       COUNT(Result)OVER(PARTITION BY CustID) NotNullCount
  FROM(
    SELECT *,CASE WHEN Result IS NULL THEN 0 ELSE 1 END IsNotNull
    FROM dbo.Results
  )X1

Results:

| CUSTID |                          DATE |    RESULT | ISNOTNULL | _RN | NOTNULLCOUNT |
|--------|-------------------------------|-----------|-----------|-----|--------------|
|      1 | August, 20 2013 00:00:00+0000 | Completed |         1 |   1 |            2 |
|      1 | August, 15 2013 00:00:00+0000 |   On Hold |         1 |   2 |            2 |
|      2 | August, 16 2013 00:00:00+0000 |    (null) |         0 |   1 |            0 |
|      3 | August, 25 2013 00:00:00+0000 |    (null) |         0 |   1 |            1 |
|      3 | August, 18 2013 00:00:00+0000 |       WIP |         1 |   1 |            1 |
|      4 | August, 29 2013 00:00:00+0000 |    (null) |         0 |   1 |            0 |
|      4 | August, 28 2013 00:00:00+0000 |    (null) |         0 |   2 |            0 |
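
The NotNullCount column relies on COUNT(Result) counting only rows where Result is not NULL, whereas COUNT(*) counts every row. A minimal sketch of the difference against the dbo.Results table from the schema setup (the aliases AllRows and NotNullRows are made up for illustration):

```sql
-- COUNT(*) counts all rows in the group;
-- COUNT(Result) skips rows where Result IS NULL.
SELECT CustID,
       COUNT(*)      AS AllRows,
       COUNT(Result) AS NotNullRows
  FROM dbo.Results
 GROUP BY CustID
```

NotNullRows is 0 exactly for the customers whose rows are all NULL, which is what the windowed COUNT(Result) in Query 2 reports on every row of the customer.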

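Finally, using the computed row number, keep the first NOT NULL row for each customer or, if the customer has no NOT NULL rows, the first NULL row:

Query 3: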


SELECT CustID,[Date],Result
FROM(
  SELECT *,
         ROW_NUMBER()OVER(PARTITION BY CustID,IsNotNull ORDER BY [Date] DESC) _rn,
         COUNT(Result)OVER(PARTITION BY CustID) NotNullCount
    FROM(
      SELECT *,CASE WHEN Result IS NULL THEN 0 ELSE 1 END IsNotNull
      FROM dbo.Results
    )X1
  )X2
 WHERE _rn = 1 AND SIGN(NotNullCount) = IsNotNull
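
The filter works because SIGN(NotNullCount) is 1 when the customer has at least one NOT NULL row and 0 otherwise, so it decides which of the two partitions row 1 is taken from. As an alternative sketch (not part of the original answer), the same rows can probably be obtained with a single ROW_NUMBER() by sorting NOT NULL rows ahead of NULL rows within each customer. The alias names Ranked and rn are made up here.

```sql
-- Sketch: rank each customer's rows so that non-NULL results come first,
-- newest first, then keep only the top-ranked row per customer.
SELECT CustID, [Date], Result
FROM (
    SELECT CustID, [Date], Result,
           ROW_NUMBER() OVER (
               PARTITION BY CustID
               ORDER BY CASE WHEN Result IS NULL THEN 1 ELSE 0 END,
                        [Date] DESC
           ) AS rn
    FROM dbo.Results
) AS Ranked
WHERE rn = 1
ORDER BY CustID;
```

For the sample data this should give one row per customer, matching the output requested in the question. If two rows of a customer tie on both the NULL flag and the date, ROW_NUMBER() picks one of them arbitrarily.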