SQL / Teradata: How do I return a specific value and the row before it?

Date: 2017-09-21 19:09:28

Tags: sql teradata

I have a SQL (Teradata) problem that I can't solve. I suspect the answer is much simpler than it looks to me.

I have a set of data like this:

ID      timestamp         location      event_type
1111    20160601-0112     Detroit       Event A
1111    20160602-0954     Brooklyn      Event B
1111    20160602-1123     Brooklyn      Event A
1112    20160912-1420     Minneapolis   Event B
1113    20161123-1742     New Orleans   Event A
1113    20161124-1841     New Orleans   Event A
1113    20161124-2100     New Orleans   Event B
1114    20170201-0959     Detroit       Event A
1114    20170201-2350     Detroit       Event A

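Here are the conditions for what I need to return:

I want to return the FIRST Event B for each ID, along with the most recent Event A that occurred before that Event B (based on timestamp). So for the dataset above, I would get: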

ID      timestamp         location      event_type
1111    20160601-0112     Detroit       Event A
1111    20160602-0954     Brooklyn      Event B
1113    20161124-1841     New Orleans   Event A
1113    20161124-2100     New Orleans   Event B

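The third record for 1111 is not returned because it occurs after the Event B. ID 1112 is not returned because there is no Event A before it. The first record for 1113 is not returned because there is a later Event A that is closer to the Event B. 1114 is not returned because it has no Event B.

I've been working on this for hours, to the point where I'm no longer thinking about it clearly... Any help would be greatly appreciated!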

2 answers:

Answer 0 (score: 2)

Given your sample data, I think the following should do the trick.

SELECT *
FROM testtable
QUALIFY

    (
        event_type = 'Event A'
        AND
        min(event_type) OVER (PARTITION BY id ORDER BY "timestamp" ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) = 'Event B'
    ) OR
    (
        event_type = 'Event B'
        AND
        max(event_type) OVER (PARTITION BY id ORDER BY "timestamp" ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) = 'Event A'
    )

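Here we use window functions to look at the record immediately before and after each record, within the same ID and ordered by timestamp. We do this in the QUALIFY clause, which works like a WHERE clause but is applied after the window functions are computed, so it can filter on their results.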
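To see what QUALIFY is doing, the same filter can be written without it: compute the neighbouring event types in a derived table, then filter on them with an ordinary WHERE. This is a sketch using the question's sample data in a plain (non-VOLATILE) table; the aliases prev_type and next_type are illustrative names. It sticks to standard window-function syntax, so it should also run on engines that lack QUALIFY (for example SQLite 3.25 or later), but treat it as untested on Teradata itself.

```sql
-- Question's sample data in an ordinary table
CREATE TABLE testtable (
    id         INTEGER,
    ts         VARCHAR(20),   -- 'YYYYMMDD-HHMM', so string order = time order
    location   VARCHAR(20),
    event_type VARCHAR(20)
);
INSERT INTO testtable VALUES (1111, '20160601-0112', 'Detroit', 'Event A');
INSERT INTO testtable VALUES (1111, '20160602-0954', 'Brooklyn', 'Event B');
INSERT INTO testtable VALUES (1111, '20160602-1123', 'Brooklyn', 'Event A');
INSERT INTO testtable VALUES (1112, '20160912-1420', 'Minneapolis', 'Event B');
INSERT INTO testtable VALUES (1113, '20161123-1742', 'New Orleans', 'Event A');
INSERT INTO testtable VALUES (1113, '20161124-1841', 'New Orleans', 'Event A');
INSERT INTO testtable VALUES (1113, '20161124-2100', 'New Orleans', 'Event B');
INSERT INTO testtable VALUES (1114, '20170201-0959', 'Detroit', 'Event A');
INSERT INTO testtable VALUES (1114, '20170201-2350', 'Detroit', 'Event A');

-- Step 1 (inner query): attach the previous and next event_type to every row.
-- Step 2 (outer WHERE): keep the rows QUALIFY would keep.
SELECT id, ts, location, event_type
FROM (
    SELECT id, ts, location, event_type,
           -- previous row for this id (NULL on its first row)
           MAX(event_type) OVER (PARTITION BY id ORDER BY ts
                                 ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_type,
           -- next row for this id (NULL on its last row)
           MIN(event_type) OVER (PARTITION BY id ORDER BY ts
                                 ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS next_type
    FROM testtable
) t
WHERE (event_type = 'Event A' AND next_type = 'Event B')
   OR (event_type = 'Event B' AND prev_type = 'Event A')
ORDER BY id, ts;
```

On this data it returns the same four rows as the QUALIFY version; the cost of portability is one extra level of nesting.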

Breaking down this QUALIFY statement:

        event_type = 'Event A'
        AND
        min(event_type) OVER (PARTITION BY id ORDER BY "timestamp" ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) = 'Event B'

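This says: "If the current record is 'Event A' and the next record for this ID, ordered by timestamp, is 'Event B', keep the record." The frame ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING contains only that next row, so MIN() simply returns its event_type (or NULL if there is no next row).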

        event_type = 'Event B'
        AND
        max(event_type) OVER (PARTITION BY id ORDER BY "timestamp" ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) = 'Event A'

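This says: "If the current record is 'Event B' and the previous record for this ID, ordered by timestamp, is 'Event A', keep the record." Here the one-row frame ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING holds just the previous row, so MAX() returns its event_type.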
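Because those one-row frames only pick up the neighbouring row's value, the same information is what LEAD and LAG compute. The sketch below assumes a database that provides LEAD/LAG (newer Teradata releases document them, but check your version) and uses the question's sample data. It lists both neighbours for every row without filtering, which makes it easy to see why each row is kept or dropped; prev_type and next_type are illustrative aliases.

```sql
-- Question's sample data in an ordinary table
CREATE TABLE testtable (
    id         INTEGER,
    ts         VARCHAR(20),   -- 'YYYYMMDD-HHMM', so string order = time order
    location   VARCHAR(20),
    event_type VARCHAR(20)
);
INSERT INTO testtable VALUES (1111, '20160601-0112', 'Detroit', 'Event A');
INSERT INTO testtable VALUES (1111, '20160602-0954', 'Brooklyn', 'Event B');
INSERT INTO testtable VALUES (1111, '20160602-1123', 'Brooklyn', 'Event A');
INSERT INTO testtable VALUES (1112, '20160912-1420', 'Minneapolis', 'Event B');
INSERT INTO testtable VALUES (1113, '20161123-1742', 'New Orleans', 'Event A');
INSERT INTO testtable VALUES (1113, '20161124-1841', 'New Orleans', 'Event A');
INSERT INTO testtable VALUES (1113, '20161124-2100', 'New Orleans', 'Event B');
INSERT INTO testtable VALUES (1114, '20170201-0959', 'Detroit', 'Event A');
INSERT INTO testtable VALUES (1114, '20170201-2350', 'Detroit', 'Event A');

-- Show each row with its neighbours (per id, in timestamp order), no filtering
SELECT id, ts, event_type,
       LAG(event_type)  OVER (PARTITION BY id ORDER BY ts) AS prev_type,  -- NULL on first row
       LEAD(event_type) OVER (PARTITION BY id ORDER BY ts) AS next_type   -- NULL on last row
FROM testtable
ORDER BY id, ts;
```

For example, the 20161123-1742 row for 1113 shows next_type = 'Event A', so it fails the first condition and is dropped, while the 20161124-1841 row has next_type = 'Event B' and is kept.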

It may be necessary to add more conditions to the QUALIFY clause to catch edge cases, but once you see how it works you can get quite creative there.
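One edge case worth handling: the QUALIFY query keeps every Event A / Event B pair that sit next to each other, so an ID with more than one such pair returns all of them rather than only the FIRST Event B the question asks for. Below is a sketch of a stricter version, assuming "first" means the earliest Event B by timestamp and that only an Event A strictly earlier than it counts: find the first Event B per ID, then the latest Event A before it, then keep just those two rows. ID 1115 is hypothetical data added only to show the difference, and the derived-table aliases are illustrative.

```sql
-- Question's sample data, plus a hypothetical ID 1115 with two A/B pairs
CREATE TABLE testtable (
    id         INTEGER,
    ts         VARCHAR(20),   -- 'YYYYMMDD-HHMM', so string order = time order
    location   VARCHAR(20),
    event_type VARCHAR(20)
);
INSERT INTO testtable VALUES (1111, '20160601-0112', 'Detroit', 'Event A');
INSERT INTO testtable VALUES (1111, '20160602-0954', 'Brooklyn', 'Event B');
INSERT INTO testtable VALUES (1111, '20160602-1123', 'Brooklyn', 'Event A');
INSERT INTO testtable VALUES (1112, '20160912-1420', 'Minneapolis', 'Event B');
INSERT INTO testtable VALUES (1113, '20161123-1742', 'New Orleans', 'Event A');
INSERT INTO testtable VALUES (1113, '20161124-1841', 'New Orleans', 'Event A');
INSERT INTO testtable VALUES (1113, '20161124-2100', 'New Orleans', 'Event B');
INSERT INTO testtable VALUES (1114, '20170201-0959', 'Detroit', 'Event A');
INSERT INTO testtable VALUES (1114, '20170201-2350', 'Detroit', 'Event A');
INSERT INTO testtable VALUES (1115, '20170301-0800', 'Chicago', 'Event A');  -- hypothetical
INSERT INTO testtable VALUES (1115, '20170301-0900', 'Chicago', 'Event B');  -- hypothetical
INSERT INTO testtable VALUES (1115, '20170302-0800', 'Chicago', 'Event A');  -- hypothetical
INSERT INTO testtable VALUES (1115, '20170302-0900', 'Chicago', 'Event B');  -- hypothetical

SELECT id, ts, location, event_type
FROM (
    SELECT id, ts, location, event_type, first_b_ts,
           -- latest Event A strictly before the first Event B (NULL if none)
           MAX(CASE WHEN event_type = 'Event A' AND ts < first_b_ts THEN ts END)
               OVER (PARTITION BY id) AS last_a_ts
    FROM (
        SELECT id, ts, location, event_type,
               -- earliest Event B for this id (NULL if the id has no Event B)
               MIN(CASE WHEN event_type = 'Event B' THEN ts END)
                   OVER (PARTITION BY id) AS first_b_ts
        FROM testtable
    ) with_first_b
) with_last_a
WHERE last_a_ts IS NOT NULL   -- drops ids with no Event B, or no Event A before it
  AND (   (event_type = 'Event B' AND ts = first_b_ts)
       OR (event_type = 'Event A' AND ts = last_a_ts))
ORDER BY id, ts;
```

On the question's rows this returns the same four rows as the expected output. For the hypothetical 1115 it keeps only the 20170301 pair, whereas the QUALIFY query above would return all four 1115 rows.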

Example:

CREATE MULTISET VOLATILE TABLE testtable
(
 id int,
 ts varchar(20),
 location varchar(20),
 event_type varchar(20)

) PRIMARY INDEX (id) ON COMMIT PRESERVE ROWS;

INSERT INTO testtable VALUES (1111,'20160601-0112','Detroit','Event A');
INSERT INTO testtable VALUES (1111,'20160602-0954','Brooklyn','Event B');
INSERT INTO testtable VALUES (1111,'20160602-1123','Brooklyn','Event A');
INSERT INTO testtable VALUES (1112,'20160912-1420','Minneapolis','Event B');
INSERT INTO testtable VALUES (1113,'20161123-1742','New Orleans','Event A');
INSERT INTO testtable VALUES (1113,'20161124-1841','New Orleans','Event A');
INSERT INTO testtable VALUES (1113,'20161124-2100','New Orleans','Event B');
INSERT INTO testtable VALUES (1114,'20170201-0959','Detroit','Event A');
INSERT INTO testtable VALUES (1114,'20170201-2350','Detroit','Event A');


SELECT *
FROM testtable
QUALIFY

    (
        event_type = 'Event A'
        AND
        min(event_type) OVER (PARTITION BY id ORDER BY ts ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) = 'Event B'
    ) OR
    (
        event_type = 'Event B'
        AND
        max(event_type) OVER (PARTITION BY id ORDER BY ts ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) = 'Event A'
    );

+------+---------------+-------------+------------+
|  id  |      ts       |  location   | event_type |
+------+---------------+-------------+------------+
| 1111 | 20160601-0112 | Detroit     | Event A    |
| 1111 | 20160602-0954 | Brooklyn    | Event B    |
| 1113 | 20161124-1841 | New Orleans | Event A    |
| 1113 | 20161124-2100 | New Orleans | Event B    |
+------+---------------+-------------+------------+

Answer 1 (score: 0)

This works in MS SQL Server 2012; I'm not sure whether the syntax is the same in Teradata.

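;WITH myData AS
    (
        -- one row per event
        SELECT
            ID,
            [timestamp],
            location,
            event_type
        FROM
            tableName
    ),
    firstB AS
    (
        -- earliest Event B per ID
        SELECT ID, MIN([timestamp]) AS b_ts
        FROM myData
        WHERE event_type = 'Event B'
        GROUP BY ID
    ),
    lastA AS
    (
        -- most recent Event A strictly before that first Event B
        SELECT m.ID, MAX(m.[timestamp]) AS a_ts
        FROM myData m
        JOIN firstB ON firstB.ID = m.ID
        WHERE m.event_type = 'Event A' AND m.[timestamp] < firstB.b_ts
        GROUP BY m.ID
    )
        SELECT
            myData.*
        FROM
            myData
            JOIN firstB ON firstB.ID = myData.ID
            JOIN lastA ON lastA.ID = myData.ID   -- IDs with no earlier Event A drop out here
        WHERE
            (myData.event_type = 'Event B' AND myData.[timestamp] = firstB.b_ts)
            OR
            (myData.event_type = 'Event A' AND myData.[timestamp] = lastA.a_ts)
        ORDER BY
            myData.ID,
            myData.[timestamp]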