I have a SQL (Teradata) problem that I can't solve. I know the answer is probably much simpler than I'm making it out to be.
I have a data set like this:
ID timestamp location event_type
1111 20160601-0112 Detroit Event A
1111 20160602-0954 Brooklyn Event B
1111 20160602-1123 Brooklyn Event A
1112 20160912-1420 Minneapolis Event B
1113 20161123-1742 New Orleans Event A
1113 20161124-1841 New Orleans Event A
1113 20161124-2100 New Orleans Event B
1114 20170201-0959 Detroit Event A
1114 20170201-2350 Detroit Event A
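Here are the criteria for what I need returned:
I want to return the FIRST Event B for each ID, along with the most recent Event A that occurred before that Event B (based on timestamp). So, for the data set above, I would get: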
ID timestamp location event_type
1111 20160601-0112 Detroit Event A
1111 20160602-0954 Brooklyn Event B
1113 20161124-1841 New Orleans Event A
1113 20161124-2100 New Orleans Event B
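The third record for 1111 is not returned because it occurs after the Event B. ID 1112 is not returned because no Event A precedes its Event B. The first record for 1113 is not returned because a later Event A is closer to the Event B. ID 1114 is not returned because it has no Event B.
I've been working on this for hours, to the point where I'm no longer approaching it clearly... any help would be much appreciated!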
Answer 0: (score: 2)
Given your sample data, I think the following should do the trick.
SELECT *
FROM testtable
QUALIFY
    (
        event_type = 'Event A'
        AND
        min(event_type) OVER (PARTITION BY id ORDER BY "timestamp" ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) = 'Event B'
    ) OR
    (
        event_type = 'Event B'
        AND
        max(event_type) OVER (PARTITION BY id ORDER BY "timestamp" ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) = 'Event A'
    )
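Here we use window functions to inspect the record immediately before and immediately after each record in the result set. We do this in the QUALIFY clause, which works like a WHERE clause but for window functions.
Breaking down this QUALIFY clause: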
event_type = 'Event A'
AND
min(event_type) OVER (PARTITION BY id ORDER BY "timestamp" ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) = 'Event B'
says "if the current record is 'Event A' and the next record for this ID, ordered by timestamp, is 'Event B', then keep the record".
event_type = 'Event B'
AND
max(event_type) OVER (PARTITION BY id ORDER BY "timestamp" ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) = 'Event A'
says "if the current record is 'Event B' and the previous record for this ID, ordered by timestamp, is 'Event A', then keep the record".
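For readers on an engine without QUALIFY, the same filter can be sketched by computing the neighbouring event types in a derived table and filtering them in an outer WHERE. The snippet below is only an illustration, not Teradata code: it runs against an in-memory SQLite database from Python (SQLite has no QUALIFY and needs version 3.25 or later for window functions), LEAD and LAG stand in for the one-row MIN/MAX frames above, and the names next_event, prev_event and pairs_via_derived_table are made up for this sketch.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    (1111, "20160601-0112", "Detroit", "Event A"),
    (1111, "20160602-0954", "Brooklyn", "Event B"),
    (1111, "20160602-1123", "Brooklyn", "Event A"),
    (1112, "20160912-1420", "Minneapolis", "Event B"),
    (1113, "20161123-1742", "New Orleans", "Event A"),
    (1113, "20161124-1841", "New Orleans", "Event A"),
    (1113, "20161124-2100", "New Orleans", "Event B"),
    (1114, "20170201-0959", "Detroit", "Event A"),
    (1114, "20170201-2350", "Detroit", "Event A"),
]

# The inner query attaches the next and previous event type per id;
# the outer WHERE plays the role that QUALIFY plays in Teradata.
QUERY = """
SELECT id, ts, location, event_type
FROM (
    SELECT t.*,
           LEAD(event_type) OVER (PARTITION BY id ORDER BY ts) AS next_event,
           LAG(event_type)  OVER (PARTITION BY id ORDER BY ts) AS prev_event
    FROM testtable AS t
) AS w
WHERE (event_type = 'Event A' AND next_event = 'Event B')
   OR (event_type = 'Event B' AND prev_event = 'Event A')
ORDER BY id, ts
"""


def pairs_via_derived_table(rows):
    """Load rows into an in-memory table and return the filtered rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE testtable (id INTEGER, ts TEXT, location TEXT, event_type TEXT)"
    )
    conn.executemany("INSERT INTO testtable VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in pairs_via_derived_table(ROWS):
        print(row)
```

On the question's sample data this keeps the same four rows as the QUALIFY version. It relies on the ts strings sorting chronologically, which holds for the fixed-width YYYYMMDD-HHMM format used here.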
You may need to add more conditions to the QUALIFY clause to catch edge cases, but once you understand how it works you can get quite creative there.
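One edge case worth checking is an ID whose events run A, B, A, B: the neighbour test keeps both A/B pairs, while the question asks only for the FIRST Event B. A possible stricter variant, sketched below under some assumptions, first finds the earliest Event B per ID and then the latest Event A before it. It is again run through SQLite from Python purely so it can be executed; the SQL avoids window functions, but I have not run it on Teradata. ID 1115 is hypothetical data added for this sketch (it is not in the question), the names first_b, pairs and first_b_with_latest_a are invented here, and timestamps are assumed to be unique within an ID.

```python
import sqlite3

# Sample rows from the question, plus a HYPOTHETICAL id 1115 (A, B, A, B)
# that is not in the original data and exists only to exercise the edge case.
ROWS = [
    (1111, "20160601-0112", "Detroit", "Event A"),
    (1111, "20160602-0954", "Brooklyn", "Event B"),
    (1111, "20160602-1123", "Brooklyn", "Event A"),
    (1112, "20160912-1420", "Minneapolis", "Event B"),
    (1113, "20161123-1742", "New Orleans", "Event A"),
    (1113, "20161124-1841", "New Orleans", "Event A"),
    (1113, "20161124-2100", "New Orleans", "Event B"),
    (1114, "20170201-0959", "Detroit", "Event A"),
    (1114, "20170201-2350", "Detroit", "Event A"),
    (1115, "20170301-0800", "Detroit", "Event A"),
    (1115, "20170301-0900", "Detroit", "Event B"),
    (1115, "20170302-0800", "Detroit", "Event A"),
    (1115, "20170302-0900", "Detroit", "Event B"),
]

STRICT_QUERY = """
WITH first_b AS (
    -- earliest Event B per id
    SELECT id, MIN(ts) AS b_ts
    FROM testtable
    WHERE event_type = 'Event B'
    GROUP BY id
),
pairs AS (
    -- latest Event A strictly before that Event B (NULL if there is none)
    SELECT fb.id,
           fb.b_ts,
           (SELECT MAX(a.ts)
              FROM testtable a
             WHERE a.id = fb.id
               AND a.event_type = 'Event A'
               AND a.ts < fb.b_ts) AS a_ts
    FROM first_b fb
)
SELECT t.id, t.ts, t.location, t.event_type
FROM testtable t
JOIN pairs p ON p.id = t.id
WHERE p.a_ts IS NOT NULL
  AND (   (t.event_type = 'Event A' AND t.ts = p.a_ts)
       OR (t.event_type = 'Event B' AND t.ts = p.b_ts))
ORDER BY t.id, t.ts
"""


def first_b_with_latest_a(rows):
    """Return, per id, the first Event B and the latest Event A before it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE testtable (id INTEGER, ts TEXT, location TEXT, event_type TEXT)"
    )
    conn.executemany("INSERT INTO testtable VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(STRICT_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in first_b_with_latest_a(ROWS):
        print(row)
```

With these rows, 1111 and 1113 come back exactly as in the question's expected output, and the hypothetical 1115 contributes only its first A/B pair. IDs 1112 (no earlier Event A) and 1114 (no Event B) drop out.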
Example:
CREATE MULTISET VOLATILE TABLE testtable
(
id int,
ts varchar(20),
location varchar(20),
event_type varchar(20)
) PRIMARY INDEX (id) ON COMMIT PRESERVE ROWS;
INSERT INTO testtable VALUES (1111,'20160601-0112','Detroit','Event A');
INSERT INTO testtable VALUES (1111,'20160602-0954','Brooklyn','Event B');
INSERT INTO testtable VALUES (1111,'20160602-1123','Brooklyn','Event A');
INSERT INTO testtable VALUES (1112,'20160912-1420','Minneapolis','Event B');
INSERT INTO testtable VALUES (1113,'20161123-1742','New Orleans','Event A');
INSERT INTO testtable VALUES (1113,'20161124-1841','New Orleans','Event A');
INSERT INTO testtable VALUES (1113,'20161124-2100','New Orleans','Event B');
INSERT INTO testtable VALUES (1114,'20170201-0959','Detroit','Event A');
INSERT INTO testtable VALUES (1114,'20170201-2350','Detroit','Event A');
SELECT *
FROM testtable
QUALIFY
    (
        event_type = 'Event A'
        AND
        min(event_type) OVER (PARTITION BY id ORDER BY ts ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) = 'Event B'
    ) OR
    (
        event_type = 'Event B'
        AND
        max(event_type) OVER (PARTITION BY id ORDER BY ts ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) = 'Event A'
    );
+------+---------------+-------------+------------+
| id | ts | location | event_type |
+------+---------------+-------------+------------+
| 1111 | 20160601-0112 | Detroit | Event A |
| 1111 | 20160602-0954 | Brooklyn | Event B |
| 1113 | 20161124-1841 | New Orleans | Event A |
| 1113 | 20161124-2100 | New Orleans | Event B |
+------+---------------+-------------+------------+
Answer 1: (score: 0)
This works in MS SQL 2012; I'm not sure whether the syntax is the same in Teradata.
;WITH myData AS
(
SELECT
ID,
min(timestamp) as timestamp,
location,
event_type
FROM
tableName
GROUP BY
ID,
location,
event_type
)
SELECT
*
FROM
myData
WHERE
(
myData.timestamp < (SELECT top 1 m2.timestamp FROM myData m2 WHERE m2.ID = myData.ID AND m2.event_type = 'Event B' ORDER BY m2.timestamp ASC)
OR
myData.event_type = 'Event B'
)
AND (SELECT Count(*) FROM myData m2 WHERE m2.ID = myData.ID AND m2.event_type = 'Event A') > 0
ORDER BY
myData.timestamp