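We have a table that stores information about employee intervals. Let's call it INTERVAL_TABLE.
We save a row when a user starts an interval and another when they finish it. A user can start an interval as many times as they like, and can finish one as many times as they like.
This is INTERVAL_TABLE: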
INTERVAL_ID | USER_ID | INTERVAL_TYPE_ID | INTERVAL_TIMESTAMP | ENTRY_TYPE
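A user can enter rows like the following into the table:
Now we have to create a report that combines the entries of this table that refer to the same user and interval type. We should be able to identify intervals that have both a start and an end, and group the two in a single row. Assuming the data in the image above, the output of the report should be as follows:
The output should be ordered by date, as shown in the image above.
I don't know how to write a query that does this.
Thanks!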
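Edit - extra information:
To find the END entry for any INIT entry, we take the closest END entry after it, based on the timestamp. That is how we know that ID 1 should be matched with ID 2 and not with ID 3.
It is important to note that if an INIT entry is followed by another INIT entry (based on the timestamp), we should not keep looking for an END for the first INIT. It is simply an INIT without an END.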
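The matching rule above can be sketched outside SQL as a single pass over the entries of one user and interval type. This is only a reference sketch for checking query results: the `Entry` type, the `pair_entries` helper, and the sample rows are hypothetical names and values introduced for illustration, shaped like the rows the question describes.

```python
from typing import List, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    interval_id: int
    timestamp: str   # ISO-style strings sort chronologically
    entry_type: str  # 'INIT_INTERVAL' or 'END_INTERVAL'


def pair_entries(entries: List[Entry]) -> List[Tuple[Optional[int], Optional[int]]]:
    """Return (init_id, end_id) pairs for ONE user and interval type.

    An INIT is paired with an END only when that END is the very next
    entry by timestamp; None marks the missing side.
    """
    pairs = []
    pending_init = None
    for entry in sorted(entries, key=lambda e: e.timestamp):
        if entry.entry_type == "INIT_INTERVAL":
            if pending_init is not None:
                # INIT followed by another INIT: the first one has no END.
                pairs.append((pending_init.interval_id, None))
            pending_init = entry
        elif pending_init is not None:
            pairs.append((pending_init.interval_id, entry.interval_id))
            pending_init = None
        else:
            # END with no open INIT before it.
            pairs.append((None, entry.interval_id))
    if pending_init is not None:
        pairs.append((pending_init.interval_id, None))
    return pairs


# Illustrative rows (assumed values, not copied from the question's image).
sample = [
    Entry(1, "2018-03-08 14:00:00", "INIT_INTERVAL"),
    Entry(2, "2018-03-08 15:00:00", "END_INTERVAL"),
    Entry(3, "2018-03-08 15:30:00", "END_INTERVAL"),
    Entry(4, "2018-03-08 15:45:00", "INIT_INTERVAL"),
    Entry(5, "2018-03-08 15:50:00", "INIT_INTERVAL"),
]
print(pair_entries(sample))
```

Because the pairs are emitted in timestamp order, the list is already sorted the way the report should be.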
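Answer 0 (score: 4)
This may not be the most efficient approach (I imagine a recursive query could be), but I find these subqueries easier to maintain: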
WITH ordered_table AS (
SELECT row_number() OVER(ORDER BY USER_ID,INTERVAL_TYPE_ID,INTERVAL_TIMESTAMP ASC) row_num, *
FROM INTERVAL_TABLE
ORDER BY row_num
),
_inits AS (
SELECT
t1.USER_ID,
t1.INTERVAL_TYPE_ID AS INTERVAL_TYPE,
t1.INTERVAL_TIMESTAMP AS INTERVAL_TIMESTAMP_INIT,
CASE
WHEN t1.ENTRY_TYPE = 'INIT_INTERVAL'
AND t2.ENTRY_TYPE = 'END_INTERVAL'
AND t1.USER_ID = t2.USER_ID
AND t1.INTERVAL_TYPE_ID = t2.INTERVAL_TYPE_ID
THEN t2.INTERVAL_TIMESTAMP
END AS INTERVAL_TIMESTAMP_END,
t1.INTERVAL_ID AS INTERVAL_ID_INIT,
CASE
WHEN t1.ENTRY_TYPE = 'INIT_INTERVAL'
AND t2.ENTRY_TYPE = 'END_INTERVAL'
AND t1.USER_ID = t2.USER_ID
AND t1.INTERVAL_TYPE_ID = t2.INTERVAL_TYPE_ID
THEN t2.INTERVAL_ID
END AS INTERVAL_ID_END
FROM ordered_table AS t1
LEFT JOIN ordered_table AS t2 ON (
t1.row_num = t2.row_num - 1 AND
t1.USER_ID = t2.USER_ID AND
t1.INTERVAL_TYPE_ID = t2.INTERVAL_TYPE_ID
)
WHERE t1.ENTRY_TYPE = 'INIT_INTERVAL'
),
_ends AS (
SELECT
t2.USER_ID,
t2.INTERVAL_TYPE_ID AS INTERVAL_TYPE,
NULL::timestamp AS INTERVAL_TIMESTAMP_INIT,
CASE
WHEN (
t1.ENTRY_TYPE = 'END_INTERVAL' AND
t2.ENTRY_TYPE = 'END_INTERVAL'
)
OR (t1.ENTRY_TYPE IS NULL) -- case when first record for USER_ID and INTERVAL_TYPE_ID is an END
THEN t2.INTERVAL_TIMESTAMP
END AS INTERVAL_TIMESTAMP_END,
NULL::int AS INTERVAL_ID_INIT,
t2.INTERVAL_ID AS INTERVAL_ID_END
FROM ordered_table AS t1
RIGHT JOIN ordered_table AS t2 ON (
t1.row_num = t2.row_num - 1 AND
t1.USER_ID = t2.USER_ID AND
t1.INTERVAL_TYPE_ID = t2.INTERVAL_TYPE_ID
)
WHERE t2.ENTRY_TYPE = 'END_INTERVAL'
)
SELECT * FROM (
SELECT * FROM _inits
UNION ALL
SELECT * FROM _ends
) qry
WHERE
COALESCE(interval_timestamp_init, interval_timestamp_end) IS NOT NULL
ORDER BY
USER_ID,
INTERVAL_TYPE,
COALESCE(interval_timestamp_init, interval_timestamp_end)
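Basically, INITs are always listed. Each one has either an associated END or NULL, so pretty much everything from _inits ends up in the output.
Because _inits already captures the matched ENDs, we only need to capture the ENDs that have no INIT (the ones preceded by another END, or with nothing before them).
Because these are outer joins, you just drop the rows where both the INIT and the END are NULL and apply the right ordering.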
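Answer 1 (score: 3)
This can be done simply and efficiently with the LEAD and LAG functions. At the very least it is more efficient than a self-join of the table: roughly O(n) once the rows are sorted, versus O(n*n) for a naive self-join.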
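First, use LEAD and LAG with the appropriate PARTITION BY to add columns holding the next and previous rows.
Then build two sets of pairs: the first set starts with an INIT_INTERVAL, the second set ends with an END_INTERVAL. If a pair has both an INIT and an END, it is included twice, and the duplicate is then eliminated by UNION.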
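The two steps can be tried end to end with a small sketch. It assumes an in-memory SQLite database (3.25 or later, for window functions) as a stand-in for the real server, loads the same five rows as this answer's sample data, and uses Python sets to mimic the de-duplication that UNION performs. The variable names are illustrative only.

```python
import sqlite3

# Assumption: SQLite in memory stands in for the real database server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE INTERVAL_TABLE ("
    "INTERVAL_ID int, USER_ID int, INTERVAL_TYPE_ID int, "
    "INTERVAL_TIMESTAMP text, ENTRY_TYPE text)"
)
conn.executemany(
    "INSERT INTO INTERVAL_TABLE VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, "2018-03-08 14:00:00", "INIT_INTERVAL"),
        (2, 1, 1, "2018-03-08 15:00:00", "END_INTERVAL"),
        (3, 1, 1, "2018-03-08 15:30:00", "END_INTERVAL"),
        (4, 1, 1, "2018-03-08 15:45:00", "INIT_INTERVAL"),
        (5, 1, 1, "2018-03-08 15:50:00", "INIT_INTERVAL"),
    ],
)

# Step 1: attach the previous and next row to every row.
neighbours = conn.execute("""
    SELECT INTERVAL_ID, ENTRY_TYPE,
           LAG(INTERVAL_ID)  OVER w, LAG(ENTRY_TYPE)  OVER w,
           LEAD(INTERVAL_ID) OVER w, LEAD(ENTRY_TYPE) OVER w
    FROM INTERVAL_TABLE
    WINDOW w AS (PARTITION BY USER_ID, INTERVAL_TYPE_ID
                 ORDER BY INTERVAL_TIMESTAMP)
""").fetchall()

# Step 2a: pairs that start with an INIT.
init_pairs = {
    (cur_id, next_id if next_type == "END_INTERVAL" else None)
    for cur_id, cur_type, _, _, next_id, next_type in neighbours
    if cur_type == "INIT_INTERVAL"
}
# Step 2b: pairs that end with an END.
end_pairs = {
    (prev_id if prev_type == "INIT_INTERVAL" else None, cur_id)
    for cur_id, cur_type, prev_id, prev_type, _, _ in neighbours
    if cur_type == "END_INTERVAL"
}

# The matched pair (1, 2) is in both sets; the set union keeps it once,
# which is the role UNION (rather than UNION ALL) plays in the query.
pairs = init_pairs | end_pairs
print(pairs)
```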
Sample data (you should have included this in the question, in addition to the screenshot)
CREATE TABLE INTERVAL_TABLE (
INTERVAL_ID int,
USER_ID int,
INTERVAL_TYPE_ID int,
INTERVAL_TIMESTAMP timestamp,
ENTRY_TYPE varchar(255));
INSERT INTO INTERVAL_TABLE (INTERVAL_ID, USER_ID, INTERVAL_TYPE_ID, INTERVAL_TIMESTAMP, ENTRY_TYPE) VALUES
(1, 1, 1, '2018-03-08 14:00:00', 'INIT_INTERVAL'),
(2, 1, 1, '2018-03-08 15:00:00', 'END_INTERVAL' ),
(3, 1, 1, '2018-03-08 15:30:00', 'END_INTERVAL' ),
(4, 1, 1, '2018-03-08 15:45:00', 'INIT_INTERVAL'),
(5, 1, 1, '2018-03-08 15:50:00', 'INIT_INTERVAL');
Query
WITH
CTE
AS
(
SELECT
USER_ID
,INTERVAL_TYPE_ID
,ENTRY_TYPE AS Curr_Entry_Type
,INTERVAL_TIMESTAMP AS Curr_Interval_Timestamp
,INTERVAL_ID AS Curr_Interval_ID
,LAG(ENTRY_TYPE) OVER(PARTITION BY USER_ID, INTERVAL_TYPE_ID ORDER BY INTERVAL_TIMESTAMP) AS Prev_Entry_Type
,LAG(INTERVAL_TIMESTAMP) OVER(PARTITION BY USER_ID, INTERVAL_TYPE_ID ORDER BY INTERVAL_TIMESTAMP) AS Prev_Interval_Timestamp
,LAG(INTERVAL_ID) OVER(PARTITION BY USER_ID, INTERVAL_TYPE_ID ORDER BY INTERVAL_TIMESTAMP) AS Prev_Interval_ID
,LEAD(ENTRY_TYPE) OVER(PARTITION BY USER_ID, INTERVAL_TYPE_ID ORDER BY INTERVAL_TIMESTAMP) AS Next_Entry_Type
,LEAD(INTERVAL_TIMESTAMP) OVER(PARTITION BY USER_ID, INTERVAL_TYPE_ID ORDER BY INTERVAL_TIMESTAMP) AS Next_Interval_Timestamp
,LEAD(INTERVAL_ID) OVER(PARTITION BY USER_ID, INTERVAL_TYPE_ID ORDER BY INTERVAL_TIMESTAMP) AS Next_Interval_ID
FROM
INTERVAL_TABLE
)
,CTE_Result
AS
(
SELECT
USER_ID
,INTERVAL_TYPE_ID
,Curr_Entry_Type AS Entry_Type_Init
,Curr_Interval_Timestamp AS Interval_Timestamp_Init
,Curr_Interval_ID AS Interval_ID_Init
,Next_Entry_Type AS Entry_Type_End
,CASE WHEN Next_Entry_Type = 'END_INTERVAL' THEN Next_Interval_Timestamp END AS Interval_Timestamp_End
,CASE WHEN Next_Entry_Type = 'END_INTERVAL' THEN Next_Interval_ID END AS Interval_ID_End
FROM CTE
WHERE Curr_Entry_Type = 'INIT_INTERVAL'
UNION -- sic! not UNION ALL
SELECT
USER_ID
,INTERVAL_TYPE_ID
,Prev_Entry_Type AS Entry_Type_Init
,CASE WHEN Prev_Entry_Type = 'INIT_INTERVAL' THEN Prev_Interval_Timestamp END AS Interval_Timestamp_Init
,CASE WHEN Prev_Entry_Type = 'INIT_INTERVAL' THEN Prev_Interval_ID END AS Interval_ID_Init
,Curr_Entry_Type AS Entry_Type_End
,Curr_Interval_Timestamp AS Interval_Timestamp_End
,Curr_Interval_ID AS Interval_ID_End
FROM CTE
WHERE Curr_Entry_Type = 'END_INTERVAL'
)
SELECT
USER_ID
,INTERVAL_TYPE_ID
,Interval_Timestamp_Init
,Interval_Timestamp_End
,Interval_ID_Init
,Interval_ID_End
FROM CTE_Result
ORDER BY
USER_ID
,INTERVAL_TYPE_ID
,COALESCE(Interval_Timestamp_Init, Interval_Timestamp_End)
Results
| user_id | interval_type_id | interval_timestamp_init | interval_timestamp_end | interval_id_init | interval_id_end |
|---------|------------------|-------------------------|------------------------|------------------|-----------------|
| 1 | 1 | 2018-03-08T14:00:00Z | 2018-03-08T15:00:00Z | 1 | 2 |
| 1 | 1 | (null) | 2018-03-08T15:30:00Z | (null) | 3 |
| 1 | 1 | 2018-03-08T15:45:00Z | (null) | 4 | (null) |
| 1 | 1 | 2018-03-08T15:50:00Z | (null) | 5 | (null) |
Answer 2 (score: 2)
This query gives the output you need:
WITH Intervals AS
(
WITH Events AS
(
WITH OrderedEvents AS
(
SELECT INTERVAL_ID, USER_ID, INTERVAL_TYPE_ID, INTERVAL_TIMESTAMP, ENTRY_TYPE, row_number() over (partition by USER_ID, INTERVAL_TYPE_ID order by INTERVAL_TIMESTAMP ASC) AS EVENT_ORDER FROM INTERVAL_TABLE
UNION ALL
SELECT NULL AS INTERVAL_ID, USER_ID, INTERVAL_TYPE_ID, NULL AS INTERVAL_TIMESTAMP, 'INIT_INTERVAL' AS ENTRY_TYPE, 0 AS EVENT_ORDER FROM INTERVAL_TABLE GROUP BY USER_ID, INTERVAL_TYPE_ID
UNION ALL
SELECT NULL AS INTERVAL_ID, USER_ID, INTERVAL_TYPE_ID, NULL AS INTERVAL_TIMESTAMP, 'END_INTERVAL' AS ENTRY_TYPE, COUNT(*) + 1 AS EVENT_ORDER FROM INTERVAL_TABLE GROUP BY USER_ID, INTERVAL_TYPE_ID
)
SELECT Events1.USER_ID, Events1.INTERVAL_TYPE_ID, Events1.INTERVAL_TIMESTAMP AS INTERVAL_TIMESTAMP_INIT, Events2.INTERVAL_TIMESTAMP AS INTERVAL_TIMESTAMP_END, Events1.INTERVAL_ID AS INTERVAL_ID_INIT, Events2.INTERVAL_ID AS INTERVAL_ID_END, Events1.ENTRY_TYPE AS ENTRY_TYPE1, Events2.ENTRY_TYPE AS ENTRY_TYPE2
FROM OrderedEvents Events1 INNER JOIN
OrderedEvents Events2
ON Events1.USER_ID = Events2.USER_ID AND Events1.INTERVAL_TYPE_ID = Events2.INTERVAL_TYPE_ID AND Events1.EVENT_ORDER + 1 = Events2.EVENT_ORDER
)
SELECT USER_ID, INTERVAL_TYPE_ID,
CASE WHEN ENTRY_TYPE1 = 'INIT_INTERVAL' AND ENTRY_TYPE2 = 'END_INTERVAL' THEN INTERVAL_TIMESTAMP_INIT
WHEN ENTRY_TYPE1 = 'INIT_INTERVAL' AND ENTRY_TYPE2 = 'INIT_INTERVAL' THEN INTERVAL_TIMESTAMP_INIT
WHEN ENTRY_TYPE1 = 'END_INTERVAL' AND ENTRY_TYPE2 = 'END_INTERVAL' THEN NULL
END AS INTERVAL_TIMESTAMP_INIT,
CASE WHEN ENTRY_TYPE1 = 'INIT_INTERVAL' AND ENTRY_TYPE2 = 'END_INTERVAL' THEN INTERVAL_TIMESTAMP_END
WHEN ENTRY_TYPE1 = 'INIT_INTERVAL' AND ENTRY_TYPE2 = 'INIT_INTERVAL' THEN NULL
WHEN ENTRY_TYPE1 = 'END_INTERVAL' AND ENTRY_TYPE2 = 'END_INTERVAL' THEN INTERVAL_TIMESTAMP_END
END AS INTERVAL_TIMESTAMP_END,
CASE WHEN ENTRY_TYPE1 = 'INIT_INTERVAL' AND ENTRY_TYPE2 = 'END_INTERVAL' THEN INTERVAL_ID_INIT
WHEN ENTRY_TYPE1 = 'INIT_INTERVAL' AND ENTRY_TYPE2 = 'INIT_INTERVAL' THEN INTERVAL_ID_INIT
WHEN ENTRY_TYPE1 = 'END_INTERVAL' AND ENTRY_TYPE2 = 'END_INTERVAL' THEN NULL
END AS INTERVAL_ID_INIT,
CASE WHEN ENTRY_TYPE1 = 'INIT_INTERVAL' AND ENTRY_TYPE2 = 'END_INTERVAL' THEN INTERVAL_ID_END
WHEN ENTRY_TYPE1 = 'INIT_INTERVAL' AND ENTRY_TYPE2 = 'INIT_INTERVAL' THEN NULL
WHEN ENTRY_TYPE1 = 'END_INTERVAL' AND ENTRY_TYPE2 = 'END_INTERVAL' THEN INTERVAL_ID_END
END AS INTERVAL_ID_END
FROM Events
)
SELECT * FROM Intervals WHERE INTERVAL_ID_INIT IS NOT NULL OR INTERVAL_ID_END IS NOT NULL;
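First, we build the OrderedEvents CTE. It groups the entries by USER_ID and INTERVAL_TYPE_ID, sorts them by INTERVAL_TIMESTAMP within each group, and assigns each event a sequence number.
In addition, for each group we add an INIT_INTERVAL as the first event and an END_INTERVAL as the last event, to cover the cases where the group starts with an END_INTERVAL or ends with an INIT_INTERVAL: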
WITH OrderedEvents AS
(
SELECT INTERVAL_ID, USER_ID, INTERVAL_TYPE_ID, INTERVAL_TIMESTAMP, ENTRY_TYPE, row_number() over (partition by USER_ID, INTERVAL_TYPE_ID order by INTERVAL_TIMESTAMP ASC) AS EVENT_ORDER FROM INTERVAL_TABLE
UNION ALL
SELECT NULL AS INTERVAL_ID, USER_ID, INTERVAL_TYPE_ID, NULL AS INTERVAL_TIMESTAMP, 'INIT_INTERVAL' AS ENTRY_TYPE, 0 AS EVENT_ORDER FROM INTERVAL_TABLE GROUP BY USER_ID, INTERVAL_TYPE_ID
UNION ALL
SELECT NULL AS INTERVAL_ID, USER_ID, INTERVAL_TYPE_ID, NULL AS INTERVAL_TIMESTAMP, 'END_INTERVAL' AS ENTRY_TYPE, COUNT(*) + 1 AS EVENT_ORDER FROM INTERVAL_TABLE GROUP BY USER_ID, INTERVAL_TYPE_ID
)
SELECT * FROM OrderedEvents ORDER BY user_id, interval_type_id, event_order;
For the provided data this query gives the following result:
Then we join OrderedEvents with itself on USER_ID and INTERVAL_TYPE_ID and select the pairs of neighboring events (Events1.EVENT_ORDER + 1 = Events2.EVENT_ORDER):
WITH OrderedEvents AS
(
...
)
SELECT Events1.USER_ID, Events1.INTERVAL_TYPE_ID, Events1.INTERVAL_TIMESTAMP AS INTERVAL_TIMESTAMP_INIT, Events2.INTERVAL_TIMESTAMP AS INTERVAL_TIMESTAMP_END, Events1.INTERVAL_ID AS INTERVAL_ID_INIT, Events2.INTERVAL_ID AS INTERVAL_ID_END, Events1.ENTRY_TYPE AS ENTRY_TYPE1, Events2.ENTRY_TYPE AS ENTRY_TYPE2
FROM OrderedEvents Events1 INNER JOIN
OrderedEvents Events2
ON Events1.USER_ID = Events2.USER_ID AND Events1.INTERVAL_TYPE_ID = Events2.INTERVAL_TYPE_ID AND Events1.EVENT_ORDER + 1 = Events2.EVENT_ORDER
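This query gives the following result:
Now we should turn these pairs of neighboring events into intervals, following the logic you described. The previous output has the columns entry_type1 and entry_type2, each of which can be INIT_INTERVAL or END_INTERVAL.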
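The possible combinations are:
1. <INIT_INTERVAL, END_INTERVAL> - the most natural case, where an INIT_INTERVAL is followed by an END_INTERVAL. We take the event values as they are.
2. <INIT_INTERVAL(1), INIT_INTERVAL(2)> - two consecutive INIT_INTERVALs. We force the interval to end by emitting <INIT_INTERVAL(1), NULL>. INIT_INTERVAL(2) is handled by the next pair, where it is the first entry.
3. <END_INTERVAL(1), END_INTERVAL(2)> - two consecutive END_INTERVALs. We force the interval to start by emitting <NULL, END_INTERVAL(2)>. END_INTERVAL(1) is handled either by case #1 or by this same case, in the pair where it is the second entry.
4. <END_INTERVAL, INIT_INTERVAL> - such pairs are simply skipped. The END_INTERVAL is taken by case #1 or case #3, and the INIT_INTERVAL by case #1 or case #2.
All of this logic goes into the CASE expressions. There are 4 such expressions with repeated conditions because we conditionally select 4 different columns (INTERVAL_TIMESTAMP_INIT, INTERVAL_TIMESTAMP_END, INTERVAL_ID_INIT and INTERVAL_ID_END), which cannot be done with a single CASE expression.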
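The four combinations, together with the sentinel events added at both ends of each group, can be sketched as a small function. This is an illustration of the rules above rather than a translation of the query: `to_interval`, `intervals_for_group`, and the sample events are hypothetical, and each event is reduced to an (entry type, interval id) tuple.

```python
INIT = "INIT_INTERVAL"
END = "END_INTERVAL"


def to_interval(first, second):
    """Map a pair of neighboring (entry_type, interval_id) events
    to (init_id, end_id), or return None when the pair is skipped."""
    type1, id1 = first
    type2, id2 = second
    if type1 == INIT and type2 == END:
        return (id1, id2)    # case 1: take both events as they are
    if type1 == INIT and type2 == INIT:
        return (id1, None)   # case 2: force the interval to end
    if type1 == END and type2 == END:
        return (None, id2)   # case 3: force the interval to start
    return None              # case 4: <END, INIT> is skipped


def intervals_for_group(events):
    """events: time-ordered (entry_type, interval_id) tuples of one group."""
    # Sentinels with a NULL id, like the two extra UNION ALL branches.
    padded = [(INIT, None)] + list(events) + [(END, None)]
    result = []
    for first, second in zip(padded, padded[1:]):
        interval = to_interval(first, second)
        # Drop skipped pairs and rows where both ids are NULL, which is
        # what the final WHERE clause of the query does.
        if interval is not None and interval != (None, None):
            result.append(interval)
    return result


# Illustrative group (assumed data): INIT, END, END, INIT, INIT.
events = [(INIT, 1), (END, 2), (END, 3), (INIT, 4), (INIT, 5)]
print(intervals_for_group(events))
```

The sentinels are what let a trailing INIT reach case 1 with a NULL end, and a leading END reach case 1 with a NULL start, without any special handling of the group boundaries.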
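The final output is the same as the one you described:
Answer 3 (score: 1)
You can join two instances of the same table on INTERVAL_ID (or on a new column holding a generated row_number), using a predicate like this: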
on a.INTERVAL_ID=b.INTERVAL_ID + 1
This way you can compare each record with the next one and get both in a single row.
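A minimal sketch of that self-join, run against an in-memory SQLite database as a stand-in for the real server. It assumes the INTERVAL_ID values are consecutive and follow timestamp order within one user and interval type; when that does not hold, a generated row number would have to replace INTERVAL_ID in the predicate. The three rows are illustrative.

```python
import sqlite3

# Assumption: SQLite in memory stands in for the real database server,
# and INTERVAL_ID is gap-free and ordered by time.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE INTERVAL_TABLE ("
    "INTERVAL_ID int, USER_ID int, INTERVAL_TYPE_ID int, "
    "INTERVAL_TIMESTAMP text, ENTRY_TYPE text)"
)
conn.executemany(
    "INSERT INTO INTERVAL_TABLE VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, "2018-03-08 14:00:00", "INIT_INTERVAL"),
        (2, 1, 1, "2018-03-08 15:00:00", "END_INTERVAL"),
        (3, 1, 1, "2018-03-08 15:30:00", "END_INTERVAL"),
    ],
)

# b is the earlier record and a is the record right after it.
adjacent = conn.execute("""
    SELECT b.INTERVAL_ID, b.ENTRY_TYPE, a.INTERVAL_ID, a.ENTRY_TYPE
    FROM INTERVAL_TABLE AS a
    JOIN INTERVAL_TABLE AS b
      ON a.INTERVAL_ID = b.INTERVAL_ID + 1
    ORDER BY b.INTERVAL_ID
""").fetchall()

for row in adjacent:
    print(row)
```

Each output row holds one record next to its successor, so the INIT/END combination can then be inspected with a CASE expression or a WHERE filter.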