Combining different rows of the same table - Postgres

Date: 2018-03-09 13:26:13

Tags: sql postgresql

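We have a table that stores information about employee intervals. Let's call it INTERVAL_TABLE.

We save a record when a user starts an interval and when they finish it. A user can start an interval as many times as they want, and can also finish one as many times as they want.

This is the simplified structure of INTERVAL_TABLE:
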
   INTERVAL_ID | USER_ID | INTERVAL_TYPE_ID | INTERVAL_TIMESTAMP | ENTRY_TYPE

A user can make entries such as the following in the table:

[Image: table possible entries]

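Now we have to create a report that combines the different entries of this table that refer to the same user and interval type. We should be able to identify the intervals that have both a start and an end, and group those two entries in a single row. Assuming the data in the image above, the report output should be as follows:

[Image: report expected output]

The output should be ordered by date, as shown in the image above.

I don't know how to write a query that does this.

Thanks!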

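Edit - additional information:

To find the END entry for any INIT entry, we should find the END entry that is closest to it, based on the timestamp. That is how we know we should match ID 1 with ID 2 and not with ID 3.

It is important to note that if an INIT entry is followed by another INIT entry (based on the timestamp), we should not keep looking for an END for the first INIT. That INIT is simply an INIT without an END.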
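The matching rule described above can be sketched outside SQL to make the expected pairing concrete. This is only an illustration in plain Python, not part of the original question: the `Entry` type, the `pair_entries` function, and the sample rows (including their timestamps) are hypothetical, chosen to be consistent with the IDs mentioned above.

```python
from typing import NamedTuple

class Entry(NamedTuple):
    interval_id: int
    timestamp: str   # ISO-8601 text, which sorts chronologically
    entry_type: str  # 'INIT_INTERVAL' or 'END_INTERVAL'

def pair_entries(entries):
    """Pair entries of ONE user and interval type into (init_id, end_id) rows."""
    ordered = sorted(entries, key=lambda e: e.timestamp)
    rows = []
    i = 0
    while i < len(ordered):
        cur = ordered[i]
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        if cur.entry_type == "INIT_INTERVAL":
            if nxt is not None and nxt.entry_type == "END_INTERVAL":
                # INIT immediately followed by an END: a complete interval.
                rows.append((cur.interval_id, nxt.interval_id))
                i += 2
                continue
            # INIT followed by another INIT (or by nothing): no END.
            rows.append((cur.interval_id, None))
        else:
            # An END that was not consumed by a preceding INIT.
            rows.append((None, cur.interval_id))
        i += 1
    return rows

# Hypothetical sample rows for a single user and interval type.
sample = [
    Entry(1, "2018-03-08 14:00:00", "INIT_INTERVAL"),
    Entry(2, "2018-03-08 15:00:00", "END_INTERVAL"),
    Entry(3, "2018-03-08 15:30:00", "END_INTERVAL"),
    Entry(4, "2018-03-08 15:45:00", "INIT_INTERVAL"),
    Entry(5, "2018-03-08 15:50:00", "INIT_INTERVAL"),
]
rows = pair_entries(sample)
print(rows)
```

ID 1 is paired with ID 2 rather than ID 3 because only the entry that immediately follows an INIT is considered as its END.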

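4 Answers:

Answer 0 (score: 4)

DBFiddle

This may not be the most efficient approach (I imagine a recursive query would be), but I find these subqueries easier to maintain: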

WITH ordered_table AS (
  SELECT row_number() OVER(ORDER BY USER_ID,INTERVAL_TYPE_ID,INTERVAL_TIMESTAMP ASC) row_num, *
  FROM INTERVAL_TABLE
  ORDER BY row_num
),

_inits AS (
  SELECT
    t1.USER_ID,
    t1.INTERVAL_TYPE_ID      AS INTERVAL_TYPE,
    t1.INTERVAL_TIMESTAMP    AS INTERVAL_TIMESTAMP_INIT,
    CASE
      WHEN t1.ENTRY_TYPE = 'INIT_INTERVAL' 
       AND t2.ENTRY_TYPE = 'END_INTERVAL' 
       AND t1.USER_ID          = t2.USER_ID
       AND t1.INTERVAL_TYPE_ID = t2.INTERVAL_TYPE_ID      
      THEN t2.INTERVAL_TIMESTAMP 
    END                      AS INTERVAL_TIMESTAMP_END,
    t1.INTERVAL_ID           AS INTERVAL_ID_INIT,
    CASE
      WHEN t1.ENTRY_TYPE = 'INIT_INTERVAL' 
       AND t2.ENTRY_TYPE = 'END_INTERVAL' 
       AND t1.USER_ID          = t2.USER_ID
       AND t1.INTERVAL_TYPE_ID = t2.INTERVAL_TYPE_ID
      THEN t2.INTERVAL_ID 
    END                      AS INTERVAL_ID_END
  FROM      ordered_table AS t1
  LEFT JOIN ordered_table AS t2 ON (
    t1.row_num = t2.row_num - 1 AND
    t1.USER_ID = t2.USER_ID AND
    t1.INTERVAL_TYPE_ID = t2.INTERVAL_TYPE_ID
  )
  WHERE t1.ENTRY_TYPE = 'INIT_INTERVAL'
),

_ends AS (
  SELECT
    t2.USER_ID,
    t2.INTERVAL_TYPE_ID      AS INTERVAL_TYPE,
    NULL::timestamp          AS INTERVAL_TIMESTAMP_INIT,
    CASE
      WHEN (
          t1.ENTRY_TYPE = 'END_INTERVAL' AND
          t2.ENTRY_TYPE = 'END_INTERVAL'
        ) 
        OR (t1.ENTRY_TYPE IS NULL) -- case when first record for USER_ID and INTERVAL_TYPE_ID is an END
      THEN t2.INTERVAL_TIMESTAMP 
    END                      AS INTERVAL_TIMESTAMP_END,
    NULL::int                AS INTERVAL_ID_INIT,
    t2.INTERVAL_ID           AS INTERVAL_ID_END
  FROM       ordered_table AS t1
  RIGHT JOIN ordered_table AS t2 ON (
    t1.row_num = t2.row_num - 1 AND
    t1.USER_ID = t2.USER_ID AND
    t1.INTERVAL_TYPE_ID = t2.INTERVAL_TYPE_ID
  )
  WHERE t2.ENTRY_TYPE = 'END_INTERVAL'
)

SELECT * FROM (
  SELECT * FROM _inits
  UNION ALL
  SELECT * FROM _ends
) qry
WHERE 
  COALESCE(interval_timestamp_init, interval_timestamp_end) IS NOT NULL
ORDER BY 
  USER_ID, 
  INTERVAL_TYPE, 
  COALESCE(interval_timestamp_init, interval_timestamp_end)

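Basically, the INITs will always be listed. They will have either an associated END or NULL. So pretty much everything from _inits will be there.

Because the INITs have already captured their ENDs, we only need to capture the ENDs that have no INIT (those preceded by another END, or with no preceding record at all).

Because these are outer joins, you only need to remove the cases where both INIT and END are NULL and apply the proper ordering.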

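Answer 1 (score: 3)

This can be done easily and efficiently with the LEAD and LAG functions. At least it is more efficient than a self-join of the table: O(n) vs O(n*n).

First, use LEAD and LAG with the appropriate PARTITION BY to add columns for the next and previous rows.

Then build two sets of pairs: the first set starts with INIT_INTERVAL, the second set ends with END_INTERVAL. If a pair has both an Init and an End, it is included twice, and the duplicate is then eliminated by the UNION.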
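As a rough illustration of what LAG and LEAD return under that PARTITION BY, here is a minimal sketch that runs the window functions through Python's standard `sqlite3` module. SQLite (3.25 or newer) is only an assumed stand-in for Postgres here, and the row for user 2 is a hypothetical addition that shows the partition boundary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE interval_table ("
    " interval_id INT, user_id INT, interval_type_id INT,"
    " interval_timestamp TEXT, entry_type TEXT)"
)
conn.executemany(
    "INSERT INTO interval_table VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, "2018-03-08 14:00:00", "INIT_INTERVAL"),
        (2, 1, 1, "2018-03-08 15:00:00", "END_INTERVAL"),
        (3, 1, 1, "2018-03-08 15:30:00", "END_INTERVAL"),
        (4, 1, 1, "2018-03-08 15:45:00", "INIT_INTERVAL"),
        (5, 1, 1, "2018-03-08 15:50:00", "INIT_INTERVAL"),
        # Hypothetical row for a second user, to show that
        # LAG/LEAD never look across a partition boundary.
        (6, 2, 1, "2018-03-08 14:30:00", "INIT_INTERVAL"),
    ],
)

# For every row, fetch the id of the previous and next row
# within the same (user_id, interval_type_id) partition.
neighbours = conn.execute(
    """
    SELECT interval_id,
           LAG(interval_id)  OVER (PARTITION BY user_id, interval_type_id
                                   ORDER BY interval_timestamp) AS prev_id,
           LEAD(interval_id) OVER (PARTITION BY user_id, interval_type_id
                                   ORDER BY interval_timestamp) AS next_id
    FROM interval_table
    ORDER BY user_id, interval_type_id, interval_timestamp
    """
).fetchall()

for row in neighbours:
    print(row)
```

The first row of each partition has no previous row and the last has no next row, so those columns come back as NULL (`None` in Python). That is what lets the query below tell unmatched INITs and ENDs apart.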

SQL Fiddle

Sample data (you should have included this in the question in addition to the screenshot)

CREATE TABLE INTERVAL_TABLE (
  INTERVAL_ID int,
  USER_ID int,
  INTERVAL_TYPE_ID int,
  INTERVAL_TIMESTAMP timestamp,
  ENTRY_TYPE varchar(255));

INSERT INTO INTERVAL_TABLE (INTERVAL_ID, USER_ID, INTERVAL_TYPE_ID, INTERVAL_TIMESTAMP, ENTRY_TYPE) VALUES
(1, 1, 1, '2018-03-08 14:00:00', 'INIT_INTERVAL'),
(2, 1, 1, '2018-03-08 15:00:00', 'END_INTERVAL' ),
(3, 1, 1, '2018-03-08 15:30:00', 'END_INTERVAL' ),
(4, 1, 1, '2018-03-08 15:45:00', 'INIT_INTERVAL'),
(5, 1, 1, '2018-03-08 15:50:00', 'INIT_INTERVAL');

Query

WITH
CTE
AS
(
  SELECT
    USER_ID
    ,INTERVAL_TYPE_ID
    ,ENTRY_TYPE AS Curr_Entry_Type
    ,INTERVAL_TIMESTAMP AS Curr_Interval_Timestamp
    ,INTERVAL_ID AS Curr_Interval_ID

    ,LAG(ENTRY_TYPE) OVER(PARTITION BY USER_ID, INTERVAL_TYPE_ID ORDER BY INTERVAL_TIMESTAMP) AS Prev_Entry_Type
    ,LAG(INTERVAL_TIMESTAMP) OVER(PARTITION BY USER_ID, INTERVAL_TYPE_ID ORDER BY INTERVAL_TIMESTAMP) AS Prev_Interval_Timestamp
    ,LAG(INTERVAL_ID) OVER(PARTITION BY USER_ID, INTERVAL_TYPE_ID ORDER BY INTERVAL_TIMESTAMP) AS Prev_Interval_ID

    ,LEAD(ENTRY_TYPE) OVER(PARTITION BY USER_ID, INTERVAL_TYPE_ID ORDER BY INTERVAL_TIMESTAMP) AS Next_Entry_Type
    ,LEAD(INTERVAL_TIMESTAMP) OVER(PARTITION BY USER_ID, INTERVAL_TYPE_ID ORDER BY INTERVAL_TIMESTAMP) AS Next_Interval_Timestamp
    ,LEAD(INTERVAL_ID) OVER(PARTITION BY USER_ID, INTERVAL_TYPE_ID ORDER BY INTERVAL_TIMESTAMP) AS Next_Interval_ID
  FROM
    INTERVAL_TABLE
)
,CTE_Result
AS
(
  SELECT
    USER_ID
    ,INTERVAL_TYPE_ID
    ,Curr_Entry_Type AS Entry_Type_Init
    ,Curr_Interval_Timestamp AS Interval_Timestamp_Init
    ,Curr_Interval_ID AS Interval_ID_Init
    ,Next_Entry_Type AS Entry_Type_End
    ,CASE WHEN Next_Entry_Type = 'END_INTERVAL' THEN Next_Interval_Timestamp END AS Interval_Timestamp_End
    ,CASE WHEN Next_Entry_Type = 'END_INTERVAL' THEN Next_Interval_ID END AS Interval_ID_End
  FROM CTE
  WHERE Curr_Entry_Type = 'INIT_INTERVAL'

  UNION -- sic! not UNION ALL

  SELECT
    USER_ID
    ,INTERVAL_TYPE_ID
    ,Prev_Entry_Type AS Entry_Type_Init
    ,CASE WHEN Prev_Entry_Type = 'INIT_INTERVAL' THEN Prev_Interval_Timestamp END AS Interval_Timestamp_Init
    ,CASE WHEN Prev_Entry_Type = 'INIT_INTERVAL' THEN Prev_Interval_ID END AS Interval_ID_Init
    ,Curr_Entry_Type AS Entry_Type_End
    ,Curr_Interval_Timestamp AS Interval_Timestamp_End
    ,Curr_Interval_ID AS Interval_ID_End
  FROM CTE
  WHERE Curr_Entry_Type = 'END_INTERVAL'
)
SELECT
    USER_ID
    ,INTERVAL_TYPE_ID
    ,Interval_Timestamp_Init
    ,Interval_Timestamp_End
    ,Interval_ID_Init
    ,Interval_ID_End
FROM CTE_Result
ORDER BY
  USER_ID
  ,INTERVAL_TYPE_ID
  ,COALESCE(Interval_Timestamp_Init, Interval_Timestamp_End)

Results

| user_id | interval_type_id | interval_timestamp_init | interval_timestamp_end | interval_id_init | interval_id_end |
|---------|------------------|-------------------------|------------------------|------------------|-----------------|
|       1 |                1 |    2018-03-08T14:00:00Z |   2018-03-08T15:00:00Z |                1 |               2 |
|       1 |                1 |                  (null) |   2018-03-08T15:30:00Z |           (null) |               3 |
|       1 |                1 |    2018-03-08T15:45:00Z |                 (null) |                4 |          (null) |
|       1 |                1 |    2018-03-08T15:50:00Z |                 (null) |                5 |          (null) |

Answer 2 (score: 2)

This query gives the output you need:

WITH Intervals AS
(
    WITH Events AS
    (
        WITH OrderedEvents AS
        (
            SELECT INTERVAL_ID, USER_ID, INTERVAL_TYPE_ID, INTERVAL_TIMESTAMP, ENTRY_TYPE, row_number() over (partition by USER_ID, INTERVAL_TYPE_ID order by INTERVAL_TIMESTAMP ASC) AS EVENT_ORDER FROM INTERVAL_TABLE
            UNION ALL
            SELECT NULL AS INTERVAL_ID, USER_ID, INTERVAL_TYPE_ID, NULL AS INTERVAL_TIMESTAMP, 'INIT_INTERVAL' AS ENTRY_TYPE, 0 AS EVENT_ORDER FROM INTERVAL_TABLE GROUP BY USER_ID, INTERVAL_TYPE_ID
            UNION ALL
            SELECT NULL AS INTERVAL_ID, USER_ID, INTERVAL_TYPE_ID, NULL AS INTERVAL_TIMESTAMP, 'END_INTERVAL' AS ENTRY_TYPE, COUNT(*) + 1 AS EVENT_ORDER FROM INTERVAL_TABLE GROUP BY USER_ID, INTERVAL_TYPE_ID
        )
        SELECT Events1.USER_ID, Events1.INTERVAL_TYPE_ID, Events1.INTERVAL_TIMESTAMP AS INTERVAL_TIMESTAMP_INIT, Events2.INTERVAL_TIMESTAMP AS INTERVAL_TIMESTAMP_END, Events1.INTERVAL_ID AS INTERVAL_ID_INIT, Events2.INTERVAL_ID  AS INTERVAL_ID_END, Events1.ENTRY_TYPE AS ENTRY_TYPE1, Events2.ENTRY_TYPE AS ENTRY_TYPE2
        FROM OrderedEvents Events1 INNER JOIN
        OrderedEvents Events2
        ON Events1.USER_ID = Events2.USER_ID AND Events1.INTERVAL_TYPE_ID = Events2.INTERVAL_TYPE_ID AND Events1.EVENT_ORDER + 1 = Events2.EVENT_ORDER
    )
    SELECT USER_ID, INTERVAL_TYPE_ID,

      CASE WHEN ENTRY_TYPE1 = 'INIT_INTERVAL' AND ENTRY_TYPE2 = 'END_INTERVAL' THEN INTERVAL_TIMESTAMP_INIT
           WHEN ENTRY_TYPE1 = 'INIT_INTERVAL' AND ENTRY_TYPE2 = 'INIT_INTERVAL' THEN INTERVAL_TIMESTAMP_INIT
           WHEN ENTRY_TYPE1 = 'END_INTERVAL' AND ENTRY_TYPE2 = 'END_INTERVAL' THEN NULL
      END AS INTERVAL_TIMESTAMP_INIT,

      CASE WHEN ENTRY_TYPE1 = 'INIT_INTERVAL' AND ENTRY_TYPE2 = 'END_INTERVAL' THEN INTERVAL_TIMESTAMP_END
           WHEN ENTRY_TYPE1 = 'INIT_INTERVAL' AND ENTRY_TYPE2 = 'INIT_INTERVAL' THEN NULL
           WHEN ENTRY_TYPE1 = 'END_INTERVAL' AND ENTRY_TYPE2 = 'END_INTERVAL' THEN INTERVAL_TIMESTAMP_END
      END AS INTERVAL_TIMESTAMP_END,

      CASE WHEN ENTRY_TYPE1 = 'INIT_INTERVAL' AND ENTRY_TYPE2 = 'END_INTERVAL' THEN INTERVAL_ID_INIT
           WHEN ENTRY_TYPE1 = 'INIT_INTERVAL' AND ENTRY_TYPE2 = 'INIT_INTERVAL' THEN INTERVAL_ID_INIT
           WHEN ENTRY_TYPE1 = 'END_INTERVAL' AND ENTRY_TYPE2 = 'END_INTERVAL' THEN NULL
      END AS INTERVAL_ID_INIT,

      CASE WHEN ENTRY_TYPE1 = 'INIT_INTERVAL' AND ENTRY_TYPE2 = 'END_INTERVAL' THEN INTERVAL_ID_END
           WHEN ENTRY_TYPE1 = 'INIT_INTERVAL' AND ENTRY_TYPE2 = 'INIT_INTERVAL' THEN NULL
           WHEN ENTRY_TYPE1 = 'END_INTERVAL' AND ENTRY_TYPE2 = 'END_INTERVAL' THEN INTERVAL_ID_END
      END AS INTERVAL_ID_END

    FROM Events
)
SELECT * FROM Intervals WHERE INTERVAL_ID_INIT IS NOT NULL OR INTERVAL_ID_END IS NOT NULL;

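First, we build the OrderedEvents CTE, which groups the entries by USER_ID and INTERVAL_TYPE_ID, orders them by INTERVAL_TIMESTAMP within each group, and assigns a numeric order to each event. In addition, for each group we add an INIT_INTERVAL as the first event and an END_INTERVAL as the last event, to cover the cases where the group starts with an END_INTERVAL or ends with an INIT_INTERVAL: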

WITH OrderedEvents AS
(
    SELECT INTERVAL_ID, USER_ID, INTERVAL_TYPE_ID, INTERVAL_TIMESTAMP, ENTRY_TYPE, row_number() over (partition by USER_ID, INTERVAL_TYPE_ID order by INTERVAL_TIMESTAMP ASC) AS EVENT_ORDER FROM INTERVAL_TABLE
    UNION ALL
    SELECT NULL AS INTERVAL_ID, USER_ID, INTERVAL_TYPE_ID, NULL AS INTERVAL_TIMESTAMP, 'INIT_INTERVAL' AS ENTRY_TYPE, 0 AS EVENT_ORDER FROM INTERVAL_TABLE GROUP BY USER_ID, INTERVAL_TYPE_ID
    UNION ALL
    SELECT NULL AS INTERVAL_ID, USER_ID, INTERVAL_TYPE_ID, NULL AS INTERVAL_TIMESTAMP, 'END_INTERVAL' AS ENTRY_TYPE, COUNT(*) + 1 AS EVENT_ORDER FROM INTERVAL_TABLE GROUP BY USER_ID, INTERVAL_TYPE_ID
)
SELECT * FROM OrderedEvents ORDER BY user_id, interval_type_id, event_order;

This query gives the following result for the provided data:

[Image: query result]

Then we join OrderedEvents with itself on USER_ID and INTERVAL_TYPE_ID and select pairs of neighboring events (Events1.EVENT_ORDER + 1 = Events2.EVENT_ORDER):

WITH OrderedEvents AS
(
    ...
)
SELECT Events1.USER_ID, Events1.INTERVAL_TYPE_ID, Events1.INTERVAL_TIMESTAMP AS INTERVAL_TIMESTAMP_INIT, Events2.INTERVAL_TIMESTAMP AS INTERVAL_TIMESTAMP_END, Events1.INTERVAL_ID AS INTERVAL_ID_INIT, Events2.INTERVAL_ID  AS INTERVAL_ID_END, Events1.ENTRY_TYPE AS ENTRY_TYPE1, Events2.ENTRY_TYPE AS ENTRY_TYPE2
FROM OrderedEvents Events1 INNER JOIN
OrderedEvents Events2
ON Events1.USER_ID = Events2.USER_ID AND Events1.INTERVAL_TYPE_ID = Events2.INTERVAL_TYPE_ID AND Events1.EVENT_ORDER + 1 = Events2.EVENT_ORDER

This query gives the following result:

[Image: query result]

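Now we should convert these pairs of neighboring events into intervals according to the logic you described. The previous output contains the columns entry_type1 and entry_type2, whose values can be INIT_INTERVAL or END_INTERVAL. The possible combinations are:

  • <INIT_INTERVAL, END_INTERVAL> - the most natural case, where an INIT_INTERVAL is followed by an END_INTERVAL. We take the event values as they are.
  • <INIT_INTERVAL(1), INIT_INTERVAL(2)> - the case of two consecutive INIT_INTERVALs. We force the interval to end by emitting <INIT_INTERVAL(1), NULL>. INIT_INTERVAL(2) will be taken with the next pair, where it is the first entry.
  • <END_INTERVAL(1), END_INTERVAL(2)> - the case of two consecutive END_INTERVALs. We force the interval to start by emitting <NULL, END_INTERVAL(2)>. END_INTERVAL(1) is handled either by case #1 or by the current case, when it is the second entry of a pair.
  • <END_INTERVAL, INIT_INTERVAL> - such pairs are simply skipped. The END_INTERVAL is taken by case #1 or case #3. The INIT_INTERVAL is taken by case #1 or case #2.

All of this logic is put into CASE expressions. There are 4 such expressions with duplicated conditions, because we conditionally select 4 different columns (INTERVAL_TIMESTAMP_INIT, INTERVAL_TIMESTAMP_END, INTERVAL_ID_INIT, INTERVAL_ID_END), which cannot be done with a single CASE expression.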
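The four cases above can also be sketched outside SQL. The following is a minimal illustration in plain Python, not the answer's actual implementation: the `to_interval` function and the tuple layout are hypothetical, and only the interval IDs are tracked to keep it short. The padding with a leading INIT and a trailing END mirrors the sentinel rows added in OrderedEvents.

```python
INIT, END = "INIT_INTERVAL", "END_INTERVAL"

def to_interval(first, second):
    """Map a pair of neighboring events to an (init_id, end_id) row.

    Each event is an (interval_id, entry_type) tuple.
    Returns None for pairs that should be skipped.
    """
    (id1, type1), (id2, type2) = first, second
    if type1 == INIT and type2 == END:
        return (id1, id2)    # case 1: take both events as they are
    if type1 == INIT and type2 == INIT:
        return (id1, None)   # case 2: force the first INIT to end
    if type1 == END and type2 == END:
        return (None, id2)   # case 3: force the second END to start
    return None              # case 4: <END, INIT> is skipped

# Events of one user and interval type, already ordered by timestamp.
events = [(1, INIT), (2, END), (3, END), (4, INIT), (5, INIT)]

# Sentinel events with a NULL id, as in the OrderedEvents CTE.
padded = [(None, INIT)] + events + [(None, END)]

rows = []
for first, second in zip(padded, padded[1:]):
    row = to_interval(first, second)
    # Drop skipped pairs and rows built only from sentinels,
    # like the final WHERE clause of the query.
    if row is not None and row != (None, None):
        rows.append(row)

print(rows)
```

The sentinels are what make the trailing INIT (ID 5) appear: without the appended END it would never be the first entry of any pair.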

The final output is the same as the one you described:

[Image: final query output]

Answer 3 (score: 1)

You can join two instances of the same table using INTERVAL_ID (or a new column with a generated row_number), with a predicate like the following:

on a.INTERVAL_ID=b.INTERVAL_ID + 1

This way you can compare each record with the next one and get them in a single row.
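A minimal sketch of that adjacency join, run through Python's standard `sqlite3` module as an assumed stand-in for Postgres. It only works as shown because the sample INTERVAL_IDs happen to be consecutive and in timestamp order for a single user and interval type. With real data that assumption is unlikely to hold, which is presumably why the answer suggests a generated row_number instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE interval_table ("
    " interval_id INT, user_id INT, interval_type_id INT,"
    " interval_timestamp TEXT, entry_type TEXT)"
)
conn.executemany(
    "INSERT INTO interval_table VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, "2018-03-08 14:00:00", "INIT_INTERVAL"),
        (2, 1, 1, "2018-03-08 15:00:00", "END_INTERVAL"),
        (3, 1, 1, "2018-03-08 15:30:00", "END_INTERVAL"),
        (4, 1, 1, "2018-03-08 15:45:00", "INIT_INTERVAL"),
        (5, 1, 1, "2018-03-08 15:50:00", "INIT_INTERVAL"),
    ],
)

# b is the current record, a is the record whose id is one higher.
# LEFT JOIN keeps the last record even though it has no successor.
rows = conn.execute(
    """
    SELECT b.interval_id, b.entry_type, a.interval_id, a.entry_type
    FROM interval_table AS b
    LEFT JOIN interval_table AS a
      ON a.interval_id = b.interval_id + 1
    ORDER BY b.interval_id
    """
).fetchall()

for row in rows:
    print(row)
```

Each output row holds a record together with its successor, so the INIT/END combinations can then be classified with CASE expressions, as in the other answers.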