How can I add columns to a query without having to put them in the GROUP BY?

Time: 2018-08-10 21:10:58

Tags: sql postgresql greatest-n-per-group

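I have a table that stores processes.

Each process is made up of items, and each item has values that are collected during the process.

Processes are executed by clients.

Here is a sample database schema with dummy data: http://sqlfiddle.com/#!15/36af4

I need to extract the following information from these tables: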

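  • Item ID
  • Item life
  • Total processes per item per life
  • Oldest process per item per life (timestamp)
  • Newest process per item per life (timestamp)
  • Max item value per item per life
  • Min item value per item per life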

I also need to find the process ID and the client ID of the newest and oldest processes for each item:

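  • Process ID of the oldest process per item per life
  • Process ID of the newest process per item per life
  • Client ID of the oldest process per item per life
  • Client ID of the newest process per item per life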

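Note that, for a specific item in a specific life, the process ID of the oldest process need not match the process ID that holds the min value of that item in that life.

I need all of this information for each life of each item. An item can have processes in different clients, so I cannot group by client, because that could duplicate the item. The same goes for processes: an item can appear in different processes, so I cannot group by process either.

This is as far as I could get on my own: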

 SELECT
  PV.ID_ITEM                                                       AS ID_ITEM,
  PV.ITEM_LIFE                                                     AS LIFE,
  COUNT(PV.ID_ITEM)                                                AS TOTAL_ITEM_PROCESS,
  MIN(P.DATE_TIME)                                                AS OLDEST_PROCESS,
  MAX(P.DATE_TIME)                                                AS NEWEST_PROCESS,
  MAX(GREATEST(PV.ITEM_VALUE_1, PV.ITEM_VALUE_2, PV.ITEM_VALUE_3)) AS MAX_ITEM_VALUE,
  MIN(LEAST(PV.ITEM_VALUE_1, PV.ITEM_VALUE_2, PV.ITEM_VALUE_3))    AS MIN_ITEM_VALUE
FROM PROCESS P
  JOIN PROCESS_VALUES PV ON P._ID = PV.ID_PROCESS
GROUP BY PV.ID_ITEM, PV.ITEM_LIFE;

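But I don't know how to add the client and process IDs of the oldest and newest processes to this query without also adding those columns to the GROUP BY clause. If I add the client ID to the GROUP BY, for instance, some items will be duplicated whenever they have processes in different clients.

We cannot use MAX or MIN to get the process ID, because the returned ID may not match the collected timestamp. A higher ID does not always have the most recent timestamp.

For the data provided in the fiddle, this should be the output: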

+---------+-----------+-----------------------------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+---------------------------------+---------------------------------+
| ITEM ID | ITEM LIFE | TOTAL PROCESSES PER ITEM PER LIFE | OLDEST PROCESS PER ITEM PER LIFE | NEWEST PROCESS PER ITEM PER LIFE | MAX ITEM VALUE PER ITEM PER LIFE | MIN ITEM VALUE PER ITEM PER LIFE | PROCESS ID OF THE OLDEST PROCESS | PROCESS ID OF THE NEWEST PROCESS | CLIENT ID OF THE OLDEST PROCESS | CLIENT ID OF THE NEWEST PROCESS |
+---------+-----------+-----------------------------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+---------------------------------+---------------------------------+
|     230 |         1 |                                 1 |  '2018-01-01 10:00:00'           |  '2018-01-02 10:00:00'           |                              6.5 |                              1.5 |                                1 |                                2 |                             100 |                             100 |
|     230 |         2 |                                 1 |  '2018-01-01 10:00:00'           |  '2018-01-02 10:00:00'           |                              5.5 |                              2.5 |                                1 |                                2 |                             100 |                             100 |
|     231 |         1 |                                 1 |  '2018-01-01 10:00:00'           |  '2018-01-02 10:00:00'           |                              7.5 |                              1.5 |                                1 |                                2 |                             100 |                             100 |
|     231 |         2 |                                 1 |  '2018-01-01 10:00:00'           |  '2018-01-02 10:00:00'           |                             10.8 |                              4.5 |                                1 |                                2 |                             100 |                             100 |
|     232 |         1 |                                 1 |  '2018-01-01 10:00:00'           |  '2018-01-02 10:00:00'           |                              5.6 |                              0.5 |                                1 |                                2 |                             100 |                             100 |
|     232 |         2 |                                 1 |  '2018-01-01 10:00:00'           |  '2018-01-02 10:00:00'           |                              2.5 |                             25.5 |                                1 |                                2 |                             100 |                             100 |
|     530 |         1 |                                 2 |  '2018-01-05 13:00:00'           |  '2018-01-06 13:00:00'           |                             11.5 |                              1.5 |                                4 |                                3 |                             400 |                             300 |
|     531 |         1 |                                 2 |  '2018-01-05 13:00:00'           |  '2018-01-06 13:00:00'           |                              9.5 |                              1.5 |                                4 |                                3 |                             400 |                             300 |
|     532 |         1 |                                 2 |  '2018-01-05 13:00:00'           |  '2018-01-06 13:00:00'           |                             13.5 |                              4.5 |                                4 |                                3 |                             400 |                             300 |
+---------+-----------+-----------------------------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+---------------------------------+---------------------------------+

How can I add more columns to the query without having to put them in the GROUP BY?

We use PostgreSQL.

3 Answers:

Answer 0: (score: 5)

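This looks like a common top-n-per-group problem, but with a slight twist.

Here I use the ROW_NUMBER approach. Another approach is to use a lateral join. Which method is faster depends on the data distribution.

You can read very detailed answers on dba.se in "Retrieving n rows per group". (That question is about SQL Server, but Postgres has all the same features, so all the answers there apply to Postgres as well.)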
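For comparison, the lateral join alternative mentioned above could look roughly like the sketch below. It assumes the PROCESS and PROCESS_VALUES tables from the question; the aliases g, o and n are made up for illustration, and the query has not been benchmarked against the ROW_NUMBER version.

```sql
-- Sketch only: one row per (item, life), then two lateral lookups
-- that each fetch a single process row for that group.
SELECT
    g.ID_ITEM
    ,g.ITEM_LIFE
    ,g.TOTAL_ITEM_PROCESS
    ,o.DATE_TIME  AS OLDEST_DATE_TIME
    ,o.ID_PROCESS AS OLDEST_ID_PROCESS
    ,o.ID_CLIENT  AS OLDEST_ID_CLIENT
    ,n.DATE_TIME  AS NEWEST_DATE_TIME
    ,n.ID_PROCESS AS NEWEST_ID_PROCESS
    ,n.ID_CLIENT  AS NEWEST_ID_CLIENT
FROM
    (
        -- Assumption: every PROCESS_VALUES row has a matching PROCESS row.
        -- The MAX/MIN value aggregates from the question would also go here.
        SELECT ID_ITEM, ITEM_LIFE, COUNT(*) AS TOTAL_ITEM_PROCESS
        FROM PROCESS_VALUES
        GROUP BY ID_ITEM, ITEM_LIFE
    ) AS g
    CROSS JOIN LATERAL
    (
        -- Oldest process of this item in this life
        SELECT P.DATE_TIME, P._ID AS ID_PROCESS, P.ID_CLIENT
        FROM PROCESS_VALUES PV
            INNER JOIN PROCESS P ON P._ID = PV.ID_PROCESS
        WHERE PV.ID_ITEM = g.ID_ITEM
          AND PV.ITEM_LIFE = g.ITEM_LIFE
        ORDER BY P.DATE_TIME ASC
        LIMIT 1
    ) AS o
    CROSS JOIN LATERAL
    (
        -- Newest process of this item in this life
        SELECT P.DATE_TIME, P._ID AS ID_PROCESS, P.ID_CLIENT
        FROM PROCESS_VALUES PV
            INNER JOIN PROCESS P ON P._ID = PV.ID_PROCESS
        WHERE PV.ID_ITEM = g.ID_ITEM
          AND PV.ITEM_LIFE = g.ITEM_LIFE
        ORDER BY P.DATE_TIME DESC
        LIMIT 1
    ) AS n
ORDER BY
    g.ID_ITEM, g.ITEM_LIFE;
```

Each lateral subquery runs once per group, so this shape tends to pay off when there are few groups with many rows each and a suitable index exists; that is a general expectation, not something measured on this data.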

SQL Fiddle

I adjusted your sample data and made all ID_CLIENT values different, so that they can be told apart in the results.

CREATE TABLE PROCESS (
  _ID BIGINT,
  DATE_TIME TIMESTAMP WITH TIME ZONE,
  ID_CLIENT BIGINT
);

CREATE TABLE PROCESS_VALUES (
  ID_PROCESS BIGINT,
  ID_ITEM BIGINT,
  ITEM_LIFE INTEGER,
  ITEM_VALUE_1 REAL,
  ITEM_VALUE_2 REAL,
  ITEM_VALUE_3 REAL
);

INSERT INTO PROCESS VALUES(1, '2018-01-01 10:00:00', 100);
INSERT INTO PROCESS VALUES(2, '2018-01-02 10:00:00', 200);
INSERT INTO PROCESS VALUES(3, '2018-01-06 13:00:00', 300);
INSERT INTO PROCESS VALUES(4, '2018-01-05 13:00:00', 400);

INSERT INTO PROCESS_VALUES VALUES(1, 230, 1, 5.5, 6.5, 1.5);
INSERT INTO PROCESS_VALUES VALUES(1, 231, 1, 1.5, 7.5, 3.5);
INSERT INTO PROCESS_VALUES VALUES(1, 232, 1, 5.6, 3.5, 0.5);
INSERT INTO PROCESS_VALUES VALUES(2, 230, 2, 5.5, 2.5, 4.5);
INSERT INTO PROCESS_VALUES VALUES(2, 231, 2, 10.8, 6.5, 4.5);
INSERT INTO PROCESS_VALUES VALUES(2, 232, 2, 25.5, 6.5, 2.5);
INSERT INTO PROCESS_VALUES VALUES(3, 530, 1, 1.5, 6.5, 8.5);
INSERT INTO PROCESS_VALUES VALUES(3, 531, 1, 3.5, 6.5, 1.5);
INSERT INTO PROCESS_VALUES VALUES(3, 532, 1, 6.5, 7.0, 4.5);
INSERT INTO PROCESS_VALUES VALUES(4, 530, 1, 1.5, 11.5, 4.5);
INSERT INTO PROCESS_VALUES VALUES(4, 531, 1, 9.5, 8.5, 1.5);
INSERT INTO PROCESS_VALUES VALUES(4, 532, 1, 5.5, 13.5, 4.5);

Query

I wrote the query using CTEs to make it readable. If you prefer, you can inline them all into one huge query. You should test which variant runs faster.

CTE_RN is the original two tables joined together, with two sets of row numbers: one for the oldest processes and one for the newest (that is what ORDER BY DATE_TIME ASC/DESC is for), per item, per life (that is what PARTITION BY is for).

CTE_OLDEST keeps only the oldest processes.

CTE_NEWEST keeps only the newest processes.

CTE_GROUPS is the query from your question, which calculates all the summaries.

The final SELECT joins the summaries with the information about the oldest and newest processes.

WITH
CTE_RN
AS
(
    SELECT
        PROCESS.DATE_TIME
        ,PROCESS._ID AS ID_PROCESS
        ,PROCESS.ID_CLIENT
        ,PROCESS_VALUES.ID_ITEM
        ,PROCESS_VALUES.ITEM_LIFE
        ,PROCESS_VALUES.ITEM_VALUE_1
        ,PROCESS_VALUES.ITEM_VALUE_2
        ,PROCESS_VALUES.ITEM_VALUE_3
        ,ROW_NUMBER() OVER (PARTITION BY PROCESS_VALUES.ID_ITEM, PROCESS_VALUES.ITEM_LIFE 
                            ORDER BY PROCESS.DATE_TIME ASC) AS rn1
        ,ROW_NUMBER() OVER (PARTITION BY PROCESS_VALUES.ID_ITEM, PROCESS_VALUES.ITEM_LIFE 
                            ORDER BY PROCESS.DATE_TIME DESC) AS rn2
    FROM
        PROCESS
        INNER JOIN PROCESS_VALUES ON PROCESS._ID = PROCESS_VALUES.ID_PROCESS
)
,CTE_OLDEST
AS
(
    SELECT
        ID_ITEM
        ,ITEM_LIFE
        ,ID_PROCESS
        ,ID_CLIENT
        ,DATE_TIME
    FROM CTE_RN
    WHERE rn1 = 1
)
,CTE_NEWEST
AS
(
    SELECT
        ID_ITEM
        ,ITEM_LIFE
        ,ID_PROCESS
        ,ID_CLIENT
        ,DATE_TIME
    FROM CTE_RN
    WHERE rn2 = 1
)
,CTE_GROUPS
AS
(
  SELECT
      ID_ITEM
      ,ITEM_LIFE
      ,COUNT(ID_ITEM) AS TOTAL_ITEM_PROCESS
      ,MIN(DATE_TIME) AS OLDEST_PROCESS
      ,MAX(DATE_TIME) AS NEWEST_PROCESS
      ,MAX(GREATEST(ITEM_VALUE_1, ITEM_VALUE_2, ITEM_VALUE_3)) AS MAX_ITEM_VALUE
      ,MIN(LEAST(ITEM_VALUE_1, ITEM_VALUE_2, ITEM_VALUE_3)) AS MIN_ITEM_VALUE
  FROM CTE_RN
  GROUP BY
      ID_ITEM, ITEM_LIFE
)
SELECT
    CTE_GROUPS.ID_ITEM
    ,CTE_GROUPS.ITEM_LIFE
    ,CTE_GROUPS.TOTAL_ITEM_PROCESS
    ,CTE_GROUPS.MAX_ITEM_VALUE
    ,CTE_GROUPS.MIN_ITEM_VALUE
    ,CTE_OLDEST.DATE_TIME AS OLDEST_DATE_TIME
    ,CTE_OLDEST.ID_PROCESS AS OLDEST_ID_PROCESS
    ,CTE_OLDEST.ID_CLIENT AS OLDEST_ID_CLIENT
    ,CTE_NEWEST.DATE_TIME AS NEWEST_DATE_TIME
    ,CTE_NEWEST.ID_PROCESS AS NEWEST_ID_PROCESS
    ,CTE_NEWEST.ID_CLIENT AS NEWEST_ID_CLIENT
FROM
    CTE_GROUPS
    INNER JOIN CTE_OLDEST
        ON  CTE_OLDEST.ID_ITEM = CTE_GROUPS.ID_ITEM
        AND CTE_OLDEST.ITEM_LIFE = CTE_GROUPS.ITEM_LIFE
    INNER JOIN CTE_NEWEST
        ON  CTE_NEWEST.ID_ITEM = CTE_GROUPS.ID_ITEM
        AND CTE_NEWEST.ITEM_LIFE = CTE_GROUPS.ITEM_LIFE
ORDER BY 
    ID_ITEM, ITEM_LIFE

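The result does not exactly match your expected output, but I believe the question contains typos. You said that you want the oldest and newest "per item per life", and items 230, 231 and 232 each have only one process per life, so their oldest and newest processes will be the same, as you can see in the query result.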
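As a side note, and not part of the approach above: PostgreSQL also allows an ORDER BY inside an aggregate call, so the IDs of the oldest and newest processes can be picked within the original GROUP BY query. The following is a minimal sketch under the assumption that the tables are as defined in the question; if two processes of the same group share a timestamp, which one is returned is not determined.

```sql
-- Sketch only: array_agg(... ORDER BY ...) collects the IDs sorted by
-- timestamp, and [1] takes the first element of that sorted array.
SELECT
    PV.ID_ITEM
    ,PV.ITEM_LIFE
    ,COUNT(*) AS TOTAL_ITEM_PROCESS
    ,MIN(P.DATE_TIME) AS OLDEST_PROCESS
    ,MAX(P.DATE_TIME) AS NEWEST_PROCESS
    ,MAX(GREATEST(PV.ITEM_VALUE_1, PV.ITEM_VALUE_2, PV.ITEM_VALUE_3)) AS MAX_ITEM_VALUE
    ,MIN(LEAST(PV.ITEM_VALUE_1, PV.ITEM_VALUE_2, PV.ITEM_VALUE_3)) AS MIN_ITEM_VALUE
    ,(array_agg(P._ID ORDER BY P.DATE_TIME ASC))[1] AS OLDEST_ID_PROCESS
    ,(array_agg(P._ID ORDER BY P.DATE_TIME DESC))[1] AS NEWEST_ID_PROCESS
    ,(array_agg(P.ID_CLIENT ORDER BY P.DATE_TIME ASC))[1] AS OLDEST_ID_CLIENT
    ,(array_agg(P.ID_CLIENT ORDER BY P.DATE_TIME DESC))[1] AS NEWEST_ID_CLIENT
FROM
    PROCESS P
    INNER JOIN PROCESS_VALUES PV ON P._ID = PV.ID_PROCESS
GROUP BY
    PV.ID_ITEM, PV.ITEM_LIFE
ORDER BY
    PV.ID_ITEM, PV.ITEM_LIFE;
```

This keeps everything in one GROUP BY at the cost of building an array per group, so it is probably best suited to groups of modest size.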

Answer 1: (score: 0)

I think you are confused about the idea of GROUP BY, but try this:

SELECT
  PV.ID_ITEM                                                       AS ID_ITEM,
  PV.ID_PROCESS                                                    AS PROCESS_ID,
  P.ID_CLIENT                                                      AS CLIENT_ID,
  PV.ITEM_LIFE                                                     AS LIFE,
  COUNT(PV.ID_ITEM)                                                AS TOTAL_ITEM_PROCESS,
  MIN(P.DATE_TIME)                                                 AS OLDEST_PROCESS,
  MAX(P.DATE_TIME)                                                 AS NEWEST_PROCESS,
  MAX(GREATEST(PV.ITEM_VALUE_1, PV.ITEM_VALUE_2, PV.ITEM_VALUE_3)) AS MAX_ITEM_VALUE,
  MIN(LEAST(PV.ITEM_VALUE_1, PV.ITEM_VALUE_2, PV.ITEM_VALUE_3))    AS MIN_ITEM_VALUE
FROM PROCESS P
  JOIN PROCESS_VALUES PV ON P._ID = PV.ID_PROCESS
GROUP BY PV.ID_ITEM, PV.ITEM_LIFE, PV.ID_PROCESS, P.ID_CLIENT;

GROUP BY is useful if, for example, you want to show how many processes a client is running.
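As a small illustration of that use of GROUP BY, the sketch below counts processes per client. It assumes the PROCESS table from the question; the alias TOTAL_PROCESSES is made up here.

```sql
-- One output row per client, with the number of processes it executed.
SELECT
  P.ID_CLIENT AS CLIENT_ID,
  COUNT(*)    AS TOTAL_PROCESSES
FROM PROCESS P
GROUP BY P.ID_CLIENT
ORDER BY P.ID_CLIENT;
```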

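Answer 2: (score: 0)

My answer uses window functions to compute the values without a GROUP BY. I think the output you provided contains mistakes. However, you can modify this query to your liking to get the values you need.

For example, if you need the lowest process_id across all records, remove the partition by clause for that column.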

select *
from (with prc as (
  -- per-row highest and lowest of the three value columns
  select a.*,
  GREATEST(a.ITEM_VALUE_1, a.ITEM_VALUE_2, a.ITEM_VALUE_3) as great_pv,
  LEAST(a.ITEM_VALUE_1, a.ITEM_VALUE_2, a.ITEM_VALUE_3) as least_pv
  from PROCESS_VALUES a
)
SELECT
  PV.ID_ITEM                                                       AS ID_ITEM,
  PV.ITEM_LIFE                                                     AS LIFE,
  count(*) over(partition by PV.ID_ITEM, PV.ITEM_LIFE ) as "TOTAL PROCESSES PER ITEM PER LIFE",
  row_number() over(partition by PV.ID_ITEM, PV.ITEM_LIFE ) as rn,
  -- "newest"/"max" columns use first_value with a descending sort: with the
  -- default window frame, last_value only sees rows up to the current row
  first_value(P.DATE_TIME) over(partition by PV.ID_ITEM, PV.ITEM_LIFE  order by P.DATE_TIME) as "OLDEST PROCESS PER ITEM PER LIFE",
  first_value(P.DATE_TIME) over(partition by PV.ID_ITEM, PV.ITEM_LIFE  order by P.DATE_TIME desc) as "NEWEST PROCESS PER ITEM PER LIFE",
  first_value(pv.great_pv) over(partition by PV.ID_ITEM, PV.ITEM_LIFE  order by pv.great_pv desc) as "MAX_ITEM_VALUE",
  -- the minimum is taken from least_pv, not great_pv
  first_value(pv.least_pv) over(partition by PV.ID_ITEM, PV.ITEM_LIFE  order by pv.least_pv) as "MIN_ITEM_VALUE",
  first_value(pv.ID_PROCESS) over(partition by PV.ID_ITEM, PV.ITEM_LIFE  order by P.DATE_TIME) as "PROCESS ID OF THE OLDEST PROCESS",
  first_value(pv.ID_PROCESS) over(partition by PV.ID_ITEM, PV.ITEM_LIFE  order by P.DATE_TIME desc) as "PROCESS ID OF THE NEWEST PROCESS",
  first_value(p.ID_CLIENT) over(partition by PV.ID_ITEM, PV.ITEM_LIFE  order by P.DATE_TIME) as "CLIENT ID OF THE OLDEST PROCESS",
  first_value(p.ID_CLIENT) over(partition by PV.ID_ITEM, PV.ITEM_LIFE  order by P.DATE_TIME desc) as "CLIENT ID OF THE NEWEST PROCESS"
FROM PROCESS P
  JOIN prc PV ON P._ID = PV.ID_PROCESS
) inn
where rn = 1
order by ID_ITEM, life
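To illustrate the remark about removing the partition by clause, the sketch below puts a partitioned window next to an unpartitioned one. It assumes the PROCESS_VALUES table from the question; the column aliases are made up for illustration.

```sql
-- With PARTITION BY the minimum is computed per (item, life) group;
-- with an empty OVER () the window is the whole result set.
SELECT
  PV.ID_ITEM,
  PV.ITEM_LIFE,
  PV.ID_PROCESS,
  min(PV.ID_PROCESS) over(partition by PV.ID_ITEM, PV.ITEM_LIFE) as lowest_process_in_group,
  min(PV.ID_PROCESS) over() as lowest_process_overall
FROM PROCESS_VALUES PV
ORDER BY PV.ID_ITEM, PV.ITEM_LIFE;
```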