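I have a table that stores processes.
Each process is made up of items, and each item has values that are collected during the process.
Processes are executed by clients.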
Here is a sample database schema with dummy data: http://sqlfiddle.com/#!15/36af4
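I need to extract the following per-item, per-life information from these tables: the total number of processes, the timestamps of the oldest and newest processes, and the maximum and minimum item values.
I also need the process ID and client ID of the oldest and newest process for each item.
Note that, for a given item in a given life, the ID of the oldest process does not necessarily match the smallest process ID for that item in that life.
I need all of this information for every item in every life. An item can have processes on different clients, so I can't group by client, because that could duplicate the item. The same applies to processes: an item can appear in different processes, so I can't group by process either.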
This is as far as I could get on my own:
SELECT
PV.ID_ITEM AS ID_ITEM,
PV.ITEM_LIFE AS LIFE,
COUNT(PV.ID_ITEM) AS TOTAL_ITEM_PROCESS,
MIN(P.DATE_TIME) AS OLDEST_PROCESS,
MAX(P.DATE_TIME) AS NEWEST_PROCESS,
MAX(GREATEST(PV.ITEM_VALUE_1, PV.ITEM_VALUE_2, PV.ITEM_VALUE_3)) AS MAX_ITEM_VALUE,
MIN(LEAST(PV.ITEM_VALUE_1, PV.ITEM_VALUE_2, PV.ITEM_VALUE_3)) AS MIN_ITEM_VALUE
FROM PROCESS P
JOIN PROCESS_VALUES PV ON P._ID = PV.ID_PROCESS
GROUP BY PV.ID_ITEM, PV.ITEM_LIFE;
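But I don't know how to add the client ID and process ID of the oldest and newest processes to this query without also adding those columns to the GROUP BY clause. For example, if I add the client ID to the GROUP BY, some items are duplicated whenever they have processes on different clients.
We can't use MAX or MIN to get the process ID, because the returned ID doesn't match the collected timestamp: a higher ID does not always have the newest timestamp.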
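To see why MAX(_ID) can't stand in for "the newest process", here is a minimal sketch that runs against an in-memory SQLite database (standing in for PostgreSQL) loaded with the sample rows that Answer 0 lists further down. It compares MAX(_ID) with the ID of the row that actually has the latest DATE_TIME for one item and life. The helper names make_db and max_id_vs_newest are hypothetical, and DATE_TIME is stored as ISO-8601 text, which sorts chronologically.

```python
import sqlite3

# Sample rows copied from Answer 0's INSERT statements.
# SQLite stands in for PostgreSQL; DATE_TIME is ISO-8601 text, which sorts chronologically.
PROCESSES = [
    (1, "2018-01-01 10:00:00", 100),
    (2, "2018-01-02 10:00:00", 200),
    (3, "2018-01-06 13:00:00", 300),
    (4, "2018-01-05 13:00:00", 400),
]
PROCESS_VALUES = [
    (1, 230, 1, 5.5, 6.5, 1.5), (1, 231, 1, 1.5, 7.5, 3.5), (1, 232, 1, 5.6, 3.5, 0.5),
    (2, 230, 2, 5.5, 2.5, 4.5), (2, 231, 2, 10.8, 6.5, 4.5), (2, 232, 2, 25.5, 6.5, 2.5),
    (3, 530, 1, 1.5, 6.5, 8.5), (3, 531, 1, 3.5, 6.5, 1.5), (3, 532, 1, 6.5, 7.0, 4.5),
    (4, 530, 1, 1.5, 11.5, 4.5), (4, 531, 1, 9.5, 8.5, 1.5), (4, 532, 1, 5.5, 13.5, 4.5),
]


def make_db() -> sqlite3.Connection:
    """Hypothetical helper: build an in-memory copy of PROCESS and PROCESS_VALUES."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE PROCESS (_ID INTEGER, DATE_TIME TEXT, ID_CLIENT INTEGER)")
    conn.execute(
        "CREATE TABLE PROCESS_VALUES (ID_PROCESS INTEGER, ID_ITEM INTEGER, ITEM_LIFE INTEGER,"
        " ITEM_VALUE_1 REAL, ITEM_VALUE_2 REAL, ITEM_VALUE_3 REAL)"
    )
    conn.executemany("INSERT INTO PROCESS VALUES (?, ?, ?)", PROCESSES)
    conn.executemany("INSERT INTO PROCESS_VALUES VALUES (?, ?, ?, ?, ?, ?)", PROCESS_VALUES)
    return conn


# Shared FROM/WHERE clause restricting the join to a single (item, life) group.
JOIN_ONE_GROUP = (
    " FROM PROCESS P JOIN PROCESS_VALUES PV ON P._ID = PV.ID_PROCESS"
    " WHERE PV.ID_ITEM = ? AND PV.ITEM_LIFE = ?"
)


def max_id_vs_newest(conn: sqlite3.Connection, item: int, life: int) -> tuple:
    """Return (MAX(_ID), _ID of the process with the latest DATE_TIME) for one item/life."""
    (max_id,) = conn.execute("SELECT MAX(P._ID)" + JOIN_ONE_GROUP, (item, life)).fetchone()
    (newest_id,) = conn.execute(
        "SELECT P._ID" + JOIN_ONE_GROUP + " ORDER BY P.DATE_TIME DESC LIMIT 1", (item, life)
    ).fetchone()
    return max_id, newest_id


if __name__ == "__main__":
    conn = make_db()
    print(max_id_vs_newest(conn, 530, 1))  # (4, 3): the highest ID is not the newest process
```

For item 530, process 4 has the highest ID but process 3 has the later timestamp, which is exactly why the ID has to be picked by its row rather than aggregated on its own.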
For the data in the fiddle, the output should be:
+---------+-----------+-----------------------------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+---------------------------------+---------------------------------+
| ITEM ID | ITEM LIFE | TOTAL PROCESSES PER ITEM PER LIFE | OLDEST PROCESS PER ITEM PER LIFE | NEWEST PROCESS PER ITEM PER LIFE | MAX ITEM VALUE PER ITEM PER LIFE | MIN ITEM VALUE PER ITEM PER LIFE | PROCESS ID OF THE OLDEST PROCESS | PROCESS ID OF THE NEWEST PROCESS | CLIENT ID OF THE OLDEST PROCESS | CLIENT ID OF THE NEWEST PROCESS |
+---------+-----------+-----------------------------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+---------------------------------+---------------------------------+
| 230 | 1 | 1 | '2018-01-01 10:00:00' | '2018-01-02 10:00:00' | 6.5 | 1.5 | 1 | 2 | 100 | 100 |
| 230 | 2 | 1 | '2018-01-01 10:00:00' | '2018-01-02 10:00:00' | 5.5 | 2.5 | 1 | 2 | 100 | 100 |
| 231 | 1 | 1 | '2018-01-01 10:00:00' | '2018-01-02 10:00:00' | 7.5 | 1.5 | 1 | 2 | 100 | 100 |
| 231 | 2 | 1 | '2018-01-01 10:00:00' | '2018-01-02 10:00:00' | 10.8 | 4.5 | 1 | 2 | 100 | 100 |
| 232 | 1 | 1 | '2018-01-01 10:00:00' | '2018-01-02 10:00:00' | 5.6 | 0.5 | 1 | 2 | 100 | 100 |
| 232 | 2 | 1 | '2018-01-01 10:00:00' | '2018-01-02 10:00:00' | 2.5 | 25.5 | 1 | 2 | 100 | 100 |
| 530 | 1 | 2 | '2018-01-05 13:00:00' | '2018-01-06 13:00:00' | 11.5 | 1.5 | 4 | 3 | 400 | 300 |
| 531 | 1 | 2 | '2018-01-05 13:00:00' | '2018-01-06 13:00:00' | 9.5 | 1.5 | 4 | 3 | 400 | 300 |
| 532 | 1 | 2 | '2018-01-05 13:00:00' | '2018-01-06 13:00:00' | 13.5 | 4.5 | 4 | 3 | 400 | 300 |
+---------+-----------+-----------------------------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+---------------------------------+---------------------------------+
How can I add more columns to the query without putting them in the GROUP BY?
We use PostgreSQL.
Answer 0 (score: 5)
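This looks like a common top-n-per-group problem, but with a slight twist.
Here I use the ROW_NUMBER approach. Another way is to use a lateral join. Which one is faster depends on your data distribution.
You can read a very detailed answer on dba.se: Retrieving n rows per group. (That question is about SQL Server, but Postgres has all the same features, so all of the answers there apply to Postgres.)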
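For comparison with the ROW_NUMBER approach, here is a sketch of the per-group lookup that a lateral join performs. It is an assumption-laden stand-in: SQLite (used here instead of PostgreSQL so it runs without a server) has no LATERAL, so each "oldest" and "newest" lookup is a correlated subquery with ORDER BY ... LIMIT 1. In PostgreSQL the same lookup would typically be written as CROSS JOIN LATERAL (SELECT ... ORDER BY DATE_TIME LIMIT 1). The sample rows are those from this answer; make_db and LOOKUP are hypothetical names.

```python
import sqlite3

# Sample rows from the INSERT statements in this answer (SQLite stands in for PostgreSQL).
PROCESSES = [
    (1, "2018-01-01 10:00:00", 100),
    (2, "2018-01-02 10:00:00", 200),
    (3, "2018-01-06 13:00:00", 300),
    (4, "2018-01-05 13:00:00", 400),
]
PROCESS_VALUES = [
    (1, 230, 1, 5.5, 6.5, 1.5), (1, 231, 1, 1.5, 7.5, 3.5), (1, 232, 1, 5.6, 3.5, 0.5),
    (2, 230, 2, 5.5, 2.5, 4.5), (2, 231, 2, 10.8, 6.5, 4.5), (2, 232, 2, 25.5, 6.5, 2.5),
    (3, 530, 1, 1.5, 6.5, 8.5), (3, 531, 1, 3.5, 6.5, 1.5), (3, 532, 1, 6.5, 7.0, 4.5),
    (4, 530, 1, 1.5, 11.5, 4.5), (4, 531, 1, 9.5, 8.5, 1.5), (4, 532, 1, 5.5, 13.5, 4.5),
]


def make_db() -> sqlite3.Connection:
    """Hypothetical helper: build an in-memory copy of PROCESS and PROCESS_VALUES."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE PROCESS (_ID INTEGER, DATE_TIME TEXT, ID_CLIENT INTEGER)")
    conn.execute(
        "CREATE TABLE PROCESS_VALUES (ID_PROCESS INTEGER, ID_ITEM INTEGER, ITEM_LIFE INTEGER,"
        " ITEM_VALUE_1 REAL, ITEM_VALUE_2 REAL, ITEM_VALUE_3 REAL)"
    )
    conn.executemany("INSERT INTO PROCESS VALUES (?, ?, ?)", PROCESSES)
    conn.executemany("INSERT INTO PROCESS_VALUES VALUES (?, ?, ?, ?, ?, ?)", PROCESS_VALUES)
    return conn


LOOKUP = """
SELECT X.ID_ITEM, X.ITEM_LIFE,
       X.OLDEST_ID, PO.ID_CLIENT AS OLDEST_CLIENT,
       X.NEWEST_ID, PN.ID_CLIENT AS NEWEST_CLIENT
FROM (
    SELECT G.ID_ITEM, G.ITEM_LIFE,
           -- for this (item, life): the process with the earliest timestamp
           (SELECT P._ID FROM PROCESS P JOIN PROCESS_VALUES PV ON P._ID = PV.ID_PROCESS
             WHERE PV.ID_ITEM = G.ID_ITEM AND PV.ITEM_LIFE = G.ITEM_LIFE
             ORDER BY P.DATE_TIME ASC LIMIT 1) AS OLDEST_ID,
           -- ... and the process with the latest timestamp
           (SELECT P._ID FROM PROCESS P JOIN PROCESS_VALUES PV ON P._ID = PV.ID_PROCESS
             WHERE PV.ID_ITEM = G.ID_ITEM AND PV.ITEM_LIFE = G.ITEM_LIFE
             ORDER BY P.DATE_TIME DESC LIMIT 1) AS NEWEST_ID
    FROM (SELECT DISTINCT ID_ITEM, ITEM_LIFE FROM PROCESS_VALUES) G
) X
JOIN PROCESS PO ON PO._ID = X.OLDEST_ID   -- client of the oldest process
JOIN PROCESS PN ON PN._ID = X.NEWEST_ID   -- client of the newest process
ORDER BY X.ID_ITEM, X.ITEM_LIFE
"""

if __name__ == "__main__":
    for row in make_db().execute(LOOKUP):
        print(row)
```

Each lookup can use an index on (ID_ITEM, ITEM_LIFE, DATE_TIME) if one exists, which is why this style can beat ROW_NUMBER when there are few groups with many rows each; with many small groups, numbering everything in one pass tends to win.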
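I adjusted your sample data and made all ID_CLIENT values different so that they can be told apart in the results.
I wrote the query with CTEs to make it readable. If you prefer, you can inline them all into one big query; test which version runs faster.
Adjusted sample data: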
CREATE TABLE PROCESS (
_ID BIGINT,
DATE_TIME TIMESTAMP WITH TIME ZONE,
ID_CLIENT BIGINT
);
CREATE TABLE PROCESS_VALUES (
ID_PROCESS BIGINT,
ID_ITEM BIGINT,
ITEM_LIFE INTEGER,
ITEM_VALUE_1 REAL,
ITEM_VALUE_2 REAL,
ITEM_VALUE_3 REAL
);
INSERT INTO PROCESS VALUES(1, '2018-01-01 10:00:00', 100);
INSERT INTO PROCESS VALUES(2, '2018-01-02 10:00:00', 200);
INSERT INTO PROCESS VALUES(3, '2018-01-06 13:00:00', 300);
INSERT INTO PROCESS VALUES(4, '2018-01-05 13:00:00', 400);
INSERT INTO PROCESS_VALUES VALUES(1, 230, 1, 5.5, 6.5, 1.5);
INSERT INTO PROCESS_VALUES VALUES(1, 231, 1, 1.5, 7.5, 3.5);
INSERT INTO PROCESS_VALUES VALUES(1, 232, 1, 5.6, 3.5, 0.5);
INSERT INTO PROCESS_VALUES VALUES(2, 230, 2, 5.5, 2.5, 4.5);
INSERT INTO PROCESS_VALUES VALUES(2, 231, 2, 10.8, 6.5, 4.5);
INSERT INTO PROCESS_VALUES VALUES(2, 232, 2, 25.5, 6.5, 2.5);
INSERT INTO PROCESS_VALUES VALUES(3, 530, 1, 1.5, 6.5, 8.5);
INSERT INTO PROCESS_VALUES VALUES(3, 531, 1, 3.5, 6.5, 1.5);
INSERT INTO PROCESS_VALUES VALUES(3, 532, 1, 6.5, 7.0, 4.5);
INSERT INTO PROCESS_VALUES VALUES(4, 530, 1, 1.5, 11.5, 4.5);
INSERT INTO PROCESS_VALUES VALUES(4, 531, 1, 9.5, 8.5, 1.5);
INSERT INTO PROCESS_VALUES VALUES(4, 532, 1, 5.5, 13.5, 4.5);
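CTE_RN is the two original tables joined together, with two sets of row numbers: one for the oldest process and one for the newest (that's what ORDER BY DATE_TIME ASC/DESC is for), computed per item, per life (that's what PARTITION BY is for).
CTE_OLDEST keeps only the oldest process.
CTE_NEWEST keeps only the newest process.
CTE_GROUPS is the query from your question, which calculates all the aggregates.
The final SELECT combines the summary with the information about the oldest and newest processes.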
WITH
CTE_RN
AS
(
SELECT
PROCESS.DATE_TIME
,PROCESS._ID AS ID_PROCESS
,PROCESS.ID_CLIENT
,PROCESS_VALUES.ID_ITEM
,PROCESS_VALUES.ITEM_LIFE
,PROCESS_VALUES.ITEM_VALUE_1
,PROCESS_VALUES.ITEM_VALUE_2
,PROCESS_VALUES.ITEM_VALUE_3
,ROW_NUMBER() OVER (PARTITION BY PROCESS_VALUES.ID_ITEM, PROCESS_VALUES.ITEM_LIFE
ORDER BY PROCESS.DATE_TIME ASC) AS rn1
,ROW_NUMBER() OVER (PARTITION BY PROCESS_VALUES.ID_ITEM, PROCESS_VALUES.ITEM_LIFE
ORDER BY PROCESS.DATE_TIME DESC) AS rn2
FROM
PROCESS
INNER JOIN PROCESS_VALUES ON PROCESS._ID = PROCESS_VALUES.ID_PROCESS
)
,CTE_OLDEST
AS
(
SELECT
ID_ITEM
,ITEM_LIFE
,ID_PROCESS
,ID_CLIENT
,DATE_TIME
FROM CTE_RN
WHERE rn1 = 1
)
,CTE_NEWEST
AS
(
SELECT
ID_ITEM
,ITEM_LIFE
,ID_PROCESS
,ID_CLIENT
,DATE_TIME
FROM CTE_RN
WHERE rn2 = 1
)
,CTE_GROUPS
AS
(
SELECT
ID_ITEM
,ITEM_LIFE
,COUNT(ID_ITEM) AS TOTAL_ITEM_PROCESS
,MIN(DATE_TIME) AS OLDEST_PROCESS
,MAX(DATE_TIME) AS NEWEST_PROCESS
,MAX(GREATEST(ITEM_VALUE_1, ITEM_VALUE_2, ITEM_VALUE_3)) AS MAX_ITEM_VALUE
,MIN(LEAST(ITEM_VALUE_1, ITEM_VALUE_2, ITEM_VALUE_3)) AS MIN_ITEM_VALUE
FROM CTE_RN
GROUP BY
ID_ITEM, ITEM_LIFE
)
SELECT
CTE_GROUPS.ID_ITEM
,CTE_GROUPS.ITEM_LIFE
,CTE_GROUPS.TOTAL_ITEM_PROCESS
,CTE_GROUPS.MAX_ITEM_VALUE
,CTE_GROUPS.MIN_ITEM_VALUE
,CTE_OLDEST.DATE_TIME AS OLDEST_DATE_TIME
,CTE_OLDEST.ID_PROCESS AS OLDEST_ID_PROCESS
,CTE_OLDEST.ID_CLIENT AS OLDEST_ID_CLIENT
,CTE_NEWEST.DATE_TIME AS NEWEST_DATE_TIME
,CTE_NEWEST.ID_PROCESS AS NEWEST_ID_PROCESS
,CTE_NEWEST.ID_CLIENT AS NEWEST_ID_CLIENT
FROM
CTE_GROUPS
INNER JOIN CTE_OLDEST
ON CTE_OLDEST.ID_ITEM = CTE_GROUPS.ID_ITEM
AND CTE_OLDEST.ITEM_LIFE = CTE_GROUPS.ITEM_LIFE
INNER JOIN CTE_NEWEST
ON CTE_NEWEST.ID_ITEM = CTE_GROUPS.ID_ITEM
AND CTE_NEWEST.ITEM_LIFE = CTE_GROUPS.ITEM_LIFE
ORDER BY
ID_ITEM, ITEM_LIFE
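The result doesn't exactly match your expected output, but I believe the question has a typo. You said you want the oldest and newest process "per item per life", and items 230, 231 and 232 have only one process in each life, so their oldest and newest processes are the same, as you can see in the query results.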
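To check that claim, here is a sketch that runs the CTE query against an in-memory SQLite database loaded with the same sample data. Assumptions: SQLite (3.25 or later, for window functions) stands in for PostgreSQL; GREATEST/LEAST are written as SQLite's multi-argument max()/min(), which, unlike GREATEST/LEAST, return NULL if any argument is NULL; the DATE_TIME columns are left out of the final SELECT for brevity; and make_db and answer_0_rows are hypothetical helper names.

```python
import sqlite3

# Sample rows from the INSERT statements above (SQLite stands in for PostgreSQL).
PROCESSES = [
    (1, "2018-01-01 10:00:00", 100),
    (2, "2018-01-02 10:00:00", 200),
    (3, "2018-01-06 13:00:00", 300),
    (4, "2018-01-05 13:00:00", 400),
]
PROCESS_VALUES = [
    (1, 230, 1, 5.5, 6.5, 1.5), (1, 231, 1, 1.5, 7.5, 3.5), (1, 232, 1, 5.6, 3.5, 0.5),
    (2, 230, 2, 5.5, 2.5, 4.5), (2, 231, 2, 10.8, 6.5, 4.5), (2, 232, 2, 25.5, 6.5, 2.5),
    (3, 530, 1, 1.5, 6.5, 8.5), (3, 531, 1, 3.5, 6.5, 1.5), (3, 532, 1, 6.5, 7.0, 4.5),
    (4, 530, 1, 1.5, 11.5, 4.5), (4, 531, 1, 9.5, 8.5, 1.5), (4, 532, 1, 5.5, 13.5, 4.5),
]

QUERY = """
WITH CTE_RN AS (
  SELECT P.DATE_TIME, P._ID AS ID_PROCESS, P.ID_CLIENT, PV.ID_ITEM, PV.ITEM_LIFE,
         PV.ITEM_VALUE_1, PV.ITEM_VALUE_2, PV.ITEM_VALUE_3,
         ROW_NUMBER() OVER (PARTITION BY PV.ID_ITEM, PV.ITEM_LIFE ORDER BY P.DATE_TIME ASC) AS rn1,
         ROW_NUMBER() OVER (PARTITION BY PV.ID_ITEM, PV.ITEM_LIFE ORDER BY P.DATE_TIME DESC) AS rn2
  FROM PROCESS P JOIN PROCESS_VALUES PV ON P._ID = PV.ID_PROCESS
),
CTE_OLDEST AS (SELECT ID_ITEM, ITEM_LIFE, ID_PROCESS, ID_CLIENT FROM CTE_RN WHERE rn1 = 1),
CTE_NEWEST AS (SELECT ID_ITEM, ITEM_LIFE, ID_PROCESS, ID_CLIENT FROM CTE_RN WHERE rn2 = 1),
CTE_GROUPS AS (
  SELECT ID_ITEM, ITEM_LIFE, COUNT(ID_ITEM) AS TOTAL_ITEM_PROCESS,
         -- SQLite spells GREATEST/LEAST as multi-argument max()/min()
         MAX(max(ITEM_VALUE_1, ITEM_VALUE_2, ITEM_VALUE_3)) AS MAX_ITEM_VALUE,
         MIN(min(ITEM_VALUE_1, ITEM_VALUE_2, ITEM_VALUE_3)) AS MIN_ITEM_VALUE
  FROM CTE_RN GROUP BY ID_ITEM, ITEM_LIFE
)
SELECT G.ID_ITEM, G.ITEM_LIFE, G.TOTAL_ITEM_PROCESS, G.MAX_ITEM_VALUE, G.MIN_ITEM_VALUE,
       O.ID_PROCESS, O.ID_CLIENT, N.ID_PROCESS, N.ID_CLIENT
FROM CTE_GROUPS G
JOIN CTE_OLDEST O ON O.ID_ITEM = G.ID_ITEM AND O.ITEM_LIFE = G.ITEM_LIFE
JOIN CTE_NEWEST N ON N.ID_ITEM = G.ID_ITEM AND N.ITEM_LIFE = G.ITEM_LIFE
ORDER BY G.ID_ITEM, G.ITEM_LIFE
"""


def make_db() -> sqlite3.Connection:
    """Hypothetical helper: build an in-memory copy of PROCESS and PROCESS_VALUES."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE PROCESS (_ID INTEGER, DATE_TIME TEXT, ID_CLIENT INTEGER)")
    conn.execute(
        "CREATE TABLE PROCESS_VALUES (ID_PROCESS INTEGER, ID_ITEM INTEGER, ITEM_LIFE INTEGER,"
        " ITEM_VALUE_1 REAL, ITEM_VALUE_2 REAL, ITEM_VALUE_3 REAL)"
    )
    conn.executemany("INSERT INTO PROCESS VALUES (?, ?, ?)", PROCESSES)
    conn.executemany("INSERT INTO PROCESS_VALUES VALUES (?, ?, ?, ?, ?, ?)", PROCESS_VALUES)
    return conn


def answer_0_rows() -> list:
    """Rows: item, life, total, max, min, oldest id, oldest client, newest id, newest client."""
    return make_db().execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in answer_0_rows():
        print(row)
```

On this data every row for items 230, 231 and 232 has the same oldest and newest process, item 530 in life 1 reports process 4 (client 400) as oldest and process 3 (client 300) as newest, and item 232 in life 2 comes back with a maximum of 25.5 and a minimum of 2.5.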
Answer 1 (score: 0)
I think you're confused about the idea of GROUP BY, but try this:
SELECT
PV.ID_ITEM AS ID_ITEM,
pv.id_process as PROCESS_ID,
p.id_client as CLIENT_ID,
PV.ITEM_LIFE AS LIFE,
COUNT(PV.ID_ITEM) AS TOTAL_ITEM_PROCESS,
MIN(P.DATE_TIME) AS OLDEST_PROCESS,
MAX(P.DATE_TIME) AS NEWEST_PROCESS,
MAX(GREATEST(PV.ITEM_VALUE_1, PV.ITEM_VALUE_2, PV.ITEM_VALUE_3)) AS MAX_ITEM_VALUE,
MIN(LEAST(PV.ITEM_VALUE_1, PV.ITEM_VALUE_2, PV.ITEM_VALUE_3)) AS MIN_ITEM_VALUE
FROM PROCESS P
JOIN PROCESS_VALUES PV ON P._ID = PV.ID_PROCESS
GROUP BY PV.ID_ITEM, PV.ITEM_LIFE,pv.id_process,p.id_client;
GROUP BY is useful if you want to show, for example, how many processes a client is running.
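To make the trade-off concrete, here is a sketch (SQLite standing in for PostgreSQL, sample rows from Answer 0, and the query above trimmed to its grouping columns) showing that grouping by process and client splits each (item, life) group, so item 530 comes back twice, while a GROUP BY on the client alone answers the processes-per-client question. make_db, ANSWER_1 and PER_CLIENT are hypothetical names.

```python
import sqlite3

# Sample rows from Answer 0's INSERT statements (SQLite stands in for PostgreSQL).
PROCESSES = [
    (1, "2018-01-01 10:00:00", 100),
    (2, "2018-01-02 10:00:00", 200),
    (3, "2018-01-06 13:00:00", 300),
    (4, "2018-01-05 13:00:00", 400),
]
PROCESS_VALUES = [
    (1, 230, 1, 5.5, 6.5, 1.5), (1, 231, 1, 1.5, 7.5, 3.5), (1, 232, 1, 5.6, 3.5, 0.5),
    (2, 230, 2, 5.5, 2.5, 4.5), (2, 231, 2, 10.8, 6.5, 4.5), (2, 232, 2, 25.5, 6.5, 2.5),
    (3, 530, 1, 1.5, 6.5, 8.5), (3, 531, 1, 3.5, 6.5, 1.5), (3, 532, 1, 6.5, 7.0, 4.5),
    (4, 530, 1, 1.5, 11.5, 4.5), (4, 531, 1, 9.5, 8.5, 1.5), (4, 532, 1, 5.5, 13.5, 4.5),
]


def make_db() -> sqlite3.Connection:
    """Hypothetical helper: build an in-memory copy of PROCESS and PROCESS_VALUES."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE PROCESS (_ID INTEGER, DATE_TIME TEXT, ID_CLIENT INTEGER)")
    conn.execute(
        "CREATE TABLE PROCESS_VALUES (ID_PROCESS INTEGER, ID_ITEM INTEGER, ITEM_LIFE INTEGER,"
        " ITEM_VALUE_1 REAL, ITEM_VALUE_2 REAL, ITEM_VALUE_3 REAL)"
    )
    conn.executemany("INSERT INTO PROCESS VALUES (?, ?, ?)", PROCESSES)
    conn.executemany("INSERT INTO PROCESS_VALUES VALUES (?, ?, ?, ?, ?, ?)", PROCESS_VALUES)
    return conn


# Same GROUP BY as the query above, keeping only the grouping columns and the count.
ANSWER_1 = """
SELECT PV.ID_ITEM, PV.ID_PROCESS, P.ID_CLIENT, PV.ITEM_LIFE, COUNT(PV.ID_ITEM)
FROM PROCESS P JOIN PROCESS_VALUES PV ON P._ID = PV.ID_PROCESS
GROUP BY PV.ID_ITEM, PV.ITEM_LIFE, PV.ID_PROCESS, P.ID_CLIENT
ORDER BY PV.ID_ITEM, PV.ITEM_LIFE, PV.ID_PROCESS
"""

# Grouping by client alone: how many processes each client ran.
PER_CLIENT = "SELECT ID_CLIENT, COUNT(*) FROM PROCESS GROUP BY ID_CLIENT ORDER BY ID_CLIENT"

if __name__ == "__main__":
    conn = make_db()
    rows = conn.execute(ANSWER_1).fetchall()
    print(len(rows), "rows for 9 (item, life) groups")
    print([r for r in rows if r[0] == 530])
    print(conn.execute(PER_CLIENT).fetchall())
```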
Answer 2 (score: 0)
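My answer uses window functions to calculate the values without GROUP BY. I think the output you provided contains errors. Either way, you can adapt this query to get exactly the values you need.
For example, if you need the lowest process_id across all records, remove the PARTITION BY clause for that column.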
select *
from (with prc as (
select a.*,
GREATEST(a.ITEM_VALUE_1, a.ITEM_VALUE_2, a.ITEM_VALUE_3) as great_pv,
LEAST(a.ITEM_VALUE_1, a.ITEM_VALUE_2, a.ITEM_VALUE_3) as least_pv
from PROCESS_VALUES a
)
SELECT
PV.ID_ITEM AS ID_ITEM,
PV.ITEM_LIFE AS LIFE,
count(*) over(partition by PV.ID_ITEM, PV.ITEM_LIFE ) as "TOTAL PROCESSES PER ITEM PER LIFE",
row_number() over(partition by PV.ID_ITEM, PV.ITEM_LIFE ) as rn,
first_value(P.DATE_TIME) over(partition by PV.ID_ITEM, PV.ITEM_LIFE order by P.DATE_TIME) as "OLDEST PROCESS PER ITEM PER LIFE",
-- last_value() with the default frame stops at the current row, so take first_value() in descending order instead
first_value(P.DATE_TIME) over(partition by PV.ID_ITEM, PV.ITEM_LIFE order by P.DATE_TIME desc) as "NEWEST PROCESS PER ITEM PER LIFE",
-- plain aggregates over the whole partition; the minimum comes from least_pv, not great_pv
max(pv.great_pv) over(partition by PV.ID_ITEM, PV.ITEM_LIFE) as "MAX_ITEM_VALUE",
min(pv.least_pv) over(partition by PV.ID_ITEM, PV.ITEM_LIFE) as "MIN_ITEM_VALUE",
first_value(pv.ID_PROCESS) over(partition by PV.ID_ITEM, PV.ITEM_LIFE order by P.DATE_TIME) "PROCESS ID OF THE OLDEST PROCESS",
first_value(pv.ID_PROCESS) over(partition by PV.ID_ITEM, PV.ITEM_LIFE order by P.DATE_TIME desc) "PROCESS ID OF THE NEWEST PROCESS",
first_value(p.ID_CLIENT) over(partition by PV.ID_ITEM, PV.ITEM_LIFE order by P.DATE_TIME) "CLIENT ID OF THE OLDEST PROCESS",
first_value(p.ID_CLIENT) over(partition by PV.ID_ITEM, PV.ITEM_LIFE order by P.DATE_TIME desc) "CLIENT ID OF THE NEWEST PROCESS"
FROM PROCESS P
JOIN prc PV ON P._ID = PV.ID_PROCESS
) inn
where rn = 1
order by ID_ITEM, life
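One detail matters for the "newest" columns in a window-function approach: with an ORDER BY and no explicit frame, the window frame runs from the start of the partition to the current row (and its peers), so last_value() returns the current row's value rather than the partition's last one. Because the outer query keeps an arbitrary row per group (rn = 1 with no ORDER BY), that can yield the wrong process. This sketch, with SQLite 3.25+ standing in for PostgreSQL and the sample rows from Answer 0 (make_db and FRAMES are hypothetical names), compares three variants for item 530, life 1.

```python
import sqlite3

# Sample rows from Answer 0's INSERT statements (SQLite stands in for PostgreSQL).
PROCESSES = [
    (1, "2018-01-01 10:00:00", 100),
    (2, "2018-01-02 10:00:00", 200),
    (3, "2018-01-06 13:00:00", 300),
    (4, "2018-01-05 13:00:00", 400),
]
PROCESS_VALUES = [
    (1, 230, 1, 5.5, 6.5, 1.5), (1, 231, 1, 1.5, 7.5, 3.5), (1, 232, 1, 5.6, 3.5, 0.5),
    (2, 230, 2, 5.5, 2.5, 4.5), (2, 231, 2, 10.8, 6.5, 4.5), (2, 232, 2, 25.5, 6.5, 2.5),
    (3, 530, 1, 1.5, 6.5, 8.5), (3, 531, 1, 3.5, 6.5, 1.5), (3, 532, 1, 6.5, 7.0, 4.5),
    (4, 530, 1, 1.5, 11.5, 4.5), (4, 531, 1, 9.5, 8.5, 1.5), (4, 532, 1, 5.5, 13.5, 4.5),
]


def make_db() -> sqlite3.Connection:
    """Hypothetical helper: build an in-memory copy of PROCESS and PROCESS_VALUES."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE PROCESS (_ID INTEGER, DATE_TIME TEXT, ID_CLIENT INTEGER)")
    conn.execute(
        "CREATE TABLE PROCESS_VALUES (ID_PROCESS INTEGER, ID_ITEM INTEGER, ITEM_LIFE INTEGER,"
        " ITEM_VALUE_1 REAL, ITEM_VALUE_2 REAL, ITEM_VALUE_3 REAL)"
    )
    conn.executemany("INSERT INTO PROCESS VALUES (?, ?, ?)", PROCESSES)
    conn.executemany("INSERT INTO PROCESS_VALUES VALUES (?, ?, ?, ?, ?, ?)", PROCESS_VALUES)
    return conn


FRAMES = """
SELECT P._ID,
       -- default frame: partition start .. current row, so "last" is the current row
       LAST_VALUE(P._ID) OVER (PARTITION BY PV.ID_ITEM, PV.ITEM_LIFE ORDER BY P.DATE_TIME)
           AS default_frame,
       -- explicit frame covering the whole partition
       LAST_VALUE(P._ID) OVER (PARTITION BY PV.ID_ITEM, PV.ITEM_LIFE ORDER BY P.DATE_TIME
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS whole_partition,
       -- equivalent without a frame clause: reverse the order and take the first value
       FIRST_VALUE(P._ID) OVER (PARTITION BY PV.ID_ITEM, PV.ITEM_LIFE ORDER BY P.DATE_TIME DESC)
           AS first_desc
FROM PROCESS P JOIN PROCESS_VALUES PV ON P._ID = PV.ID_PROCESS
WHERE PV.ID_ITEM = 530 AND PV.ITEM_LIFE = 1
ORDER BY P.DATE_TIME
"""

if __name__ == "__main__":
    for row in make_db().execute(FRAMES):
        print(row)  # (4, 4, 3, 3) then (3, 3, 3, 3)
```

Only the default-frame column disagrees between the two rows; the explicit frame and the descending first_value() both report process 3 on every row, so either is safe to combine with an arbitrary rn = 1 pick.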