PostgreSQL - merge multiple rows with multiple attributes into one row?

Asked: 2018-09-12 10:33:16

Tags: sql postgresql greatest-n-per-group greenplum

I have a table like this:

DATE        ID    ScoreA    ScoreB    ScoreC
20180101    001   91        92        25
20180101    002   81        82        35
20180101    003   71        52        45
20180102    001   82        15        66
20180102    002   69        67        77
...
20180131    003   88        65        73

Taking one month of data as an example, I want to summarize it into a report of MAX and MIN scores with only one row per ID, like this:

ID    ScoreA       Date_A              ScoreB        Date_B            ...
001   MAX(ScoreA)  MAX(ScoreA).DATE    MAX(ScoreB)   MAX(ScoreB).DATE  ...
002   MAX(ScoreA)  MAX(ScoreA).DATE    MAX(ScoreB)   MAX(ScoreB).DATE  ...
003   MAX(ScoreA)  MAX(ScoreA).DATE    MAX(ScoreB)   MAX(ScoreB).DATE  ...

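where MAX(ScoreA).DATE denotes the DATE on which the corresponding MAX or MIN score occurred (if the MAX score occurs on several dates, pick one of them at random).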

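Unlike the usual merge-rows case, this involves several columns at once. And since there will be a large number of IDs and hundreds of Score columns (I mean ScoreA ScoreB ... ScoreZ ... Score1 Score2 ... Score100 ...), I would like to avoid expensive operations such as JOINing tables. Any good ideas?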

3 answers:

Answer 0 (score: 2)

If you want to avoid joins, I'd suggest a construct like this:

WITH cte AS (
    SELECT DATE, ID, ScoreA, ScoreB, ScoreC,
        row_number() over (partition by ID order by ScoreA desc) rnA,
        row_number() over (partition by ID order by ScoreB desc) rnB,
        row_number() over (partition by ID order by ScoreC desc) rnC
    FROM ...
    WHERE DATE BETWEEN ... AND ...
), ids AS (
    SELECT DISTINCT ID FROM cte
)
SELECT ID, 
    (SELECT ScoreA FROM cte t2 WHERE t2.ID = t.ID AND rnA = 1) ScoreA, 
    (SELECT DATE FROM cte t2 WHERE t2.ID = t.ID AND rnA = 1) Date_A,
    (SELECT ScoreB FROM cte t2 WHERE t2.ID = t.ID AND rnB = 1) ScoreB, 
    (SELECT DATE FROM cte t2 WHERE t2.ID = t.ID AND rnB = 1) Date_B,
    (SELECT ScoreC FROM cte t2 WHERE t2.ID = t.ID AND rnC = 1) ScoreC, 
    (SELECT DATE FROM cte t2 WHERE t2.ID = t.ID AND rnC = 1) Date_C
FROM ids t

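When you need the date or other attributes of the row that holds the max/min value, it makes sense to use row numbering instead of aggregate functions: row_number() over (...) as rn followed by the condition rn = 1.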
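To make the rn = 1 idea concrete, here is a minimal sketch for a single score column. The inline scores CTE is only a stand-in for the real table and reuses four rows from the question; the names scores, ranked and date_a are my own choice.

```sql
-- Sample rows copied from the question; "scores" stands in for the real table.
WITH scores(date, id, scorea) AS (
    VALUES ('20180101', '001', 91),
           ('20180102', '001', 82),
           ('20180101', '002', 81),
           ('20180102', '002', 69)
), ranked AS (
    -- Number the rows of each ID from the highest ScoreA downwards.
    SELECT date, id, scorea,
           row_number() OVER (PARTITION BY id ORDER BY scorea DESC) AS rn
    FROM scores
)
-- rn = 1 keeps exactly one row per ID: the one holding the maximum.
SELECT id, scorea, date AS date_a
FROM ranked
WHERE rn = 1
ORDER BY id;
-- id  | scorea | date_a
-- 001 |     91 | 20180101
-- 002 |     81 | 20180101
```

Ordering by scorea ASC instead gives the MIN score and its date. When several rows tie for the maximum, row_number() still assigns 1 to only one of them, which matches the "pick one at random" requirement.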

UPD

Since @TaurusDang wants to generate the code, here is a solution that lets postgres do almost all of the work:

WITH cols AS
(
    SELECT column_name
    FROM information_schema.columns
    WHERE table_schema = 'your_schema'
      AND table_name   = 'your_table'
      AND column_name ilike 'score%'  -- unquoted identifiers are stored in lower case
)
-- first part: rows for cte subquery
SELECT ',row_number() over (partition by ID order by ' || column_name || ' desc) rn' || column_name
FROM cols
UNION ALL
-- second part: rows for final query
SELECT ',(SELECT ' || column_name || ' FROM cte t2 WHERE t2.ID = t.ID AND rn' || column_name || ' = 1) ' || column_name || ', (SELECT DATE FROM cte t2 WHERE t2.ID = t.ID AND rn' || column_name || ' = 1) Date_' || column_name
FROM cols

Just copy the generated rows into the initial query: the first half goes into the cte, the second half into the main query.
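If copying the rows by hand gets tedious, the whole statement can probably be assembled in one go. This is only a sketch: it assumes format() and string_agg() are available (PostgreSQL 9.1 or later; I have not checked Greenplum), it keeps the placeholder names your_schema and your_table, the rn_ and date_ prefixes are my own naming, and it leaves out the DATE filter that the original query elides.

```sql
WITH cols AS (
    SELECT column_name::text AS c
    FROM information_schema.columns
    WHERE table_schema = 'your_schema'   -- placeholder
      AND table_name   = 'your_table'    -- placeholder
      AND column_name ILIKE 'score%'
)
SELECT
    -- part 1: the cte with one row_number() per score column
    'WITH cte AS (SELECT DATE, ID'
    || string_agg(
           format(', %1$I, row_number() over (partition by ID order by %1$I desc) %2$I',
                  c, 'rn_' || c),
           '' ORDER BY c)
    || ' FROM your_schema.your_table), ids AS (SELECT DISTINCT ID FROM cte) SELECT ID'
    -- part 2: one pair of scalar subqueries (score, date) per score column
    || string_agg(
           format(', (SELECT %1$I FROM cte t2 WHERE t2.ID = t.ID AND %2$I = 1) %1$I'
                  || ', (SELECT DATE FROM cte t2 WHERE t2.ID = t.ID AND %2$I = 1) %3$I',
                  c, 'rn_' || c, 'date_' || c),
           '' ORDER BY c)
    || ' FROM ids t' AS generated_query
FROM cols;
```

The result is a single text value holding the full query, which you can paste into a client or run with EXECUTE inside a PL/pgSQL block. The %I specifier quotes identifiers where necessary, so mixed-case column names should survive.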

Answer 1 (score: 0)

Try this:

with max_score as
(
    Select    distinct id
        , max(ScoreA) over( partition by id ) as max_ScoreA
        , max(ScoreB) over( partition by id ) as max_ScoreB
        , max(ScoreC) over( partition by id ) as max_Scorec
    from TABLE_NAME
)
Select 
    cte.id
    , max_ScoreA, tbl_a.DATE as Date_A
    , max_ScoreB, tbl_b.DATE as Date_B
    , max_ScoreC, tbl_c.DATE as Date_C
from 
max_score cte
join TABLE_NAME tbl_a
on cte.id = tbl_a.id
and cte.max_ScoreA = tbl_a.ScoreA
join TABLE_NAME tbl_b
on cte.id = tbl_b.id
and cte.max_ScoreB = tbl_b.ScoreB
join TABLE_NAME tbl_c
on cte.id = tbl_c.id
and cte.max_ScoreC = tbl_c.ScoreC
order by 1

Answer 2 (score: 0)

Here is another code example that will give you all the data you need:

select *
from (select
        distinct on (id) id,
        first_value(scorea) over w as a_min,
        last_value(scorea) over w as a_max,
        first_value(date) over w as a_min_d,
        last_value(date) over w as a_max_d
    from the_table
    window w as (partition by id order by scorea)
    order by 1,3 desc) a
join (select
        distinct on (id) id,
        first_value(scoreb) over w as b_min,
        last_value(scoreb) over w as b_max,
        first_value(date) over w as b_min_d,
        last_value(date) over w as b_max_d
    from the_table
    window w as (partition by id order by scoreb)
    order by 1,3 desc) b using(id)
join (select
        distinct on (id) id,
        first_value(scorec) over w as c_min,
        last_value(scorec) over w as c_max,
        first_value(date) over w as c_min_d,
        last_value(date) over w as c_max_d
    from the_table
    window w as (partition by id order by scorec)
    order by 1,3 desc) c using(id)

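Note that there are 3 separate subqueries, one per score column. There is some magic here around windowing-functions and partitions that is worth the reader's time to look into. The one tricky part is that the individual partitions interfere with each other if placed in the same query (at least on my pg 9.3.22).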
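My understanding of the "magic" is this. With an ORDER BY in the window and no explicit frame, the default frame ends at the current row (and its peers). So last_value() only equals the partition maximum on the last row of the ordering, and the DISTINCT ON ... ORDER BY 1,3 DESC has to pick that row. One sort order can only serve one score column, which is probably why the windows seem to interfere. A possible way around it, assuming explicit frame clauses are supported (PostgreSQL 8.4 or later; not tested on 9.3.22 or on Greenplum), is to widen the frame to the whole partition. The inline the_table CTE below is just sample data from the question, and the window names wa and wb are mine.

```sql
WITH the_table(date, id, scorea, scoreb) AS (
    VALUES ('20180101', '001', 91, 92),
           ('20180102', '001', 82, 15),
           ('20180101', '002', 81, 82),
           ('20180102', '002', 69, 67)
)
SELECT DISTINCT ON (id) id,
       first_value(scorea) OVER wa AS a_min,
       last_value(scorea)  OVER wa AS a_max,
       first_value(date)   OVER wa AS a_min_d,
       last_value(date)    OVER wa AS a_max_d,
       first_value(scoreb) OVER wb AS b_min,
       last_value(scoreb)  OVER wb AS b_max,
       first_value(date)   OVER wb AS b_min_d,
       last_value(date)    OVER wb AS b_max_d
FROM the_table
-- The frame spans the whole partition, so every row of an ID carries the
-- same first/last values and DISTINCT ON may keep any one of them.
WINDOW wa AS (PARTITION BY id ORDER BY scorea
              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),
       wb AS (PARTITION BY id ORDER BY scoreb
              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY id;
```

This reads the table once and needs no joins, at the cost of one sort per score column.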