Get values from first and last row per group

Time: 2014-08-06 21:07:05

Tags: sql postgresql group-by aggregate greatest-n-per-group

I am new to Postgres, coming from MySQL, and hoping that one of you can help me out.

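I have a table with three columns: name, week, and value. This table records the names, the week in which each height was recorded, and the height value. Like this: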

Name  |  Week  | Value
------+--------+-------
John  |  1     | 9
Cassie|  2     | 5
Luke  |  6     | 3
John  |  8     | 14
Cassie|  5     | 7
Luke  |  9     | 5
John  |  2     | 10
Cassie|  4     | 4
Luke  |  7     | 4

What I want is a list, per user, of the value at the minimum week and the value at the maximum week. Like this:

Name  |minWeek | Value |maxWeek | value
------+--------+-------+--------+-------
John  |  1     | 9     | 8      | 14
Cassie|  2     | 5     | 5      | 7
Luke  |  6     | 3     | 9      | 5

In Postgres, I use this query:

select name, week, value
from table t
inner join(
select name, min(week) as minweek
from table
group by name)
ss on t.name = ss.name and t.week = ss.minweek
group by t.name
;

However, I receive an error:

  

column "w.week" must appear in the GROUP BY clause or be used in an aggregate function
  Position: 20

This worked for me in MySQL, so I am wondering what I am doing wrong here?

2 answers:

Answer 0 (score: 11)

There are various simpler and faster ways.

2x DISTINCT ON

SELECT *
FROM  (
   SELECT DISTINCT ON (name)
          name, week AS first_week, value AS first_val
   FROM   tbl
   ORDER  BY name, week
   ) f
JOIN (
   SELECT DISTINCT ON (name)
          name, week AS last_week, value AS last_val
   FROM   tbl
   ORDER  BY name, week DESC
   ) l USING (name);

Or shorter:

SELECT *
FROM  (SELECT DISTINCT ON (1) name, week AS first_week, value AS first_val
       FROM   tbl ORDER BY 1,2) f
JOIN  (SELECT DISTINCT ON (1) name, week AS last_week, value AS last_val
       FROM   tbl ORDER BY 1,2 DESC) l USING (name);

Simple and easy to understand. Also the fastest in my tests.
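As a minimal sketch of what each DISTINCT ON subquery does on its own: the rows are sorted by the ORDER BY list, and only the first row for each distinct name is kept. The CTE below inlines the question's sample data under the name tbl purely to keep the example self-contained; that inlining is an assumption, not part of the answer.

```sql
-- Sample rows copied from the question, inlined so the query runs on its own.
WITH tbl(name, week, value) AS (
   VALUES ('John', 1, 9),  ('Cassie', 2, 5), ('Luke', 6, 3)
        , ('John', 8, 14), ('Cassie', 5, 7), ('Luke', 9, 5)
        , ('John', 2, 10), ('Cassie', 4, 4), ('Luke', 7, 4)
)
SELECT DISTINCT ON (name)          -- keep one row per name ...
       name, week AS first_week, value AS first_val
FROM   tbl
ORDER  BY name, week;              -- ... namely the one with the smallest week

--  name   | first_week | first_val
--  Cassie |          2 |         5
--  John   |          1 |         9
--  Luke   |          6 |         3
```

Sorting by week DESC instead returns the row with the largest week per name, which is what the second subquery in the join relies on.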

first_value() of composite type

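The aggregate functions min() and max() do not accept composite types as input. You would have to create custom aggregate functions (which is not that hard). But the window functions first_value() and last_value() do. Building on that, we can devise very simple solutions: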

Simple query

SELECT DISTINCT ON (name)
       name, week AS first_week, value AS first_value
     ,(first_value((week, value)) OVER (PARTITION BY name
                                        ORDER BY week DESC))::text AS l
FROM   tbl t
ORDER  BY name, week;

The output has all the data, but the values for the last week are stuffed into an anonymous record. You may need decomposed values.
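To make the "anonymous record" concrete, here is a sketch of the same idea run against the question's sample data. The data is inlined as a CTE named tbl, which is an assumption made only so the example is self-contained.

```sql
-- Sample rows copied from the question, inlined so the query runs on its own.
WITH tbl(name, week, value) AS (
   VALUES ('John', 1, 9),  ('Cassie', 2, 5), ('Luke', 6, 3)
        , ('John', 8, 14), ('Cassie', 5, 7), ('Luke', 9, 5)
        , ('John', 2, 10), ('Cassie', 4, 4), ('Luke', 7, 4)
)
SELECT DISTINCT ON (name)
       name, week AS first_week, value AS first_val
       -- (week, value) of the row with the largest week, rendered as text
     , (first_value((week, value)) OVER (PARTITION BY name
                                         ORDER BY week DESC))::text AS l
FROM   tbl t
ORDER  BY name, week;

--  name   | first_week | first_val |   l
--  Cassie |          2 |         5 | (5,7)
--  John   |          1 |         9 | (8,14)
--  Luke   |          6 |         3 | (9,5)
```

The column l carries both last-week fields in one text value, so they cannot be addressed individually without a registered row type.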

Decomposed result with opportunistic use of the table type

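For this we need a well-known type that registers the types of the contained elements with the system. An adapted table definition allows the opportunistic use of the table type itself: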

CREATE TABLE tbl (week int, value int, name text);  -- note optimized column order

week and value come first.

SELECT (l).name, first_week, first_val
     , (l).week AS last_week, (l).value AS last_val
FROM (
   SELECT DISTINCT ON (name)
          week AS first_week, value AS first_val
         ,first_value(t) OVER (PARTITION BY name ORDER BY week DESC) AS l
   FROM   tbl t
   ORDER  BY name, week
   ) sub;

Decomposed result from a user-defined row type

However, that is probably not possible in most cases. Just use a user-defined type from CREATE TYPE (permanent) or from CREATE TEMP TABLE (for ad-hoc use):

CREATE TEMP TABLE nv(last_week int, last_val int);  -- register composite type

SELECT name, first_week, first_val, (l).last_week, (l).last_val
FROM (
   SELECT DISTINCT ON (name)
          name, week AS first_week, value AS first_val
         ,first_value((week, value)::nv) OVER (PARTITION BY name
                                               ORDER BY week DESC) AS l
   FROM   tbl t
   ORDER  BY name, week
   ) sub;
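The permanent variant mentioned above can be sketched the same way with CREATE TYPE. The type name nv_type is hypothetical, chosen here only for illustration, and the query assumes a table tbl with integer columns week and value as in the examples above.

```sql
-- Hypothetical type name; the type persists until it is dropped explicitly.
CREATE TYPE nv_type AS (last_week int, last_val int);

SELECT name, first_week, first_val, (l).last_week, (l).last_val
FROM  (
   SELECT DISTINCT ON (name)
          name, week AS first_week, value AS first_val
          -- cast the row value to the registered type so its fields can be accessed
        , first_value((week, value)::nv_type) OVER (PARTITION BY name
                                                    ORDER BY week DESC) AS l
   FROM   tbl t
   ORDER  BY name, week
   ) sub;

-- Clean up if the type was only needed for this sketch.
DROP TYPE nv_type;
```

A temporary table's row type disappears at the end of the session, while a type created with CREATE TYPE stays in the schema, which is the trade-off between the two options.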

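In a local test on Postgres 9.3 with a similar table of 50k rows, each of these queries was substantially faster than the currently accepted answer. Tested with EXPLAIN ANALYZE.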

SQL Fiddle showing all.

Answer 1 (score: 6)

This is a bit of a pain, because Postgres has the nice window functions first_value() and last_value(), but these are not aggregate functions. So, here is one way:

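select t.name, min(t.week) as minWeek, max(firstvalue) as firstvalue,
       max(t.week) as maxWeek, max(lastvalue) as lastValue
from (select t.*,
             -- value at the smallest week of each name
             first_value(value) over (partition by name order by week) as firstvalue,
             -- value at the largest week: the first row when sorted descending
             first_value(value) over (partition by name order by week desc) as lastvalue
      from tbl t  -- tbl stands in for the real table name; TABLE is a reserved word
     ) t
group by t.name;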
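A caveat when last_value() is combined with ORDER BY: the default window frame ends at the current row (including its peers), so last_value() only reaches the end of the partition if the frame is extended explicitly. A minimal sketch of that variant follows, with the question's sample data inlined as a CTE named tbl; the CTE is an assumption made only to keep the example self-contained.

```sql
-- Sample rows copied from the question, inlined so the query runs on its own.
with tbl(name, week, value) as (
   values ('John', 1, 9),  ('Cassie', 2, 5), ('Luke', 6, 3)
        , ('John', 8, 14), ('Cassie', 5, 7), ('Luke', 9, 5)
        , ('John', 2, 10), ('Cassie', 4, 4), ('Luke', 7, 4)
)
select t.name, min(t.week) as minWeek, max(firstvalue) as firstvalue,
       max(t.week) as maxWeek, max(lastvalue) as lastvalue
from (select tbl.*,
             first_value(value) over w as firstvalue,
             last_value(value)  over w as lastvalue
      from tbl
      -- the explicit frame spans the whole partition, so last_value()
      -- sees the row with the largest week from every row
      window w as (partition by name order by week
                   rows between unbounded preceding and unbounded following)
     ) t
group by t.name
order by t.name;

--  name   | minweek | firstvalue | maxweek | lastvalue
--  Cassie |       2 |          5 |       5 |         7
--  John   |       1 |          9 |       8 |        14
--  Luke   |       6 |          3 |       9 |         5
```

With the full-partition frame, firstvalue and lastvalue are constant within each name, so wrapping them in max() only collapses identical values.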