PostgreSQL - How to get the value from the last record of each month

Time: 2017-06-02 04:06:33

Tags: sql postgresql

I have a view like this:

 Year | Month | Week | Category | Value
 2017 | 1     | 1    | A     | 1
 2017 | 1     | 1    | B     | 2
 2017 | 1     | 1    | C     | 3
 2017 | 1     | 2    | A     | 4
 2017 | 1     | 2    | B     | 5
 2017 | 1     | 2    | C     | 6
 2017 | 1     | 3    | A     | 7
 2017 | 1     | 3    | B     | 8
 2017 | 1     | 3    | C     | 9
 2017 | 1     | 4    | A     | 10
 2017 | 1     | 4    | B     | 11
 2017 | 1     | 4    | C     | 12
 2017 | 2     | 5    | A     | 1
 2017 | 2     | 5    | B     | 2
 2017 | 2     | 5    | C     | 3
 2017 | 2     | 6    | A     | 4
 2017 | 2     | 6    | B     | 5
 2017 | 2     | 6    | C     | 6
 2017 | 2     | 7    | A     | 7
 2017 | 2     | 7    | B     | 8
 2017 | 2     | 7    | C     | 9
 2017 | 2     | 8    | A     | 10
 2017 | 2     | 8    | B     | 11
 2017 | 2     | 8    | C     | 12

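I need to create a new view that shows the average of the Value column (let's call it avg_val) and the value from the highest week of the month (max_val_of_month). For example, the highest week in January is 4, so the value for category A is 10. To make it clear, something like this: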

 Year | Month | Category | avg_val | max_val_of_month
 2017 | 1     | A        | 5.5     | 10
 2017 | 1     | B        | 6.5     | 11
 2017 | 1     | C        | 7.5     | 12
 2017 | 2     | A        | 5.5     | 10
 2017 | 2     | B        | 6.5     | 11
 2017 | 2     | C        | 7.5     | 12

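I used a window function, partitioned by year, month, and category, to get the average. But how can I get the value from the highest week of each month?

4 answers:

Answer 0: (score: 2)

Assuming you need the monthly average and the value from the highest week, rather than the maximum value within the month: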

SELECT year, month, category, avg_val, value max_week_val
  FROM (
    SELECT *,
           AVG(value) OVER (PARTITION BY year, month, category) avg_val,
           ROW_NUMBER() OVER (PARTITION BY year, month, category ORDER BY week DESC) rn
      FROM view1
  ) q
 WHERE rn = 1
 ORDER BY year, month, category

Or a more verbose version without window functions:

SELECT q.year, q.month, q.category, q.avg_val, v.value max_week_val
  FROM (
    SELECT year, month, category, avg(value) avg_val, MAX(week) max_week
      FROM view1
     GROUP BY year, month, category
  ) q JOIN view1 v
    ON q.year = v.year
   AND q.month = v.month
   AND q.category = v.category
   AND q.max_week = v.week
 ORDER BY year, month, category

Here is a dbfiddle demo of both queries.
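
As a quick sanity check, the window-function query can be run against the question's sample rows inlined as a CTE. This is only a sketch: the CTE named view1 stands in for the real view, and its column names and types are assumed from the table in the question.

```sql
-- Sample rows from the question, inlined so the query is self-contained.
-- The CTE "view1" is a stand-in for the real view (an assumption).
WITH view1 (year, month, week, category, value) AS (
  VALUES
    (2017, 1, 1, 'A',  1), (2017, 1, 1, 'B',  2), (2017, 1, 1, 'C',  3),
    (2017, 1, 2, 'A',  4), (2017, 1, 2, 'B',  5), (2017, 1, 2, 'C',  6),
    (2017, 1, 3, 'A',  7), (2017, 1, 3, 'B',  8), (2017, 1, 3, 'C',  9),
    (2017, 1, 4, 'A', 10), (2017, 1, 4, 'B', 11), (2017, 1, 4, 'C', 12),
    (2017, 2, 5, 'A',  1), (2017, 2, 5, 'B',  2), (2017, 2, 5, 'C',  3),
    (2017, 2, 6, 'A',  4), (2017, 2, 6, 'B',  5), (2017, 2, 6, 'C',  6),
    (2017, 2, 7, 'A',  7), (2017, 2, 7, 'B',  8), (2017, 2, 7, 'C',  9),
    (2017, 2, 8, 'A', 10), (2017, 2, 8, 'B', 11), (2017, 2, 8, 'C', 12)
)
SELECT year, month, category, avg_val, value AS max_week_val
  FROM (
    SELECT *,
           -- average over every week of the month for this category
           AVG(value) OVER (PARTITION BY year, month, category) AS avg_val,
           -- rn = 1 marks the row with the highest week in each group
           ROW_NUMBER() OVER (PARTITION BY year, month, category
                              ORDER BY week DESC) AS rn
      FROM view1
  ) q
 WHERE rn = 1
 ORDER BY year, month, category;
```

With the sample data this should return one row per (year, month, category), with max_week_val of 10, 11, and 12 for categories A, B, and C in both months, matching the expected table in the question. avg_val comes back as numeric, so it may print with more decimal places than 5.5.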

Answer 1: (score: 0)

with data (yr, mnth, wk, cat, val) as
(
  -- begin test data
  select  2017 , 1     , 1    , 'A'     , 1 union all
  select  2017 , 1     , 1    , 'B'     , 2 union all
  select  2017 , 1     , 1    , 'C'     , 3 union all
  select  2017 , 1     , 2    , 'A'     , 4 union all
  select  2017 , 1     , 2    , 'B'     , 5 union all
  select  2017 , 1     , 2    , 'C'     , 6 union all
  select  2017 , 1     , 3    , 'A'     , 7 union all
  select  2017 , 1     , 3    , 'B'     , 8 union all
  select  2017 , 1     , 3    , 'C'     , 9 union all
  select  2017 , 1     , 4    , 'A'     , 10 union all
  select  2017 , 1     , 4    , 'B'     , 11 union all
  select  2017 , 1     , 4    , 'C'     , 12 union all
  select  2017 , 2     , 5    , 'A'     , 1 union all
  select  2017 , 2     , 5    , 'B'     , 2 union all
  select  2017 , 2     , 5    , 'C'     , 3 union all
  select  2017 , 2     , 6    , 'A'     , 4 union all
  select  2017 , 2     , 6    , 'B'     , 5 union all
  select  2017 , 2     , 6    , 'C'     , 6 union all
  select  2017 , 2     , 7    , 'A'     , 7 union all
  select  2017 , 2     , 8    , 'A'     , 10 union all
  select  2017 , 2     , 8    , 'B'     , 11 union all
  select  2017 , 2     , 7    , 'B'     , 8 union all
  select  2017 , 2     , 7    , 'C'     , 9 union all
  select  2018 , 2     , 7    , 'C'     , 9 union all
  select  2017 , 2     , 8    , 'C'     , 12
  -- end test data
)
select * from 
(
  select
    -- data.*: all columns of the data table
    data.*, 
    -- avrg: partition by a combination of year,month and category to work out -
    --       the avg for each category in a month of a year
    avg(val) over (partition by yr, mnth, cat) avrg, 
    -- mwk: partition by year and month to work out -
    --      the max week of a month in a year
    max(wk) over (partition by yr, mnth) mwk 
  from 
    data
) q -- derived tables need an alias in older PostgreSQL versions
-- as OP's interest is in the max week of each month of a year, -
-- "wk" column value is matched against 
--      the derived column "mwk"
where wk = mwk 
order by yr,mnth,cat;

Answer 2: (score: 0)

Here is my version.

Thanks to @peterm for pointing out that my earlier value for val_from_max_week_of_month was wrong. I have corrected it:

SELECT 
    a.Year,
    a.Month,
    a.Category,
    max(a.Week) AS max_week,
    AVG(a.Value) AS avg_val,
    (
        SELECT b.Value 
        FROM decades AS b
        WHERE
            b.Year = a.Year AND
            b.Month = a.Month AND  
            b.Week = max(a.Week) AND 
            b.Category = a.Category
    ) AS val_from_max_week_of_month
FROM decades AS a
GROUP BY 
    a.Year,
    a.Month,
    a.Category
;

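Answer 3: (score: 0)

First, you may want to check how you handle the first week of January. If January 1 is not a Monday, there are several interpretations, and not all of them work with the solutions here. You need to either:

  • use the ISO week concept, i.e. the week column should hold the ISO week and the year column should hold the ISO year (also known as the week-year). Note: under this concept, January 1 sometimes actually belongs to the previous year.
  • use your own concept, in which the first week of the year is "split" in two when January 1 is not a Monday.

Note: if (in your table) the first week of January can be 52 or 53, the solutions below will not work.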
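
To illustrate the ISO week caveat, PostgreSQL's extract() can report the ISO year and ISO week for a few January 1 dates. This is a minimal sketch; the three dates are chosen only for illustration and are not from the question's data.

```sql
-- ISO year and ISO week for three January 1 dates (illustrative dates only).
select d                            as dt,
       to_char(d, 'Dy')             as weekday,
       extract(isoyear from d)::int as iso_year,
       extract(week    from d)::int as iso_week
from   (values (date '2016-01-01'),
               (date '2017-01-01'),
               (date '2018-01-01')) as t(d);
-- 2016-01-01 (Fri) -> ISO year 2015, week 53
-- 2017-01-01 (Sun) -> ISO year 2016, week 52
-- 2018-01-01 (Mon) -> ISO year 2018, week 1
```

Only the 2018 date starts its own year's week 1. If the table stores such a date under the calendar year together with week 52 or 53, ordering by week desc would pick that January row as the "last" week, which is the case the note above warns about.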

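avg_val is just a simple aggregate, while max_val_of_month can be calculated with a typical greatest-n-per-group query. That kind of query has many possible solutions in PostgreSQL, with varying performance. Fortunately, your query naturally has an easily determined selectivity: you will always need (roughly) a quarter of the data.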

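The usual winners (in terms of performance) are the array_agg() with order by variant and the distinct on variant. (This is not surprising: these two should do better and better the larger the share of the original data you need.)

array_agg() with order by variant: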

select   year, month, category, avg(value) avg_val,
         (array_agg(value order by week desc))[1] max_val_of_month
from     table_name
group by year, month, category;

distinct on variant:

select   distinct on (year, month, category) year, month, category,
         avg(value) over (partition by year, month, category) avg_val,
         value max_val_of_month
from     table_name
order by year, month, category, week desc;

The pure window-function variant is not bad either:

row_number() variant:

select year, month, category, avg_val, max_val_of_month
from   (select year, month, category, value max_val_of_month,
               avg(value) over (partition by year, month, category) avg_val,
               row_number() over (partition by year, month, category order by week desc) rn
        from   table_name) w
where  rn = 1;

The LATERAL variant is only viable with an index:

LATERAL variant:

create index idx_table_name_year_month_category_week_desc
  on table_name(year, month, category, week desc);

select     year, month, category,
           avg(value) avg_val,
           max_val_of_month
from       table_name t
cross join lateral (select   value max_val_of_month
                    from     table_name
                    where    (year, month, category) = (t.year, t.month, t.category)
                    order by week desc
                    limit    1) m
group by   year, month, category, max_val_of_month;

However, most of the solutions above can actually use this index too, not just the last one.

Without the index: http://rextester.com/WNEL86809
With the index: http://rextester.com/TYUA52054
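
One way to check whether a given variant picks up the index is to look at its plan with EXPLAIN. The sketch below builds a throwaway temp table with synthetic data (the 48-week layout and the values are assumptions made purely for illustration) and explains the distinct on variant; any of the other queries can be substituted.

```sql
-- Scratch table reusing the answer's placeholder name table_name.
-- A temp table shadows any real table of that name for this session only.
create temp table table_name (
  year     int,
  month    int,
  week     int,
  category text,
  value    int
);

-- Synthetic data (an assumption): 4 weeks per month, 12 months, 3 categories.
insert into table_name
select 2017, (w - 1) / 4 + 1, w, c, w
from   generate_series(1, 48) w
cross join unnest(array['A', 'B', 'C']) c;

create index idx_table_name_year_month_category_week_desc
  on table_name(year, month, category, week desc);

-- Refresh planner statistics so the plan reflects the inserted rows.
analyze table_name;

explain
select   distinct on (year, month, category) year, month, category,
         avg(value) over (partition by year, month, category) avg_val,
         value max_val_of_month
from     table_name
order by year, month, category, week desc;
```

On a table this small the planner may well choose a sequential scan plus a sort instead of the index, so the plan is only meaningful on realistically sized data. Using explain (analyze) instead also executes the query and reports actual timings.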