Fill missing data in an array crosstab

Posted: 2018-08-21 12:01:41

Tags: sql arrays postgresql missing-data crosstab

How can I aggregate data into arrays without losing data? For example, I query a yearly report of all registered users from January to December:

    with s_table as (

    SELECT
     city,
     gs.month as month,
     coalesce(count(city),0) as count
    FROM
     generate_series('2017-01-01'::date, '2017-12-31'::date, interval '1 month') as gs(month)
    LEFT JOIN "user"
        ON to_char("user".datereg, 'YYYY-MM') = to_char(gs.month::date, 'YYYY-MM')
    GROUP BY city, gs.month
    )
    select city,
    array_agg(count) as count
    from s_table
    group by s_table.city
    order by s_table.city;

It returns arrays in which the months without registrations are missing:

    |City  |arr_agg   |
    |Dublin|{1}       |   <- only December is filled
    |Berlin|{1,4,5,10}|   <- only Jan, Mar, Apr and Oct are filled

Expected result:

    |City  |       Count users        |
    |Dublin|{0,0,0,0,0,0,0,0,0,0,0,1} |
    |Berlin|{1,0,4,5,0,0,0,0,0,10,0,0}|

How can I fill the missing months with 0?

1 answer:

Answer 0 (score: 0)

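You need to generate every month for every city. In your query, the LEFT JOIN turns a month with no registrations at all into a single row whose city is NULL, and a month in which only other cities have registrations produces no row for, say, Berlin, so those months never reach array_agg(). To get one row per city and month, cross join the months with the distinct cities and then left join the users: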

    with s_table as (
          -- every city is paired with every month of 2017, so no (city, month) pair is missing
          select c.city, gs.month as month,
                 count(u.city) as count  -- 0 when no user matched
          from generate_series('2017-01-01'::date, '2017-12-31'::date, interval '1 month') as gs(month) cross join
               (select distinct city from "user") c left join  -- "user" is a reserved word, so it must be quoted
               "user" u
               on date_trunc('month', u.datereg) = gs.month and
                  u.city = c.city
          group by c.city, gs.month
         )
    select city, array_agg(count order by month) as count  -- elements in calendar order
    from s_table
    group by s_table.city
    order by s_table.city;

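Note the other changes compared with your query:

  • Dates are compared with a date function (date_trunc()) instead of being formatted as strings with to_char().
  • array_agg() has an order by, so the array elements follow calendar order; without it, the element order is not guaranteed.
  • The result of generate_series() does not need to be cast to date.
  • count() never returns NULL, so coalesce() is unnecessary.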
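To check the query against the example in the question, here is a minimal sketch that assumes datereg is a date column. It fills a temporary "user" table with hypothetical rows chosen only to reproduce the question's output (one Dublin registration in December; 1, 4, 5 and 10 Berlin registrations in January, March, April and October), then runs the corrected query:

```sql
-- Hypothetical sample data, not from a real database.
-- Assumes "user" needs only the columns city and datereg (type date).
-- A temporary table lives only for the current session.
create temp table "user" (city text, datereg date);

-- Each VALUES row is repeated n times by generate_series(1, n).
insert into "user" (city, datereg)
select v.city, v.datereg::date
from (values ('Dublin', '2017-12-03', 1),
             ('Berlin', '2017-01-10', 1),
             ('Berlin', '2017-03-05', 4),
             ('Berlin', '2017-04-20', 5),
             ('Berlin', '2017-10-01', 10)) as v(city, datereg, n)
     cross join lateral generate_series(1, v.n);

-- The corrected query from the answer.
with s_table as (
      select c.city, gs.month as month,
             count(u.city) as count
      from generate_series('2017-01-01'::date, '2017-12-31'::date, interval '1 month') as gs(month) cross join
           (select distinct city from "user") c left join
           "user" u
           on date_trunc('month', u.datereg) = gs.month and
              u.city = c.city
      group by c.city, gs.month
     )
select city, array_agg(count order by month) as count
from s_table
group by s_table.city
order by s_table.city;

-- Expected:
--  Berlin | {1,0,4,5,0,0,0,0,0,10,0,0}
--  Dublin | {0,0,0,0,0,0,0,0,0,0,0,1}
```

Because the table is temporary, it disappears when the session ends, so nothing needs to be cleaned up afterwards.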