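Aggregating statistics into JSON in PostgreSQL

Time: 2015-06-16 22:56:27

Tags: json postgresql aggregate-functions common-table-expression

I'm trying to compute overview statistics as JSON, but I'm having trouble wrangling them into a query.

There are 2 tables: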

appointments
- time timestamp
- patients int


assignments
- user_id int
- appointment_id int

I want to total up the patients per user for each hour of the day. Ideally, it would look like this:

[ 
  {hour: "2015-07-01T08:00:00.000Z", assignments: [
    {user_id: 123, patients: 3}, 
    {user_id: 456, patients: 10}, 
    {user_id: 789, patients: 4},
  ]},
  {hour: "2015-07-01T09:00:00.000Z", assignments: [
    {user_id: 456, patients: 1},
    {user_id: 789, patients: 6}
  ]},
  {hour: "2015-07-01T10:00:00.000Z", assignments: []}
  ...
]

I'm kind of close:

with assignment_totals as (
    select user_id,sum(patients) as patients,date_trunc('hour',appointments.time) as hour
    from assignments
    inner join appointments on appointments.id = assignments.appointment_id
    group by date_trunc('hour',appointments.time),user_id
  ), hours as (
    select to_char(date_trunc('hour',time),'YYYY-MM-DD"T"HH24:00:00.000Z') as hour, array_to_json(array_agg(DISTINCT assignment_totals)) as patients
    from appointments 
    left join assignment_totals on date_trunc('hour',appointments.time) = assignment_totals.hour
    where time >= '2015-07-01T07:00:00.000Z' and time < '2015-07-02T07:00:00.000Z' 
    group by date_trunc('hour',time)
    order by date_trunc('hour',time) 
  )
  select array_to_json(array_agg(hours)) as hours from hours;

Which outputs:

[ 
  {hour: "2015-07-01T08:00:00.000Z", assignments: [
    {user_id: 123, patients: 3, hour: "2015-07-01T08:00:00.000Z" }, 
    {user_id: 456, patients: 10, hour: "2015-07-01T08:00:00.000Z"}, 
    {user_id: 789, patients: 4, hour: "2015-07-01T08:00:00.000Z"},
  ]},
  {hour: "2015-07-01T09:00:00.000Z", assignments: [
    {user_id: 456, patients: 1, hour: "2015-07-01T09:00:00.000Z"},
    {user_id: 789, patients: 6, hour: "2015-07-01T09:00:00.000Z"}
  ]},
  {hour: "2015-07-01T10:00:00.000Z", assignments: [null]}
  ...
]

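While this works, there are a few problems, which may or may not be independent of each other:

  1. If there are no appointments for that hour, I still want the hour included in the array (like 10AM in the example), but with an empty "assignments" array. Right now it puts a null in there, and I can't figure out how to get rid of it while still keeping the hour.
  2. I have to include the hour in each assignment entry along with user_id and patients, because I need it to join the assignment_totals query to the hours query. But it's unnecessary there, since it's already on the parent.
  3. I feel like this should be doable in 1 CTE and 1 query. Right now I'm using 2 CTEs, but I can't figure out how to condense it and still make it work.
  4. I'd like to do something like this:
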
      hours as (
        select to_char(date_trunc('hour',time),'YYYY-MM-DD"T"HH24:00:00.000Z') as hour, sum(appointments.patients) OVER(partition by assignments.user_id) as appointments
        from appointments 
        left join assignments on appointments.id = assignments.appointment_id
        where time >= '2015-07-01T07:00:00.000Z' and time < '2015-07-02T07:00:00.000Z'  
        group by date_trunc('hour',time)
        order by date_trunc('hour',time) 
      )
      select array_to_json(array_agg(hours)) as hours from hours
    

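    but I can't get it to work without it giving me an "attribute must be in the GROUP BY or an aggregate function" error.

    Does anyone know how to solve any of these problems? Thanks in advance!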

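2 Answers:

Answer 0: (score: 0)

The main problem with your last query seems to be that it mixes up window functions with aggregate functions. Window functions use the OVER syntax and do not themselves require a GROUP BY when there are other fields in the SELECT clause. Aggregate functions, on the other hand, require a GROUP BY when there are other (non-aggregated) fields in the SELECT clause. One practical consequence of this difference is that window functions do not collapse rows automatically, which is why a DISTINCT is needed.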
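To make the distinction concrete, here is a minimal sketch that runs the same SUM both ways. The inline table `t` and its values are made up for illustration and are not the asker's schema. The window version should return one row per input row, while the grouped version should return one row per user_id.

```sql
-- Window function: no GROUP BY needed, every input row is kept.
-- t is a hypothetical stand-in for appointments joined to assignments.
WITH t(user_id, patients) AS (
    VALUES (123, 3), (123, 2), (456, 10)
)
SELECT user_id,
       SUM(patients) OVER (PARTITION BY user_id) AS total
FROM   t;

-- Aggregate function: user_id is not aggregated, so it must be in GROUP BY,
-- and the rows for each user_id collapse into a single row.
WITH t(user_id, patients) AS (
    VALUES (123, 3), (123, 2), (456, 10)
)
SELECT user_id,
       SUM(patients) AS total
FROM   t
GROUP BY user_id;
```

With this data, user 123 appears twice in the first result and once in the second, which is the duplication that DISTINCT removes in the window-function query.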

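The NULL values produced by the window function can be handled with a simple COALESCE, so that zero is used instead of null.

So, to write the query with a window function, use something like this: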

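WITH hours AS
(
    -- DISTINCT collapses the duplicate rows that the window function leaves behind
    SELECT DISTINCT to_char(date_trunc('hour', ap.time), 'YYYY-MM-DD"T"HH24:00:00.000Z') AS hour,
           asgn.user_id,
           -- partition by hour as well, so the sum is per user per hour rather than per user overall
           COALESCE(SUM(ap.patients) OVER (PARTITION BY asgn.user_id, date_trunc('hour', ap.time)), 0) AS appointment_count
    FROM   appointments ap
    LEFT JOIN assignments asgn ON ap.id = asgn.appointment_id
    WHERE  ap.time >= '2015-07-01T07:00:00.000Z'
    AND    ap.time < '2015-07-02T07:00:00.000Z'
)
-- the ORDER BY goes inside array_agg; a trailing ORDER BY hour is rejected once the rows are aggregated
SELECT array_to_json(array_agg(hours ORDER BY hours.hour)) AS hours
FROM   hours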

Using an aggregate function:

WITH hours AS
(
    -- HH24 gives the 24-hour clock; plain HH would fold 13:00 into 01:00
    SELECT to_char(date_trunc('hour', ap.time), 'YYYY-MM-DD"T"HH24:00:00.000Z') AS hour,
           SUM(COALESCE(ap.patients, 0)) AS appointment_count,
           asgn.user_id
    FROM   appointments ap
    LEFT JOIN assignments asgn ON ap.id = asgn.appointment_id
    WHERE  ap.time >= '2015-07-01T07:00:00.000Z'
    AND    ap.time < '2015-07-02T07:00:00.000Z'
    GROUP BY asgn.user_id, to_char(date_trunc('hour', ap.time), 'YYYY-MM-DD"T"HH24:00:00.000Z')
)
-- the ORDER BY goes inside array_agg; a trailing ORDER BY hour is rejected once the rows are aggregated
SELECT array_to_json(array_agg(hours ORDER BY hours.hour)) AS hours
FROM   hours

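My syntax may not be quite right, so double-check it before using this solution (and feel free to edit to correct any mistakes).

Answer 1: (score: 0)

I'm pretty frustrated with myself over this one, because I hadn't looked at the Postgres 9.4 documentation, which has new functions for working with JSON.

The solution I found builds on the original query, but then uses json_array_elements to break the assignments array apart, filters the elements with a where clause, and rebuilds the array again. On the face of it, that looks pointless:

json_agg(json_array_elements(json_agg(*)))

But it makes very little difference to performance, and it got me where I needed to go. If you find a better solution, please feel free to comment! It should also work in versions before 9.4 using array_agg and unnest, but I ran into trouble there because I was trying to unnest the record type returned from a CTE rather than an actual row type with a column definition.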

  with assignment_totals as (
    select 
      date_trunc('hour',appointments.time) as hour, 
      user_id, 
      coalesce(sum(patients),0) as patients
    from appointments
    left outer join assignments on appointments.id = assignments.appointment_id
    where time >= '2015-07-01T07:00:00.000Z' and time < '2015-07-02T07:00:00.000Z' 
    group by date_trunc('hour',appointments.time),user_id
  ), hours as (
    select 
      to_char(assignment_totals.hour,'YYYY-MM-DD"T"HH24:00:00.000Z') as hour,
      (
        select coalesce(json_agg(json_build_object('user_id',(t->'user_id'),'patients',(t->'patients')) order by (t->>'user_id')),'[]'::json) 
        from json_array_elements(json_agg(assignment_totals)) t 
        where (t->>'patients') != '0'
      ) as patients
    from assignment_totals 
    group by assignment_totals.hour
    order by assignment_totals.hour
  )
  select array_to_json(array_agg(hours)) as hours from hours

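Thanks to Andrew for pointing out that I could coalesce the null values to 0. I still wanted to filter out the entries where patients = 0, though. This approach solves all of my problems: the where clause lets me filter those entries out, and building a new JSON object with json_build_object lets me drop the redundant hour field.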
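
As a possible alternative to unpacking and rebuilding the array, here is a minimal sketch that assumes PostgreSQL 9.4 or later, where aggregates accept a FILTER clause. The inline appointments and assignments rows are made up so the query is self-contained, and the totals CTE name is hypothetical. The idea is to skip the unmatched rows inside json_agg itself and fall back to '[]' when nothing is left.

```sql
-- Hypothetical sample rows; appointment 3 has no assignment.
WITH appointments(id, "time", patients) AS (
    VALUES (1, timestamp '2015-07-01 08:00', 3),
           (2, timestamp '2015-07-01 08:30', 10),
           (3, timestamp '2015-07-01 10:15', 4)
), assignments(user_id, appointment_id) AS (
    VALUES (123, 1), (456, 2)
), totals AS (
    -- one row per hour and user; user_id is NULL for unassigned appointments
    SELECT date_trunc('hour', ap."time") AS hour,
           asgn.user_id,
           sum(ap.patients) AS patients
    FROM   appointments ap
    LEFT JOIN assignments asgn ON asgn.appointment_id = ap.id
    GROUP BY date_trunc('hour', ap."time"), asgn.user_id
)
SELECT json_agg(h ORDER BY h.hour) AS hours
FROM (
    SELECT to_char(totals.hour, 'YYYY-MM-DD"T"HH24:00:00.000Z') AS hour,
           coalesce(
               -- build each entry without the hour, and leave out unmatched rows
               json_agg(json_build_object('user_id', user_id, 'patients', patients)
                        ORDER BY user_id)
                   FILTER (WHERE user_id IS NOT NULL),
               '[]'::json                -- every row filtered out: empty array
           ) AS assignments
    FROM   totals
    GROUP BY totals.hour
) h;
```

With these sample rows, the 08:00 hour should list users 123 and 456, and the 10:00 hour should come back with an empty assignments array instead of [null]. An hour with no appointments at all would still be missing from the result, since the rows here all originate from appointments.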