Question

I'm trying to compute summary statistics as JSON, but I'm having trouble wrangling them into a query.

There are 2 tables:

appointments
- time timestamp
- patients int


assignments
- user_id int
- appointment_id int

I want to compute the number of patients for each hour of the day, broken down by user. Ideally it would look like this:

[ 
  {hour: "2015-07-01T08:00:00.000Z", assignments: [
    {user_id: 123, patients: 3}, 
    {user_id: 456, patients: 10}, 
    {user_id: 789, patients: 4},
  ]},
  {hour: "2015-07-01T09:00:00.000Z", assignments: [
    {user_id: 456, patients: 1},
    {user_id: 789, patients: 6}
  ]},
  {hour: "2015-07-01T10:00:00.000Z", assignments: []}
  ...
]

I'm kind of close:

with assignment_totals as (
    select user_id,sum(patients) as patients,date_trunc('hour',appointments.time) as hour
    from assignments
    inner join appointments on appointments.id = assignments.appointment_id
    group by date_trunc('hour',appointments.time),user_id
  ), hours as (
    select to_char(date_trunc('hour',time),'YYYY-MM-DD"T"HH24:00:00.000Z') as hour, array_to_json(array_agg(DISTINCT assignment_totals)) as assignments
    from appointments 
    left join assignment_totals on date_trunc('hour',appointments.time) = assignment_totals.hour
    where time >= '2015-07-01T07:00:00.000Z' and time < '2015-07-02T07:00:00.000Z' 
    group by date_trunc('hour',time)
    order by date_trunc('hour',time) 
  )
  select array_to_json(array_agg(hours)) as hours from hours;

Which outputs:

[ 
  {hour: "2015-07-01T08:00:00.000Z", assignments: [
    {user_id: 123, patients: 3, hour: "2015-07-01T08:00:00.000Z" }, 
    {user_id: 456, patients: 10, hour: "2015-07-01T08:00:00.000Z"}, 
    {user_id: 789, patients: 4, hour: "2015-07-01T08:00:00.000Z"},
  ]},
  {hour: "2015-07-01T09:00:00.000Z", assignments: [
    {user_id: 456, patients: 1, hour: "2015-07-01T09:00:00.000Z"},
    {user_id: 789, patients: 6, hour: "2015-07-01T09:00:00.000Z"}
  ]},
  {hour: "2015-07-01T10:00:00.000Z", assignments: [null]}
  ...
]

While this works, there are two problems, which may or may not be independent of each other:

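- If there are no appointments in a given hour, I still want that hour included in the array (like 10AM in the example), but with an empty "assignments" array. Right now it puts a null in there, and I can't figure out how to get rid of it while still keeping the hour.
- I have to include the hour in each assignment entry along with user_id and patients, because I need it to join the assignment_totals query to the hours query. But it's redundant, since it's already on the parent.

I also feel like this should be doable with 1 CTE and 1 query. Right now I use 2 CTEs, but I can't figure out how to condense it and make it work.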

I'd like to do something like:

  hours as (
    select to_char(date_trunc('hour',time),'YYYY-MM-DD"T"HH24:00:00.000Z') as hour, sum(appointments.patients) OVER(partition by assignments.user_id) as appointments
    from appointments 
    left join assignments on appointments.id = assignments.appointment_id
    where time >= '2015-07-01T07:00:00.000Z' and time < '2015-07-02T07:00:00.000Z'  
    group by date_trunc('hour',time)
    order by date_trunc('hour',time) 
  )
  select array_to_json(array_agg(hours)) as hours from hours

But I can't get it to work without getting an "attribute must be in group by or aggregate function" error.

Does anyone know how to solve these problems? Thanks in advance!

Answer 1

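The main problem with your last query seems to be that it mixes up window functions and aggregate functions. Window functions use the OVER syntax and do not themselves require a GROUP BY when there are other fields in the SELECT clause. Aggregate functions, on the other hand, require a GROUP BY when there are other (non-aggregated) fields in the SELECT clause. One practical consequence of this difference is that window functions do not automatically make the result DISTINCT.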

The NULL values produced by the window function can be handled with a simple COALESCE, so that you get zero instead of null.
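
To make the difference concrete, here is a minimal sketch that runs without any tables. The `sample` rows and the `kind` column are made up for illustration; they stand in for the joined appointments/assignments data.

```sql
-- Hypothetical sample rows: (user_id, hour of day, patients).
WITH sample(user_id, hour, patients) AS (
    VALUES (123, 8, 3),
           (123, 8, 2),
           (456, 8, NULL::int),
           (456, 9, 1)
)
-- Window function: one output row per input row, no GROUP BY needed.
SELECT user_id, hour,
       COALESCE(SUM(patients) OVER (PARTITION BY user_id, hour), 0) AS patients,
       'window' AS kind
FROM   sample
UNION ALL
-- Aggregate function: one output row per group; every
-- non-aggregated column has to be listed in GROUP BY.
SELECT user_id, hour,
       COALESCE(SUM(patients), 0),
       'aggregate'
FROM   sample
GROUP  BY user_id, hour
ORDER  BY kind, user_id, hour;
```

The aggregate branch returns three rows, one per (user_id, hour) group. The window branch returns four, because the two rows for user 123 at hour 8 both survive and both carry the same total. That duplication is why the window version needs DISTINCT. In both branches the group containing only a NULL comes back as 0 thanks to COALESCE.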

So, to write your query with window functions, use something like this:

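WITH hours AS
(
    -- Window function: no GROUP BY, so DISTINCT collapses the duplicate rows.
    -- HH24 gives the 24-hour clock (plain HH is 12-hour).
    SELECT DISTINCT to_char(date_trunc('hour', ap.time), 'YYYY-MM-DD"T"HH24:00:00.000Z') AS hour,
           asgn.user_id,
           -- Partition by user and hour so each total covers a single hour.
           COALESCE(SUM(ap.patients) OVER (PARTITION BY asgn.user_id, date_trunc('hour', ap.time)), 0) AS appointment_count
    FROM   appointments ap
    LEFT JOIN assignments asgn ON ap.id = asgn.appointment_id
    WHERE  ap.time >= '2015-07-01T07:00:00.000Z'
    AND    ap.time < '2015-07-02T07:00:00.000Z'
)
-- The outer query returns a single row, so the ordering goes inside array_agg.
SELECT array_to_json(array_agg(hours ORDER BY hours.hour)) AS hours
FROM   hours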

And with aggregate functions:

WITH hours AS
(
    -- Aggregate function: every non-aggregated column appears in GROUP BY.
    -- HH24 gives the 24-hour clock (plain HH is 12-hour).
    SELECT to_char(date_trunc('hour', ap.time), 'YYYY-MM-DD"T"HH24:00:00.000Z') AS hour,
           SUM(COALESCE(ap.patients, 0)) AS appointment_count,
           asgn.user_id
    FROM   appointments ap
    LEFT JOIN assignments asgn ON ap.id = asgn.appointment_id
    WHERE  ap.time >= '2015-07-01T07:00:00.000Z'
    AND    ap.time < '2015-07-02T07:00:00.000Z'
    GROUP BY asgn.user_id, to_char(date_trunc('hour', ap.time), 'YYYY-MM-DD"T"HH24:00:00.000Z')
)
-- The outer query returns a single row, so the ordering goes inside array_agg.
SELECT array_to_json(array_agg(hours ORDER BY hours.hour)) AS hours
FROM   hours

My syntax may not be quite right, so double-check it before using this solution (and feel free to edit to correct any errors).

Answer 2

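I'm pretty frustrated with myself over this, because I hadn't looked at the Postgres 9.4 documentation, which has new functions for working with JSON.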

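The solution I found builds on the original query, but then breaks the assignments array apart with json_array_elements, filters it with a where clause, and builds it back up again. Which basically looks pointless: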

json_agg(json_array_elements(json_agg(*)))

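But the performance difference is minimal, and it got me where I needed to go. If you find a better solution, feel free to comment! It should also be doable in < 9.4 with array_agg and unnest, but I ran into trouble because I was trying to unnest a record type returned from a CTE rather than an actual row type with column definitions.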
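
The unnest, filter and rebuild step can be tried in isolation. This is a minimal sketch on a made-up JSON literal (the two objects are illustrative, not real data), assuming Postgres 9.3 or later for json_array_elements and json_agg:

```sql
-- Break the array into one row per element, drop the zero-patient
-- entries, then aggregate the survivors back into a JSON array.
SELECT COALESCE(json_agg(t), '[]'::json) AS assignments
FROM   json_array_elements(
           '[{"user_id": 123, "patients": 0},
             {"user_id": 456, "patients": 3}]'::json) AS t
WHERE  (t->>'patients') <> '0';
```

Only the user_id 456 entry survives the filter. If every element were filtered out, json_agg would return NULL, which is why the COALESCE to '[]' is there.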

  with assignment_totals as (
    select 
      date_trunc('hour',appointments.time) as hour, 
      user_id, 
      coalesce(sum(patients),0) as patients
    from appointments
    left outer join assignments on appointments.id = assignments.appointment_id
    where time >= '2015-07-01T07:00:00.000Z' and time < '2015-07-02T07:00:00.000Z' 
    group by date_trunc('hour',appointments.time),user_id
  ), hours as (
    select 
      to_char(assignment_totals.hour,'YYYY-MM-DD"T"HH24:00:00.000Z') as hour,
      (
        select coalesce(json_agg(json_build_object('user_id',(t->'user_id'),'patients',(t->'patients')) order by (t->>'user_id')),'[]'::json) 
        from json_array_elements(json_agg(assignment_totals)) t 
        where (t->>'patients') != '0'
      ) as patients
    from assignment_totals 
    group by assignment_totals.hour
    order by assignment_totals.hour
  )
  select array_to_json(array_agg(hours)) as hours from hours

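Thanks to Andrew for pointing out that I could coalesce the nulls to 0. But I still wanted to filter out the entries where patients = 0. This approach solved all of my problems: it let me filter those entries out with a where clause, and it let me leave out the redundant hour field by building a new JSON object with json_build_object.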
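
For comparison, here is a sketch of a different approach that avoids the unnest and rebuild round trip. It is not the query above and is untested against the real schema. It assumes Postgres 9.4+ (for the aggregate FILTER clause and json_build_object), assumes appointments has an id column as the joins in the question imply, and uses generate_series to produce every hour in the range so that empty hours still appear.

```sql
WITH totals AS (
    -- Per-hour, per-user patient totals; unassigned appointments drop out.
    SELECT date_trunc('hour', ap.time) AS hour,
           asgn.user_id,
           SUM(ap.patients) AS patients
    FROM   appointments ap
    JOIN   assignments asgn ON asgn.appointment_id = ap.id
    WHERE  ap.time >= '2015-07-01T07:00:00.000Z'
    AND    ap.time <  '2015-07-02T07:00:00.000Z'
    GROUP  BY 1, 2
), hours AS (
    SELECT to_char(h.hour, 'YYYY-MM-DD"T"HH24:00:00.000Z') AS hour,
           -- FILTER skips the all-NULL row that the LEFT JOIN produces for
           -- an empty hour; COALESCE then turns the NULL aggregate into [].
           COALESCE(
               json_agg(json_build_object('user_id', t.user_id,
                                          'patients', t.patients)
                        ORDER BY t.user_id)
                   FILTER (WHERE t.user_id IS NOT NULL),
               '[]'::json) AS assignments
    FROM   generate_series(timestamp '2015-07-01 07:00:00',
                           timestamp '2015-07-02 06:00:00',
                           interval '1 hour') AS h(hour)
    LEFT   JOIN totals t ON t.hour = h.hour
    GROUP  BY h.hour
)
SELECT json_agg(hours ORDER BY hours.hour) AS hours
FROM   hours;
```

Because json_build_object lists only user_id and patients, the hour is used for the join but never ends up inside the nested entries.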

Aggregating statistics into JSON in PostgreSQL
