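In a previous question, I asked something similar that relied on an auxiliary table as part of the criteria for splitting the data. My current goal seems easier, but I can't figure it out.
Given the table: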
CREATE TABLE conversations (id int, record_id int, is_response bool, text text);
INSERT INTO conversations VALUES
(1, 1, false, 'in text 1')
, (2, 1, true , 'response text 1')
, (3, 1, false, 'in text 2')
, (4, 1, true , 'response text 2')
, (5, 1, true , 'response text 3')
, (6, 2, false, 'in text 1')
, (7, 2, true , 'response text 1')
, (8, 2, false, 'in text 2')
, (9, 2, true , 'response text 2')
, (10, 2, true , 'response text 3');
I want to aggregate the text based on the is_response value and output the following:
record_id | aggregated_text |
----------+---------------------------------------------------+
1 |in text 1 response text 1 |
----------+---------------------------------------------------+
1 |in text 2 response text 2 response text 3 |
----------+---------------------------------------------------+
2 |in text 1 response text 1 |
----------+---------------------------------------------------+
2 |in text 2 response text 2 response text 3 |
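I have tried the following query, but it fails to aggregate two consecutive responses, i.e. when is_response is true for several rows in a row.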
SELECT
    record_id,
    string_agg(text, ' ' ORDER BY id) AS aggregated_text
FROM (
    SELECT
        *,
        coalesce(sum(incl::integer) OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS grp
    FROM (
        SELECT *, is_response AS incl
        FROM conversations
    ) c
) c1
GROUP BY record_id, grp
HAVING bool_or(incl)
ORDER BY max(id);
My query's output just puts the trailing is_response row into a separate row of its own, like this:
record_id | aggregated_text |
----------+---------------------------------------------------+
1 |in text 1 response text 1 |
----------+---------------------------------------------------+
1 |in text 2 response text 2 |
----------+---------------------------------------------------+
1 |response text 3 |
----------+---------------------------------------------------+
2 |in text 1 response text 1 |
----------+---------------------------------------------------+
2 |in text 2 response text 2 |
----------+---------------------------------------------------+
2 | response text 3 |
----------+---------------------------------------------------+
How can I fix this?
Answer 0 (score: 1)
This is basically a simpler version of your previous question.
SELECT record_id, string_agg(text, ' ') AS context
FROM  (
   SELECT *, count(NOT is_response OR NULL) OVER (PARTITION BY record_id ORDER BY id) AS grp
   FROM   conversations
   ORDER  BY record_id, id
   ) sub
GROUP  BY record_id, grp
ORDER  BY record_id, grp;
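This produces exactly the desired result with a single window function in a subquery, followed by aggregation. The expression NOT is_response OR NULL is true for every "in" row and NULL for every response, so count() keeps a running count of "in" rows per record, and each "in" row shares its grp number with the responses that follow it.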
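To see the grouping at work, it can help to run the inner window function on its own. The following is a minimal sketch against the sample table from the question. The grp_filter column is an equivalent spelling added here for comparison only; it assumes PostgreSQL 9.4 or later (for the FILTER clause) and is not part of the query above.

```sql
-- Running count of "in" rows (is_response = false) within each record.
-- NOT is_response OR NULL yields true for "in" rows and NULL for responses,
-- and count() ignores NULL, so only "in" rows advance the counter.
SELECT id, record_id, is_response,
       count(NOT is_response OR NULL) OVER w          AS grp,
       count(*) FILTER (WHERE NOT is_response) OVER w AS grp_filter  -- same value, assumes PostgreSQL 9.4+
FROM   conversations
WINDOW w AS (PARTITION BY record_id ORDER BY id)
ORDER  BY record_id, id;

-- With the sample data, both columns come out as:
--  id | record_id | is_response | grp
--   1 |         1 | f           |   1
--   2 |         1 | t           |   1
--   3 |         1 | f           |   2
--   4 |         1 | t           |   2
--   5 |         1 | t           |   2
--   6 |         2 | f           |   1
--   7 |         2 | t           |   1
--   8 |         2 | f           |   2
--   9 |         2 | t           |   2
--  10 |         2 | t           |   2
```

Grouping by (record_id, grp) then collects each "in" row together with all the responses that follow it, which is why two consecutive responses end up in the same aggregated row.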
A detailed explanation and links are in my answer to the previous question.
Answer 1 (score: 1)
Here is a variation of the answer I provided to the previous question:
SELECT record_id, string_agg(text, ' ')
FROM (
    SELECT *, coalesce(sum(incl::integer) OVER w, 0) AS subgrp
    FROM (
        SELECT *, is_response AND NOT coalesce(lead(is_response) OVER w, false) AS incl
        FROM conversations
        WINDOW w AS (PARTITION BY record_id ORDER BY id)
    ) t
    WINDOW w AS (PARTITION BY record_id ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
) t1
GROUP BY record_id, subgrp
HAVING bool_or(incl)
ORDER BY min(id);
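The idea is that, for each row, we look at the next row of the same record with the help of the lead window function. If the current row is a response (is_response is true) and either there is no next row or the next row's is_response is false, we mark the current row as the end of a group and aggregate all text values that have not been used yet.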
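The intermediate columns make this easier to follow. The sketch below is the same pair of nested subqueries without the final aggregation, run against the sample table from the question; the values in the trailing comment are what the sample data produces.

```sql
-- incl   : true on the last response of each exchange
-- subgrp : number of exchanges already closed before the current row
SELECT id, record_id, is_response, incl,
       coalesce(sum(incl::integer) OVER w, 0) AS subgrp
FROM  (
    SELECT *, is_response AND NOT coalesce(lead(is_response) OVER w, false) AS incl
    FROM   conversations
    WINDOW w AS (PARTITION BY record_id ORDER BY id)
) t
WINDOW w AS (PARTITION BY record_id ORDER BY id
             ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
ORDER  BY record_id, id;

-- With the sample data (record 2 mirrors record 1):
--  id | record_id | is_response | incl | subgrp
--   1 |         1 | f           | f    |      0
--   2 |         1 | t           | t    |      0
--   3 |         1 | f           | f    |      1
--   4 |         1 | t           | f    |      1
--   5 |         1 | t           | t    |      1
```

Because the frame ends at 1 PRECEDING, the row that closes an exchange still carries the old subgrp value, and only the rows after it move on to the next one.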
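This query also ensures that if the last conversation is incomplete, meaning an "in" row with no response after it (which does not happen in your sample data), it is omitted.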
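One way to check that behavior is to add an unanswered "in" row inside a transaction and roll it back afterwards. This is only a sketch: the row with id 11 is hypothetical and not part of the question's data, and ORDER BY id inside string_agg is added here to make the concatenation order explicit.

```sql
BEGIN;

-- Hypothetical extra row: an "in" message for record 2 that never gets a response.
INSERT INTO conversations VALUES (11, 2, false, 'in text 3');

SELECT record_id, string_agg(text, ' ' ORDER BY id) AS aggregated_text
FROM (
    SELECT *, coalesce(sum(incl::integer) OVER w, 0) AS subgrp
    FROM (
        SELECT *, is_response AND NOT coalesce(lead(is_response) OVER w, false) AS incl
        FROM conversations
        WINDOW w AS (PARTITION BY record_id ORDER BY id)
    ) t
    WINDOW w AS (PARTITION BY record_id ORDER BY id
                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
) t1
GROUP BY record_id, subgrp
HAVING bool_or(incl)   -- drops groups that never reach a closing response
ORDER BY min(id);

-- Row 11 lands alone in subgrp 2 of record 2 with incl = false,
-- so the result is still the four rows from the desired output.

ROLLBACK;  -- leave the sample table unchanged
```

By contrast, a query that groups purely on a running count of "in" rows would return 'in text 3' as a fifth row, so which variant fits depends on whether unanswered messages should appear in the output.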