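I have two tables in PostgreSQL with a many-to-many association. The first table contains activities, each of which may have zero or more reasons: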
CREATE TABLE activity (
id integer NOT NULL,
-- other fields removed for readability
);
CREATE TABLE reason (
id varchar(1) NOT NULL,
-- other fields here
);
To implement the association, there is a join table between the two:
CREATE TABLE activity_reason (
activity_id integer NOT NULL, -- refers to activity.id
reason_id varchar(1) NOT NULL, -- refers to reason.id
CONSTRAINT activity_reason_activity FOREIGN KEY (activity_id) REFERENCES activity (id),
CONSTRAINT activity_reason_reason FOREIGN KEY (reason_id) REFERENCES reason (id)
);
I would like to count the distinct combinations of reasons associated with activities. Suppose I have these records in the table activity_reason:
+--------------+------------+
| activity_id | reason_id |
+--------------+------------+
| 1 | A |
| 1 | B |
| 2 | A |
| 2 | B |
| 3 | A |
| 4 | C |
| 4 | D |
| 4 | E |
+--------------+------------+
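To reproduce the example, the rows above could be loaded roughly as follows. This is only a sketch: it assumes the three tables already exist, that activity.id and reason.id are primary keys (the foreign keys require a unique target), and that the elided columns are nullable or have defaults.

```sql
-- Parent rows first, so the foreign keys on activity_reason are satisfied.
-- Assumption: id is the only column that must be supplied.
INSERT INTO activity (id) VALUES (1), (2), (3), (4);
INSERT INTO reason (id) VALUES ('A'), ('B'), ('C'), ('D'), ('E');

-- The sample associations from the table above.
INSERT INTO activity_reason (activity_id, reason_id) VALUES
    (1, 'A'), (1, 'B'),
    (2, 'A'), (2, 'B'),
    (3, 'A'),
    (4, 'C'), (4, 'D'), (4, 'E');
```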
I should get something like:
+-------+---+------+-------+
| count | | | |
+-------+---+------+-------+
| 2 | A | B | NULL |
| 1 | A | NULL | NULL |
| 1 | C | D | E |
+-------+---+------+-------+
Or, alternatively, something like:
+-------+-------+
| count | |
+-------+-------+
| 2 | A,B |
| 1 | A |
| 1 | C,D,E |
+-------+-------+
I can't find the SQL query to do this.
Answer 0 (score: 2)
I think you can get what you want with this query:
SELECT count(*) AS count, reasons
FROM (
    -- one array of reasons per activity
    SELECT activity_id, array_agg(reason_id) AS reasons
    FROM (
        -- the key column of activity is id, exposed here as activity_id
        SELECT A.id AS activity_id, AR.reason_id
        FROM activity A
        LEFT JOIN activity_reason AR ON AR.activity_id = A.id
        ORDER BY A.id, AR.reason_id
    ) AS ordered_reasons
    GROUP BY activity_id
) reason_arrays
GROUP BY reasons;
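First, you aggregate all of the reasons for an activity into one array per activity. You have to order the associations first, otherwise ['a','b'] and ['b','a'] would be considered different sets and would get separate counts. You also need to include the join, or any activity that has no reasons won't show up in the result set. I'm not sure whether that is desirable; if you want activities without reasons to be excluded, I can take the join back out. Then you count the number of activities that have the same set of reasons.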
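The point about ordering can be checked directly, since PostgreSQL compares arrays element by element. A small standalone sketch (the column aliases are just illustrative names):

```sql
-- Array equality is order-sensitive, so unsorted aggregates of the
-- same set can land in different groups.
SELECT ARRAY['a','b'] = ARRAY['b','a'] AS equal_when_unsorted,  -- false
       ARRAY['a','b'] = ARRAY['a','b'] AS equal_when_sorted;    -- true

-- With the LEFT JOIN, an activity without reasons contributes a single
-- NULL reason_id, so its aggregated array is {NULL}.
SELECT array_agg(r) AS reasons
FROM (VALUES (NULL::varchar)) AS t(r);
```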
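Here is a sqlfiddle to demonstrate.
As Gordon Linoff mentioned, you could also use strings instead of arrays. I'm not sure which performs better.
Answer 1 (score: 1)
We need to compare sorted lists of reasons to identify equal sets.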
SELECT count(*) AS ct, reason_list
FROM (
SELECT array_agg(reason_id) AS reason_list
FROM (SELECT * FROM activity_reason ORDER BY activity_id, reason_id) ar1
GROUP BY activity_id
) ar2
GROUP BY reason_list
ORDER BY ct DESC, reason_list;
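ORDER BY reason_id in the innermost subquery would work too, but adding activity_id is typically faster.
We don't strictly need the innermost subquery at all. This works as well: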
SELECT count(*) AS ct, reason_list
FROM (
SELECT array_agg(reason_id ORDER BY reason_id) AS reason_list
FROM activity_reason
GROUP BY activity_id
) ar2
GROUP BY reason_list
ORDER BY ct DESC, reason_list;
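But it is typically slower when processing all or most of the table. Quoting the manual:
Alternatively, supplying the input values from a sorted subquery will usually work.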
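We could use string_agg() instead of array_agg(), which works for your example with varchar(1) (which might be more efficient with data type "char", by the way). It can fail for longer strings, though, because the aggregated value can be ambiguous.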
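The ambiguity is easy to see once a value can contain the delimiter. A minimal sketch with made-up values (grp, val and the inline rows are purely illustrative):

```sql
-- Two different sets, {'a,b', 'c'} and {'a', 'b,c'}, both aggregate
-- to the string 'a,b,c', so grouping by the string would merge them.
SELECT grp, string_agg(val, ',' ORDER BY val) AS joined
FROM (VALUES (1, 'a,b'), (1, 'c'),
             (2, 'a'),   (2, 'b,c')) AS t(grp, val)
GROUP BY grp;
```

With single-character values such as varchar(1) this cannot happen as long as the delimiter itself is never used as a value.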
If reason_id is an integer (as it typically is), there is another, faster solution using sort() from the additional module intarray.
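A sketch of what such a query might look like, assuming the intarray extension can be installed and using an inline, hypothetical activity_reason_int row source with integer reason ids (not part of the schema in the question):

```sql
-- intarray provides sort(integer[]); installing it needs sufficient privileges.
CREATE EXTENSION IF NOT EXISTS intarray;

SELECT count(*) AS ct, reason_list
FROM (
    -- aggregate in any order, then sort the finished array once per activity
    SELECT sort(array_agg(reason_id)) AS reason_list
    FROM (VALUES (1, 1), (1, 2),
                 (2, 2), (2, 1),
                 (3, 1)) AS activity_reason_int(activity_id, reason_id)
    GROUP BY activity_id
) ar2
GROUP BY reason_list
ORDER BY ct DESC, reason_list;
```

Sorting the finished array means the input rows do not have to be ordered before aggregation.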
Related, with more explanation:
Answer 2 (score: 0)
You can use string_agg():
select reasons, count(*)
from (select activity_id, string_agg(reason_id, ',' order by reason_id) as reasons
from activity_reason
group by activity_id
) a
group by reasons
order by count(*) desc;