I have these database tables.
I want to show every combination of tags that occur together, ordered by count.
Sample data:
Question 1, Answer 1, tag1, tag2, tag3, tag4
Question 2, Answer 2, tag2, tag3, tag4
Question 3, Answer 3, tag3, tag4
Question 4, Answer 4, tag4
Question 5, Answer 5, tag3, tag4, tag5
Question 1, Answer 6, <no tags>
How can I solve this with SQL?
I am not sure whether it is possible in SQL, but I think it would need a RECURSIVE approach.
Expected result:
tag3, tag4 occur 4 times
tag2, tag3, tag4 occur 2 times
tag2, tag3 occur 2 times
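We only return results where the grouping is greater than 1. A single tag is never returned; there must be at least 2 tags together for a combination to be counted.
Answer 0: (score: 4)
Building on @filiprem's answer, and using a slightly modified version of the function from the answer here, you get: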
--test data
create table questions (id int, text varchar(100));
create table answers (id int, text varchar(100), question_id int);
create table answer_tags (id int, answer_id int, tag_id int);
create table tags (id int, text varchar(100));
insert into questions values (1, 'question1'), (2, 'question2'), (3, 'question3'), (4, 'question4'), (5, 'question5');
insert into answers values (1, 'answer1', 1), (2, 'answer2', 2), (3, 'answer3', 3), (4, 'answer4', 4), (5, 'answer5', 5), (6, 'answer6', 1);
insert into tags values (1, 'tag1'), (2, 'tag2'), (3, 'tag3'), (4, 'tag4'), (5, 'tag5');
insert into answer_tags values
(1,1,1), (2,1,2), (3,1,3), (4,1,4),
(5,2,2), (6,2,3), (7,2,4),
(8,3,3), (9,3,4),
(10,4,4),
(11,5,3), (12,5,4), (13,5,5);
--end test data
--function to get all combinations of 2 or 3 elements from an array
--(a combination is only extended while it has at most 2 elements)
create or replace function get_combinations(source anyarray) returns setof anyarray as $$
    with recursive combinations(combination, indices) as (
        select source[i:i], array[i] from generate_subscripts(source, 1) i
        union all
        select c.combination || source[j], c.indices || j
        from combinations c, generate_subscripts(source, 1) j
        where j > all(c.indices) and
              array_length(c.combination, 1) <= 2
    )
    select combination from combinations
    where array_length(combination, 1) >= 2
$$ language sql;
--expected results
SELECT tags, count(*) FROM (
    SELECT q.id, get_combinations(array_agg(DISTINCT t.text)) AS tags
    FROM questions q
    JOIN answers a ON a.question_id = q.id
    JOIN answer_tags at ON at.answer_id = a.id
    JOIN tags t ON t.id = at.tag_id
    GROUP BY q.id
) t1
GROUP BY tags
HAVING count(*)>1;
Note: this also returns tag2, tag4 with 2 occurrences (from questions 1 and 2), which is not listed in the expected result.
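The counts the query above should return can be cross-checked outside the database. Here is a minimal Python sketch of the same idea, assuming the tags per question are as in the test data; `QUESTION_TAGS`, `combination_counts`, `min_size` and `max_size` are names made up for this illustration, not part of the answer.

```python
from collections import Counter
from itertools import combinations

# Tags per question, copied from the test data above (answer 6 has no tags).
QUESTION_TAGS = {
    1: {"tag1", "tag2", "tag3", "tag4"},
    2: {"tag2", "tag3", "tag4"},
    3: {"tag3", "tag4"},
    4: {"tag4"},
    5: {"tag3", "tag4", "tag5"},
}


def combination_counts(tags_by_question, min_size=2, max_size=3):
    """Count in how many questions each tag combination occurs.

    min_size and max_size are illustrative parameters; 2 and 3 mirror the
    combination sizes that get_combinations produces.
    """
    counts = Counter()
    for tags in tags_by_question.values():
        ordered = sorted(tags)  # sort so equal combinations give equal tuples
        for size in range(min_size, max_size + 1):
            counts.update(combinations(ordered, size))
    # Same filter as HAVING count(*) > 1.
    return {combo: n for combo, n in counts.items() if n > 1}


result = combination_counts(QUESTION_TAGS)
for combo, n in sorted(result.items(), key=lambda item: -item[1]):
    print(", ".join(combo), "occurs", n, "times")
```

On the sample data this reports tag3, tag4 four times and tag2, tag3 / tag2, tag4 / tag2, tag3, tag4 twice each, which agrees with the note above.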
Answer 1: (score: 2)
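You can indeed use a recursive CTE to produce the possible combinations. First, select every tag ID as a one-element array. Then UNION ALL a join of the CTE with the tags that appends a tag ID to the array whenever that ID is greater than the largest ID already in the array.
Join the CTE to an aggregation that collects the tag IDs of each answer into an array. In the ON clause, use the array containment operator @> to check that the answer's array contains the array from the CTE.
In the WHERE clause, exclude the combinations from the CTE that have only one tag, since you are not interested in those.
Then GROUP BY the combination of tags and, in a HAVING clause, exclude every combination that occurs fewer than two times; you are not interested in those either. If you want, you can also "translate" the IDs into the tag names in the SELECT list.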
WITH RECURSIVE "cte"
AS
(
    SELECT ARRAY["t"."id"] "id"
    FROM "tags" "t"
    UNION ALL
    SELECT "c"."id" || "t"."id" "id"
    FROM "cte" "c"
    INNER JOIN "tags" "t"
        ON "t"."id" > (SELECT max("un"."e")
                       FROM unnest("c"."id") "un" ("e"))
)
SELECT "c"."id" "id",
       (SELECT array_agg("t"."text")
        FROM unnest("c"."id") "un" ("e")
        INNER JOIN "tags" "t"
            ON "t"."id" = "un"."e") "text",
       count(*) "count"
FROM "cte" "c"
INNER JOIN (SELECT array_agg("at"."tag_id" ORDER BY "at"."tag_id") "id"
            FROM "answer_tags" "at"
            GROUP BY "at"."answer_id") "x"
    ON "x"."id" @> "c"."id"
WHERE array_length("c"."id", 1) > 1
GROUP BY "c"."id"
HAVING count(*) > 1;
Result:
id | text | count
---------+------------------+-------
{2,3} | {tag2,tag3} | 2
{3,4} | {tag3,tag4} | 4
{2,4} | {tag2,tag4} | 2
{2,3,4} | {tag2,tag3,tag4} | 2
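To make the containment approach easier to follow, here is a rough Python sketch of the same steps. It is only an illustration under the assumption that the tag IDs per answer are as in the test data; `ANSWER_TAG_IDS`, `TAG_IDS`, `growing_combinations` and `containment_counts` are hypothetical names, and the subset test on sets stands in for the @> operator.

```python
# Tag IDs per answer, copied from the answer_tags test data.
# Answer 6 has no rows in answer_tags, so it does not appear here.
ANSWER_TAG_IDS = {
    1: {1, 2, 3, 4},
    2: {2, 3, 4},
    3: {3, 4},
    4: {4},
    5: {3, 4, 5},
}
TAG_IDS = [1, 2, 3, 4, 5]


def growing_combinations(tag_ids):
    """Yield every combination, built the way the recursive CTE builds them."""
    frontier = [(t,) for t in tag_ids]  # anchor part: one-element arrays
    while frontier:
        yield from frontier
        # Recursive part: append only IDs larger than the current maximum.
        frontier = [c + (t,) for c in frontier for t in tag_ids if t > max(c)]


def containment_counts(answer_tag_ids, tag_ids):
    """Count, per combination, the answers whose tags contain all of it."""
    counts = {}
    for combo in growing_combinations(tag_ids):
        if len(combo) < 2:  # WHERE array_length(...) > 1
            continue
        # Stand-in for "x"."id" @> "c"."id".
        n = sum(1 for tags in answer_tag_ids.values() if set(combo) <= tags)
        if n > 1:  # HAVING count(*) > 1
            counts[combo] = n
    return counts


counts = containment_counts(ANSWER_TAG_IDS, TAG_IDS)
print(counts)
```

With five tags the generator yields 31 combinations, and the four that survive the filters are the same four rows as in the result table above. Note that the candidate set grows exponentially with the number of tags, so this is a sketch for small tag tables rather than a tuned solution.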
Answer 2: (score: 1)
Try this:
SELECT tags, count(*) FROM (
    SELECT q.id, array_agg(DISTINCT t.text) AS tags
    FROM questions q
    JOIN answers a ON a.question_id = q.id
    JOIN answer_tags at ON at.answer_id = a.id
    JOIN tags t ON t.id = at.tag_id
    GROUP BY q.id
) t1
GROUP BY tags
HAVING count(*)>1;
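For comparison with the combination-based answers, here is a small Python sketch of what grouping by each question's whole tag array does. It assumes the tags per question from the sample data; `QUESTION_TAGS` and `whole_set_counts` are names invented for this illustration.

```python
from collections import Counter

# Tags per question from the sample data (answer 6 adds no tags to question 1).
QUESTION_TAGS = {
    1: {"tag1", "tag2", "tag3", "tag4"},
    2: {"tag2", "tag3", "tag4"},
    3: {"tag3", "tag4"},
    4: {"tag4"},
    5: {"tag3", "tag4", "tag5"},
}


def whole_set_counts(tags_by_question):
    """Count questions per complete tag set, keeping sets seen more than once."""
    counts = Counter(tuple(sorted(tags)) for tags in tags_by_question.values())
    return {tags: n for tags, n in counts.items() if n > 1}


repeated = whole_set_counts(QUESTION_TAGS)
print(repeated)
```

Every question in the sample data has a different complete tag set, so this grouping finds nothing that repeats. Counting sub-combinations, as the other answers do, is what produces the expected rows.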