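Suppose we have a set of database tables representing four key concepts: entity types, entities, cohorts and cohort members.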
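The rules for a cohort are:
The rules for an entity are:
- An entity is uniquely identified by (business_key, entity_type_id).
- Two entities with different entity_type_id values may share the same business_key.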
Since a picture is worth a thousand lines of code, here is the ERD:
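I want a SQL query that, given a set of (business_key, entity_type_id) pairs, searches for a cohort whose members match that set exactly. It should return a single row containing the cohort_id if such a cohort exists, and zero rows otherwise.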
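That is, if the set of entities matches entity_ids 1 and 2, the query should return only the cohort_id whose cohort_members are exactly 1 and 2: not a cohort containing just 1, not one containing just 2, and not one with entity_ids 1, 2 and 3. If no cohort satisfies this requirement, it should return zero rows.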
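In other words, the matching rule is set equality, not a subset or superset test. A minimal Python model of that rule, assuming each cohort is represented as a set of entity_ids (the cohort_ids 10, 11 and 12 below are hypothetical) and, as the question's example implies, that no two cohorts share the same member set:

```python
from typing import Iterable, Mapping, Optional, Set


def find_exact_cohort(entity_ids: Iterable[int],
                      cohorts: Mapping[int, Set[int]]) -> Optional[int]:
    """Return the cohort_id whose member set equals entity_ids, else None."""
    wanted = frozenset(entity_ids)
    for cohort_id, members in cohorts.items():
        # Exact match: no missing members and no extra members.
        if frozenset(members) == wanted:
            return cohort_id
    return None  # plays the role of "zero rows"


# Hypothetical cohorts mirroring the example in the question.
cohorts = {10: {1, 2}, 11: {1}, 12: {1, 2, 3}}

print(find_exact_cohort([1, 2], cohorts))  # 10
print(find_exact_cohort([2], cohorts))     # None
```

The SQL answers below have to express this same "no missing, no extra" condition with joins and counts.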
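To help people tackle this, I have created a fiddle with the tables and some data defining various entity types, entities and cohorts. There is also a table named test_cohort that holds test data for matching. It contains 6 test cohorts covering various scenarios: the first 5 tests should each match exactly one cohort, and the 6th is a negative test for the zero-rows case. When using the test table, only one of the associated INSERT statements should be uncommented (see the fiddle, which is initially set up this way):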
http://sqlfiddle.com/#!18/2d022
My attempt in SQL is below, but it fails tests #2 and #4 (both can be found in the fiddle):
SELECT actual_cohort_member.cohort_id
FROM test_cohort
INNER JOIN entity
ON entity.business_key = test_cohort.business_key
AND entity.entity_type_id = test_cohort.entity_type_id
INNER JOIN cohort_member AS existing_potential_member
ON existing_potential_member.entity_id = entity.entity_id
INNER JOIN cohort
ON cohort.cohort_id = existing_potential_member.cohort_id
RIGHT OUTER JOIN cohort_member AS actual_cohort_member
ON actual_cohort_member.cohort_id = cohort.cohort_id
AND actual_cohort_member.cohort_id = existing_potential_member.cohort_id
AND actual_cohort_member.entity_id = existing_potential_member.entity_id
GROUP BY actual_cohort_member.cohort_id
HAVING
SUM(CASE WHEN
actual_cohort_member.cohort_id = existing_potential_member.cohort_id AND
actual_cohort_member.entity_id = existing_potential_member.entity_id THEN 1 ELSE 0
END) = COUNT(*)
;
Answer 0 (score: 2)
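This can be done by adding a compound condition to the WHERE clause, since you are comparing against pairs of values. You then have to check two counts for each cohort_id: the number of rows that satisfied the conditions in the WHERE clause, and the total number of members of that cohort_id. Both must equal the number of pairs.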
SELECT c.cohort_id
FROM cohort c
INNER JOIN cohort_member cm
ON c.cohort_id = cm.cohort_id
INNER JOIN entity e
ON cm.entity_id = e.entity_id
WHERE (e.entity_type_id = 1 AND e.business_key = 'acc1') -- condition here
OR (e.entity_type_id = 1 AND e.business_key = 'acc2')
GROUP BY c.cohort_id
HAVING COUNT(*) = 2 -- must equal the total number of conditions
AND (SELECT COUNT(*)
FROM cohort_member cm2
WHERE cm2.cohort_id = c.cohort_id) = 2 -- must equal the total number of conditions
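As the test case above shows, the counts in the HAVING filter must equal the number of pair conditions in the WHERE clause, so building the query dynamically is recommended here.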
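One way to build that dynamic query safely is to generate one placeholder pair per (business_key, entity_type_id) and pass the values and both counts as parameters. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server; the entity, cohort and cohort_member rows are hypothetical data constructed so that test scenarios 1 to 5 each match one cohort, not the fiddle's data, and build_exact_match_query is an illustrative name.

```python
import sqlite3

# Hypothetical sample data (not the fiddle's): cohorts 1-5 are chosen so that
# test scenarios 1-5 each match exactly one of them.
SETUP = """
CREATE TABLE entity (entity_id INTEGER PRIMARY KEY, business_key TEXT,
                     entity_type_id INTEGER, UNIQUE (business_key, entity_type_id));
CREATE TABLE cohort (cohort_id INTEGER PRIMARY KEY);
CREATE TABLE cohort_member (cohort_id INTEGER, entity_id INTEGER,
                            PRIMARY KEY (cohort_id, entity_id));
INSERT INTO entity VALUES (1, 'acc1', 1), (2, 'acc2', 1), (3, 'cli1', 2), (4, 'cli2', 2);
INSERT INTO cohort VALUES (1), (2), (3), (4), (5);
INSERT INTO cohort_member VALUES (1, 1), (1, 2), (2, 3), (2, 4), (3, 3),
    (4, 1), (4, 2), (4, 3), (4, 4), (5, 1), (5, 4);
"""


def build_exact_match_query(pairs):
    """Return (sql, params) for answer 0's query over any list of
    (business_key, entity_type_id) pairs."""
    pairs = sorted(set(pairs))  # duplicate pairs would break the count comparison
    if not pairs:
        raise ValueError("at least one (business_key, entity_type_id) pair is required")
    # Only placeholder text is interpolated; the values go in params.
    condition = " OR ".join(["(e.entity_type_id = ? AND e.business_key = ?)"] * len(pairs))
    sql = f"""
        SELECT c.cohort_id
        FROM cohort c
        INNER JOIN cohort_member cm ON c.cohort_id = cm.cohort_id
        INNER JOIN entity e ON cm.entity_id = e.entity_id
        WHERE {condition}
        GROUP BY c.cohort_id
        HAVING COUNT(*) = ?
           AND (SELECT COUNT(*) FROM cohort_member cm2
                WHERE cm2.cohort_id = c.cohort_id) = ?
    """
    params = [value for key, type_id in pairs for value in (type_id, key)]
    params += [len(pairs), len(pairs)]  # both counts equal the number of pairs
    return sql, params


con = sqlite3.connect(":memory:")
con.executescript(SETUP)
sql, params = build_exact_match_query([("acc1", 1), ("cli2", 2)])
print([row[0] for row in con.execute(sql, params)])  # [5]
```

Passing the counts as parameters keeps them in step with the number of generated conditions, which is exactly what the hard-coded 2s in the original query have to do by hand.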
Update
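If the test_cohort table contains only one scenario, the following query meets your requirement. If test_cohort contains a list of scenarios, however, you may want to look at the other answer, since this solution does not change any table schema.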
SELECT c.cohort_id
FROM cohort c
INNER JOIN cohort_member cm
ON c.cohort_id = cm.cohort_id
INNER JOIN entity e
ON cm.entity_id = e.entity_id
INNER JOIN test_cohort tc
ON tc.business_key = e.business_key
AND tc.entity_type_id = e.entity_type_id
GROUP BY c.cohort_id
HAVING COUNT(*) = (SELECT COUNT(*) FROM test_cohort)
AND (SELECT COUNT(*)
FROM cohort_member cm2
WHERE cm2.cohort_id = c.cohort_id) = (SELECT COUNT(*) FROM test_cohort)
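The update's query can be checked the same way. This sketch again uses Python's sqlite3 as a stand-in for SQL Server with hypothetical data (not the fiddle's), and recreates test_cohort (without any scenario column) for each call so that the table holds a single scenario, as the update requires. match_single_scenario is an illustrative name.

```python
import sqlite3

# Hypothetical sample data (not the fiddle's): cohorts 1-5 are chosen so that
# test scenarios 1-5 each match exactly one of them.
SETUP = """
CREATE TABLE entity (entity_id INTEGER PRIMARY KEY, business_key TEXT,
                     entity_type_id INTEGER, UNIQUE (business_key, entity_type_id));
CREATE TABLE cohort (cohort_id INTEGER PRIMARY KEY);
CREATE TABLE cohort_member (cohort_id INTEGER, entity_id INTEGER,
                            PRIMARY KEY (cohort_id, entity_id));
INSERT INTO entity VALUES (1, 'acc1', 1), (2, 'acc2', 1), (3, 'cli1', 2), (4, 'cli2', 2);
INSERT INTO cohort VALUES (1), (2), (3), (4), (5);
INSERT INTO cohort_member VALUES (1, 1), (1, 2), (2, 3), (2, 4), (3, 3),
    (4, 1), (4, 2), (4, 3), (4, 4), (5, 1), (5, 4);
"""

# The update's query: test_cohort must contain exactly one scenario.
QUERY = """
SELECT c.cohort_id
FROM cohort c
INNER JOIN cohort_member cm ON c.cohort_id = cm.cohort_id
INNER JOIN entity e ON cm.entity_id = e.entity_id
INNER JOIN test_cohort tc
        ON tc.business_key = e.business_key
       AND tc.entity_type_id = e.entity_type_id
GROUP BY c.cohort_id
HAVING COUNT(*) = (SELECT COUNT(*) FROM test_cohort)
   AND (SELECT COUNT(*) FROM cohort_member cm2
        WHERE cm2.cohort_id = c.cohort_id) = (SELECT COUNT(*) FROM test_cohort)
"""


def match_single_scenario(pairs):
    """Load one scenario into a fresh test_cohort table and run the query."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    con.execute("CREATE TABLE test_cohort (business_key TEXT, entity_type_id INTEGER)")
    con.executemany("INSERT INTO test_cohort VALUES (?, ?)", pairs)
    rows = [row[0] for row in con.execute(QUERY)]
    con.close()
    return rows


print(match_single_scenario([("cli1", 2), ("cli2", 2)]))  # [2]
print(match_single_scenario([("acc1", 3), ("cli2", 3)]))  # []
```

Loading several scenarios into test_cohort at once would make the query compare every cohort against the combined set of pairs rather than against each scenario separately.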
Answer 1 (score: 1)
I added a column i to the test_cohort table so that you can test all the scenarios at once. Here is the DDL:
CREATE TABLE test_cohort (
i int,
business_key NVARCHAR(255),
entity_type_id INT
);
INSERT INTO test_cohort VALUES
(1, 'acc1', 1), (1, 'acc2', 1) -- TEST #1: should match against cohort 1
,(2, 'cli1', 2), (2, 'cli2', 2) -- TEST #2: should match against cohort 2
,(3, 'cli1', 2) -- TEST #3: should match against cohort 3
,(4, 'acc1', 1), (4, 'acc2', 1), (4, 'cli1', 2), (4, 'cli2', 2) -- TEST #4: should match against cohort 4
,(5, 'acc1', 1), (5, 'cli2', 2) -- TEST #5: should match against cohort 5
,(6, 'acc1', 3), (6, 'cli2', 3) -- TEST #6: should not match any cohort
Query:
select
c.i, m.cohort_id
from
(
select
*, cnt = count(*) over (partition by i)
from
test_cohort
) c
join entity e on c.entity_type_id = e.entity_type_id and c.business_key = e.business_key
join (
select
*, cnt = count(*) over (partition by cohort_id)
from
cohort_member
) m on e.entity_id = m.entity_id and c.cnt = m.cnt
group by m.cohort_id, c.cnt, c.i
having count(*) = c.cnt
Output:
i cohort_id
------------
1 1
2 2
3 3
4 4
5 5
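The idea is to count the rows before joining: per scenario (i) in test_cohort and per cohort_id in cohort_member. Joining only where the two counts are equal, and then requiring that every row of the scenario found a member (having count(*) = c.cnt), leaves only the exact matches.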
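The window-count approach can be run end to end on all six scenarios at once. This sketch uses Python's sqlite3 as a stand-in for SQL Server (window functions need SQLite 3.25 or later) with hypothetical data, not the fiddle's, built so that scenarios 1 to 5 each have one matching cohort. T-SQL's `cnt = count(*) over (...)` alias syntax is rewritten as `count(*) over (...) AS cnt`, which SQLite requires, and an ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

# Hypothetical sample data (not the fiddle's) plus the answer's test_cohort rows.
SETUP = """
CREATE TABLE entity (entity_id INTEGER PRIMARY KEY, business_key TEXT,
                     entity_type_id INTEGER, UNIQUE (business_key, entity_type_id));
CREATE TABLE cohort_member (cohort_id INTEGER, entity_id INTEGER,
                            PRIMARY KEY (cohort_id, entity_id));
CREATE TABLE test_cohort (i INTEGER, business_key TEXT, entity_type_id INTEGER);
INSERT INTO entity VALUES (1, 'acc1', 1), (2, 'acc2', 1), (3, 'cli1', 2), (4, 'cli2', 2);
INSERT INTO cohort_member VALUES (1, 1), (1, 2), (2, 3), (2, 4), (3, 3),
    (4, 1), (4, 2), (4, 3), (4, 4), (5, 1), (5, 4);
INSERT INTO test_cohort VALUES
    (1, 'acc1', 1), (1, 'acc2', 1),
    (2, 'cli1', 2), (2, 'cli2', 2),
    (3, 'cli1', 2),
    (4, 'acc1', 1), (4, 'acc2', 1), (4, 'cli1', 2), (4, 'cli2', 2),
    (5, 'acc1', 1), (5, 'cli2', 2),
    (6, 'acc1', 3), (6, 'cli2', 3);
"""

QUERY = """
SELECT c.i, m.cohort_id
FROM (SELECT *, COUNT(*) OVER (PARTITION BY i) AS cnt      -- pairs per scenario
      FROM test_cohort) c
JOIN entity e
  ON c.entity_type_id = e.entity_type_id AND c.business_key = e.business_key
JOIN (SELECT *, COUNT(*) OVER (PARTITION BY cohort_id) AS cnt  -- members per cohort
      FROM cohort_member) m
  ON e.entity_id = m.entity_id AND c.cnt = m.cnt           -- same size only
GROUP BY m.cohort_id, c.cnt, c.i
HAVING COUNT(*) = c.cnt                                    -- every pair matched
ORDER BY c.i
"""

con = sqlite3.connect(":memory:")
con.executescript(SETUP)
results = con.execute(QUERY).fetchall()
print(results)  # [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
```

Scenario 6 produces no row, which corresponds to the zero-rows requirement.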