Find the parent ID where all child rows match exactly

Question

Scenario

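Suppose we have a set of database tables representing four key concepts:

Entity types (e.g. account, client, etc.)
Entities (i.e. instances of the entity types above)
Cohorts (a named group)
Cohort members (the entities that make up the membership of a cohort)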

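The rules for cohorts are:

A cohort always has at least one cohort member.
A cohort member must be unique within its cohort (i.e. entity 5 cannot be a member of cohort 3 twice, though it may be a member of both cohort 3 and cohort 4).
No two cohorts are ever exactly the same in membership, although one cohort may legitimately be a subset of another.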

The rules for entities are:

No two entities may have the same value pair (business_key, entity_type_id).
Two entities with different entity_type_id values may share a business_key.
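The uniqueness rules above map naturally onto table constraints. Here is a minimal sketch, assuming SQLite through Python's sqlite3 module rather than the SQL Server instance behind the fiddle. The table and column names follow the queries in this post, but the entity_type table and its name column are my own guess at the schema. The "at least one member" and "no two identical cohorts" rules cannot be expressed as simple constraints, so they are not covered here.

```python
import sqlite3

# Assumed schema: names follow the queries in this post; the entity_type
# table and its "name" column are hypothetical.
SCHEMA = """
CREATE TABLE entity_type (
    entity_type_id INTEGER PRIMARY KEY,
    name           TEXT NOT NULL
);
CREATE TABLE entity (
    entity_id      INTEGER PRIMARY KEY,
    business_key   TEXT NOT NULL,
    entity_type_id INTEGER NOT NULL REFERENCES entity_type (entity_type_id),
    UNIQUE (business_key, entity_type_id)   -- no duplicate value pair
);
CREATE TABLE cohort (
    cohort_id INTEGER PRIMARY KEY
);
CREATE TABLE cohort_member (
    cohort_id INTEGER NOT NULL REFERENCES cohort (cohort_id),
    entity_id INTEGER NOT NULL REFERENCES entity (entity_id),
    PRIMARY KEY (cohort_id, entity_id)      -- an entity joins a cohort at most once
);
"""


def try_insert(conn, sql, params):
    """Return True if the insert succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO entity_type VALUES (?, ?)",
                 [(1, "account"), (2, "client")])

add_entity = "INSERT INTO entity (business_key, entity_type_id) VALUES (?, ?)"
first = try_insert(conn, add_entity, ("acc1", 1))       # new pair
shared_key = try_insert(conn, add_entity, ("acc1", 2))  # same key, different type
duplicate = try_insert(conn, add_entity, ("acc1", 1))   # repeated pair

conn.executemany("INSERT INTO cohort VALUES (?)", [(3,), (4,)])
add_member = "INSERT INTO cohort_member VALUES (?, ?)"
member_once = try_insert(conn, add_member, (3, 1))      # entity 1 joins cohort 3
member_twice = try_insert(conn, add_member, (3, 1))     # same membership again
other_cohort = try_insert(conn, add_member, (4, 1))     # entity 1 joins cohort 4
```

Only the repeated pair and the repeated membership are rejected; sharing a business_key across types and belonging to two cohorts are both accepted.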

Because a picture is worth a thousand lines of code, here is the ERD:

[Image: ERD]

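Problem

I want a SQL query that, when supplied with a set of (business_key, entity_type_id) pairs, searches for a cohort that matches the set exactly. It should return one row containing only the cohort_id if such a cohort exists, and zero rows otherwise.

That is, if the set of entities resolves to entity_ids 1 and 2, the query should return only the cohort_id whose cohort_members are exactly 1 and 2: not a cohort with just 1, not a cohort with just 2, and not a cohort with entity_ids 1, 2 and 3. If no cohort satisfies this requirement, zero rows are returned.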
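To pin down what "exact match" means here, the requirement can be restated as plain set equality. This is only an illustration of the semantics in Python, not the SQL being asked for; the cohort ids and memberships below are made-up values.

```python
def find_exact_cohort(cohorts, wanted):
    """Return the ids of cohorts whose member set equals `wanted` exactly."""
    wanted = frozenset(wanted)
    return [cohort_id for cohort_id, members in cohorts.items()
            if frozenset(members) == wanted]


# Hypothetical data: cohort_id -> set of entity_ids
cohorts = {10: {1}, 11: {2}, 12: {1, 2}, 13: {1, 2, 3}}

match = find_exact_cohort(cohorts, {1, 2})      # only cohort 12 qualifies
no_match = find_exact_cohort(cohorts, {2, 3})   # nothing qualifies
```

A subset (cohort 10 or 11) and a superset (cohort 13) are both rejected, which is the behaviour the query needs to reproduce.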

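Test cases

To help people tackle the problem, I have created a fiddle of the tables along with some data that defines various entity types, entities and cohorts. There is also a table holding the test data to match against, named test_cohort. It contains 6 test cohorts that exercise various scenarios. The first 5 tests should each match exactly one cohort. The 6th test is a bogus one that checks the zero-rows clause. When using the test table, only one line of the associated INSERT statement should be uncommented (see the fiddle, which is initially set up that way):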

http://sqlfiddle.com/#!18/2d022

My attempt in SQL is below, although it fails tests #2 and #4 (it can also be found in the fiddle):

SELECT actual_cohort_member.cohort_id
FROM test_cohort
INNER JOIN entity
    ON entity.business_key = test_cohort.business_key
    AND entity.entity_type_id = test_cohort.entity_type_id
INNER JOIN cohort_member AS existing_potential_member
    ON existing_potential_member.entity_id = entity.entity_id
INNER JOIN cohort
    ON cohort.cohort_id = existing_potential_member.cohort_id
RIGHT OUTER JOIN cohort_member AS actual_cohort_member
    ON actual_cohort_member.cohort_id = cohort.cohort_id
    AND actual_cohort_member.cohort_id = existing_potential_member.cohort_id
    AND actual_cohort_member.entity_id = existing_potential_member.entity_id
GROUP BY actual_cohort_member.cohort_id
HAVING
    SUM(CASE WHEN
        actual_cohort_member.cohort_id = existing_potential_member.cohort_id AND
        actual_cohort_member.entity_id = existing_potential_member.entity_id THEN 1 ELSE 0
    END) = COUNT(*)
;

Answer 1

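This can be done by adding a compound condition to the WHERE clause, since you are comparing against pairs of values. You then have to check two counts for each cohort_id: the number of rows that satisfy the conditions in the WHERE clause, and the total number of rows the cohort has.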

SELECT  c.cohort_id
FROM    cohort c
        INNER JOIN cohort_member cm
            ON c.cohort_id = cm.cohort_id
        INNER JOIN entity e
            ON cm.entity_id = e.entity_id
WHERE   (e.entity_type_id = 1 AND e.business_key = 'acc1')      -- condition here
         OR (e.entity_type_id = 1 AND e.business_key = 'acc2')
GROUP   BY c.cohort_id
HAVING  COUNT(*) = 2                                            -- must equal the number of conditions in the WHERE clause
        AND (SELECT COUNT(*) 
             FROM cohort_member cm2 
             WHERE cm2.cohort_id = c.cohort_id) = 2             -- cohort's total member count must also equal that number

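As you can see in the test case above, the values in the filter depend on the number of conditions in the WHERE clause. I would suggest building a dynamic query here.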
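One way to build that dynamic query is sketched below. It assumes Python with sqlite3 instead of SQL Server, and the helper names (build_exact_match_query, find_cohort, load_sample) are hypothetical. The sample rows are reconstructed from the expected results listed in the test comments, with made-up entity_ids. Values are passed as placeholders rather than concatenated into the SQL text.

```python
import sqlite3


def build_exact_match_query(pairs):
    """Build the WHERE/HAVING query for any number of (business_key, entity_type_id) pairs."""
    pairs = list(dict.fromkeys(pairs))  # drop duplicate pairs, keep order
    if not pairs:
        raise ValueError("at least one pair is required")
    where = " OR ".join(
        "(e.entity_type_id = ? AND e.business_key = ?)" for _ in pairs)
    sql = f"""
        SELECT c.cohort_id
        FROM cohort c
        INNER JOIN cohort_member cm ON c.cohort_id = cm.cohort_id
        INNER JOIN entity e ON cm.entity_id = e.entity_id
        WHERE {where}
        GROUP BY c.cohort_id
        HAVING COUNT(*) = ?
           AND (SELECT COUNT(*) FROM cohort_member cm2
                WHERE cm2.cohort_id = c.cohort_id) = ?
    """
    # Placeholders are bound in the order they appear in the SQL text.
    params = [value for key, type_id in pairs for value in (type_id, key)]
    params += [len(pairs), len(pairs)]
    return sql, params


def load_sample(conn):
    """Assumed sample data; entity_ids 1-4 are invented for this sketch."""
    conn.executescript("""
        CREATE TABLE entity (entity_id INTEGER PRIMARY KEY,
                             business_key TEXT, entity_type_id INTEGER);
        CREATE TABLE cohort (cohort_id INTEGER PRIMARY KEY);
        CREATE TABLE cohort_member (cohort_id INTEGER, entity_id INTEGER);
        INSERT INTO entity VALUES
            (1, 'acc1', 1), (2, 'acc2', 1), (3, 'cli1', 2), (4, 'cli2', 2);
        INSERT INTO cohort VALUES (1), (2), (3), (4), (5);
        INSERT INTO cohort_member VALUES
            (1, 1), (1, 2),                  -- cohort 1: acc1, acc2
            (2, 3), (2, 4),                  -- cohort 2: cli1, cli2
            (3, 3),                          -- cohort 3: cli1
            (4, 1), (4, 2), (4, 3), (4, 4),  -- cohort 4: all four
            (5, 1), (5, 4);                  -- cohort 5: acc1, cli2
    """)


def find_cohort(conn, pairs):
    sql, params = build_exact_match_query(pairs)
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
load_sample(conn)
```

With this data, find_cohort(conn, [("acc1", 1), ("acc2", 1)]) returns the single matching cohort, and a set that matches nothing returns an empty list.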

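UPDATE

If the table test_cohort contains only one scenario, then the query below will meet your requirement. However, if test_cohort contains a list of scenarios, you may want to look at the other answer, since this solution does not alter any table schema.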

SELECT  c.cohort_id
FROM    cohort c
        INNER JOIN cohort_member cm
            ON c.cohort_id = cm.cohort_id
        INNER JOIN entity e
            ON cm.entity_id = e.entity_id
        INNER JOIN test_cohort tc
            ON tc.business_key = e.business_key
                AND tc.entity_type_id = e.entity_type_id
GROUP   BY c.cohort_id
HAVING  COUNT(*) = (SELECT COUNT(*) FROM test_cohort)
        AND (SELECT COUNT(*) 
             FROM cohort_member cm2 
             WHERE cm2.cohort_id = c.cohort_id) = (SELECT COUNT(*) FROM test_cohort)

Answer 2

I've added a column i to the test_cohort table so that you can test all scenarios at the same time. Here is the DDL:

CREATE TABLE test_cohort (
i int,
business_key NVARCHAR(255),
entity_type_id INT
);

INSERT INTO test_cohort VALUES
(1, 'acc1', 1), (1, 'acc2', 1) -- TEST #1: should match against cohort 1
,(2, 'cli1', 2), (2, 'cli2', 2) -- TEST #2: should match against cohort 2
,(3, 'cli1', 2) -- TEST #3: should match against cohort 3
,(4, 'acc1', 1), (4, 'acc2', 1), (4, 'cli1', 2), (4, 'cli2', 2) -- TEST #4: should match against cohort 4
,(5, 'acc1', 1), (5, 'cli2', 2) -- TEST #5: should match against cohort 5
,(6, 'acc1', 3), (6, 'cli2', 3) -- TEST #6: should not match any cohort

Query:

select
    c.i, m.cohort_id
from
    (
        select 
            *, cnt = count(*) over (partition by i)
        from 
            test_cohort
    ) c
    join entity e on c.entity_type_id = e.entity_type_id and c.business_key = e.business_key
    join (
        select
            *, cnt = count(*) over (partition by cohort_id)
        from
            cohort_member
    ) m on e.entity_id = m.entity_id and c.cnt = m.cnt
group by m.cohort_id, c.cnt, c.i
having count(*) = c.cnt

Output

i   cohort_id
------------
1   1
2   2
3   3
4   4
5   5

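The idea is to count the rows on each side before joining, join only where those counts are equal, and then check that the number of matched rows equals that same count, which makes it an exact match.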
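For anyone who wants to try the idea outside SQL Server, here is a rough equivalent that runs on SQLite through Python's sqlite3. It is a sketch under assumptions: the window functions are replaced with grouped subqueries to stay portable, the entity_ids are invented, and the cohort memberships are reconstructed from the expected results in the test comments above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (entity_id INTEGER PRIMARY KEY,
                         business_key TEXT, entity_type_id INTEGER);
    CREATE TABLE cohort_member (cohort_id INTEGER, entity_id INTEGER);
    CREATE TABLE test_cohort (i INTEGER, business_key TEXT, entity_type_id INTEGER);

    -- Assumed sample data: entity_ids 1-4 are made up for this sketch.
    INSERT INTO entity VALUES
        (1, 'acc1', 1), (2, 'acc2', 1), (3, 'cli1', 2), (4, 'cli2', 2);
    INSERT INTO cohort_member VALUES
        (1, 1), (1, 2), (2, 3), (2, 4), (3, 3),
        (4, 1), (4, 2), (4, 3), (4, 4), (5, 1), (5, 4);
    INSERT INTO test_cohort VALUES
        (1, 'acc1', 1), (1, 'acc2', 1),
        (2, 'cli1', 2), (2, 'cli2', 2),
        (3, 'cli1', 2),
        (4, 'acc1', 1), (4, 'acc2', 1), (4, 'cli1', 2), (4, 'cli2', 2),
        (5, 'acc1', 1), (5, 'cli2', 2),
        (6, 'acc1', 3), (6, 'cli2', 3);
""")

QUERY = """
    SELECT t.i, m.cohort_id
    FROM test_cohort t
    -- size of each test scenario, counted before joining
    JOIN (SELECT i, COUNT(*) AS cnt FROM test_cohort GROUP BY i) tc
        ON tc.i = t.i
    JOIN entity e
        ON e.entity_type_id = t.entity_type_id
       AND e.business_key = t.business_key
    JOIN cohort_member m
        ON m.entity_id = e.entity_id
    -- size of each cohort, counted before joining; sizes must be equal
    JOIN (SELECT cohort_id, COUNT(*) AS cnt FROM cohort_member GROUP BY cohort_id) mc
        ON mc.cohort_id = m.cohort_id
       AND mc.cnt = tc.cnt
    GROUP BY t.i, m.cohort_id, tc.cnt
    -- every member of the scenario was matched
    HAVING COUNT(*) = tc.cnt
    ORDER BY t.i
"""

rows = conn.execute(QUERY).fetchall()
```

Equal sizes plus a full set of matched rows leave no room for a subset or a superset, so scenario 6 produces no row at all.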
