How can I identify subgroups that exist within a set of similar groups?

Time: 2017-01-19 22:12:09

Tags: sql database data-manipulation

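I have a moderately sized list of permissions, along with the users assigned to those permissions. I would like to group users into roles when they share the same permissions, but I am running into some problems.

By manipulating the data in a spreadsheet, I was able to count each unique set of permissions and group users into a role based on each user's entire permission set. The result is that every user has exactly one role.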
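The spreadsheet step described above can be sketched in a few lines of Python. This is only an illustration: the user names and permission numbers are hypothetical stand-ins, not the data from the question, and `roles_by_full_set` is a name made up for this sketch.

```python
from collections import defaultdict

# Hypothetical user -> permissions mapping (not the data from the question).
user_perms = {
    "alice": {1, 2, 3},
    "bob": {1, 2, 3},
    "carol": {1, 2},
    "dave": {4},
}


def roles_by_full_set(user_perms):
    """Group users whose complete permission sets are identical."""
    roles = defaultdict(list)
    for user, perms in user_perms.items():
        # frozenset is hashable, so the whole permission set can act as the key.
        roles[frozenset(perms)].append(user)
    return dict(roles)


roles = roles_by_full_set(user_perms)
for perms, users in roles.items():
    print(sorted(perms), sorted(users))
```

Each distinct permission set becomes one role, so carol ends up in a role of her own even though her permissions are a subset of alice's and bob's. That is the limitation the rest of the question is about.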

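What I would like to be able to do is identify subgroups within the data set, so that I can reduce the number of roles while increasing the number of role assignments per user.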

Here is a sample data set: [image not available]

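Looking at the data, it is easy to spot potential roles (users 1 and 2 both share the first 6 permissions), but is there a way to tease this kind of information out with SQL, spreadsheet functions, or a simple program?

I recognize that this problem has multiple answers, depending on the minimum number of permissions per role, the minimum number of users assigned to a role, and so on.

I am not expecting to find the definitive answer; I am just trying to take one algorithmic step forward, if that makes any sense.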

2 Answers:

Answer 0 (score: 1)

OK, let's make some data!

DECLARE @User TABLE
(
    Perm INT,
    User1 INT,
    User2 INT,
    User3 INT,
    User4 INT,
    User5 INT,
    User6 INT,
    User7 INT,
    User8 INT,
    User9 INT,
    User10 INT
)

INSERT INTO @User
( Perm, User1, User2, User3, User4, User5, User6, User7, User8, User9, User10 )
VALUES
( 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ),
( 2, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1 ),
( 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 ),
( 4, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0 ),
( 5, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1 ),
( 6, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1 ),
( 7, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1 ),
( 8, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 ),
( 9, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1 );

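Now that we have the permissions and users in a table, let's do some counting and create a grouping value. Each user column contributes a distinct power of two, so GroupMe is a bitmask of the users who hold that permission.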

SELECT
    u.Perm,
    u.User1, 
    u.User2, 
    u.User3, 
    u.User4, 
    u.User5, 
    u.User6, 
    u.User7, 
    u.User8, 
    u.User9, 
    u.User10,
    CASE WHEN u.User1 = 1 THEN 1 ELSE 0 END +
    CASE WHEN u.User2 = 1 THEN 2 ELSE 0 END +
    CASE WHEN u.User3 = 1 THEN 4 ELSE 0 END +
    CASE WHEN u.User4 = 1 THEN 8 ELSE 0 END +
    CASE WHEN u.User5 = 1 THEN 16 ELSE 0 END +
    CASE WHEN u.User6 = 1 THEN 32 ELSE 0 END +
    CASE WHEN u.User7 = 1 THEN 64 ELSE 0 END +
    CASE WHEN u.User8 = 1 THEN 128 ELSE 0 END +
    CASE WHEN u.User9 = 1 THEN 256 ELSE 0 END +
    CASE WHEN u.User10 = 1 THEN 512 ELSE 0 END AS GroupMe
FROM @User u

Here is the output:

Perm    User1   User2   User3   User4   User5   User6   User7   User8   User9   User10  GroupMe
1   1   1   1   1   1   1   1   1   1   1   1023
2   1   1   0   0   0   0   0   1   1   1   899
3   1   0   0   0   0   0   0   0   0   0   1
4   1   1   1   1   0   0   0   0   0   0   15
5   1   1   0   0   0   0   0   1   1   1   899
6   1   1   0   0   0   0   0   0   1   1   771
7   0   0   1   1   1   1   1   0   1   1   892
8   1   0   0   0   0   0   0   0   0   0   1
9   1   0   1   1   0   1   1   0   1   1   877

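You can see that permissions 3 and 8 have the same value. Permissions 2 and 5 also have the same value. Permissions with equal GroupMe values are held by exactly the same set of users.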
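For readers who prefer a small program to T-SQL, the same bitmask idea can be sketched in Python. The `rows` dictionary copies the values from the INSERT statement above; the function name `group_me` and the other variable names are assumptions made for this sketch.

```python
from collections import defaultdict

# Copied from the INSERT statement above: perm -> flags for User1..User10.
rows = {
    1: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    2: [1, 1, 0, 0, 0, 0, 0, 1, 1, 1],
    3: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    4: [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    5: [1, 1, 0, 0, 0, 0, 0, 1, 1, 1],
    6: [1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
    7: [0, 0, 1, 1, 1, 1, 1, 0, 1, 1],
    8: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    9: [1, 0, 1, 1, 0, 1, 1, 0, 1, 1],
}


def group_me(flags):
    """User1 contributes 1, User2 contributes 2, User3 contributes 4, and so on."""
    return sum(flag << i for i, flag in enumerate(flags))


# Collect permissions that share a bitmask, i.e. the same set of users.
perms_by_mask = defaultdict(list)
for perm, flags in rows.items():
    perms_by_mask[group_me(flags)].append(perm)

shared = {mask: perms for mask, perms in perms_by_mask.items() if len(perms) > 1}
print(shared)
```

Permissions that land in the same bucket are always granted together in this data, so they could be merged into a single unit before looking for larger roles.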

OK, let's use a numbers table to add the permission-grouping part:

;WITH
a AS (SELECT 1 AS i UNION ALL SELECT 1),
b AS (SELECT 1 AS i FROM a AS x, a AS y),
c AS (SELECT 1 AS i FROM b AS x, b AS y),
d AS (SELECT 1 AS i FROM c AS x, c AS y),
e AS (SELECT 1 AS i FROM d AS x, d AS y),
f AS (SELECT 1 AS i FROM e AS x, e AS y),
numbers AS 
(
    SELECT TOP(10)
        ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS number
    FROM f
), PrivBreakout AS
(
    SELECT 1 AS UserId, u.Perm 
    FROM @User u
    WHERE u.User1 = 1
    UNION
    SELECT 2 AS UserId, u.Perm 
    FROM @User u
    WHERE u.User2 = 1
    UNION
    SELECT 3 AS UserId, u.Perm 
    FROM @User u
    WHERE u.User3 = 1 -- the flag value is 1; comparing against 3 matches no rows and drops user 3
    UNION
    SELECT 4 AS UserId, u.Perm 
    FROM @User u
    WHERE u.User4 = 1
    UNION
    SELECT 5 AS UserId, u.Perm 
    FROM @User u
    WHERE u.User5 = 1
    UNION
    SELECT 6 AS UserId, u.Perm 
    FROM @User u
    WHERE u.User6 = 1
    UNION
    SELECT 7 AS UserId, u.Perm 
    FROM @User u
    WHERE u.User7 = 1
    UNION
    SELECT 8 AS UserId, u.Perm 
    FROM @User u
    WHERE u.User8 = 1
    UNION
    SELECT 9 AS UserId, u.Perm 
    FROM @User u
    WHERE u.User9 = 1
    UNION
    SELECT 10 AS UserId, u.Perm 
    FROM @User u
    WHERE u.User10 = 1
), ThreeLayerCombo AS
(
    SELECT 
        a.number AS priva,
        b.number AS privb,
        c.number AS privc
    FROM numbers a
    CROSS JOIN numbers b
    CROSS JOIN numbers c
    WHERE b.number > a.number
        AND c.number > b.number
)

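Note that in the code above I decided to look for combinations of 3 permissions, so every candidate role contains at least 3 permissions. The following SELECT completes the statement: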

SELECT t.priva, t.privb, t.privc, COUNT(DISTINCT a.UserId) AS Grouper
FROM ThreeLayerCombo t
INNER JOIN PrivBreakout a
    ON t.priva = a.Perm
INNER JOIN PrivBreakout b
    ON b.UserId = a.UserId
    AND t.privb = b.Perm
INNER JOIN PrivBreakout c
    ON c.UserId = a.UserId
    AND t.privc = c.Perm
GROUP BY t.priva, t.privb, t.privc
ORDER BY COUNT(DISTINCT a.UserId) DESC

Let's look for the best combinations. Here is the output:

priva   privb   privc   Grouper
1   2   5   5
1   7   9   5
2   5   6   4
1   2   6   4
1   5   6   4
1   2   9   3
2   5   9   3
1   5   9   3
1   6   9   3
2   6   9   3
5   6   9   3
5   7   9   2
5   6   7   2
4   5   6   2
2   7   9   2
6   7   9   2
1   4   9   2
1   6   7   2
2   6   7   2
2   5   7   2
2   4   5   2
2   4   6   2
1   2   7   2
1   5   7   2
1   2   4   2
1   4   5   2
1   4   6   2
1   4   7   1
1   4   8   1
1   2   3   1
1   5   8   1
1   2   8   1
1   3   4   1
1   3   5   1
1   3   6   1
1   3   8   1
1   3   9   1
2   4   8   1
2   4   9   1
2   5   8   1
2   6   8   1
1   6   8   1
1   8   9   1
2   3   4   1
2   3   5   1
2   3   6   1
2   3   8   1
2   3   9   1
6   8   9   1
2   8   9   1
3   4   5   1
3   4   6   1
3   4   8   1
3   4   9   1
3   5   6   1
3   5   8   1
3   5   9   1
3   6   8   1
3   6   9   1
3   8   9   1
4   5   8   1
4   5   9   1
4   6   8   1
4   6   9   1
4   7   9   1
4   8   9   1
5   6   8   1
5   8   9   1

From the output, the best bets for building specific roles are (1, 2, 5) and (1, 7, 9).
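The same counting can be sketched as a short Python program, which may be easier to adapt to other combination sizes. It is an illustration under assumptions, not a translation of the query: `rows` copies the INSERT values above, and `count_combos` is a name invented here. Counts can differ from the listing above wherever User3 is involved; for example, User3 holds permissions 1, 7 and 9 in the table, so this sketch reports six users for (1, 7, 9).

```python
from collections import Counter
from itertools import combinations

# Copied from the INSERT statement above: perm -> flags for User1..User10.
rows = {
    1: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    2: [1, 1, 0, 0, 0, 0, 0, 1, 1, 1],
    3: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    4: [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    5: [1, 1, 0, 0, 0, 0, 0, 1, 1, 1],
    6: [1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
    7: [0, 0, 1, 1, 1, 1, 1, 0, 1, 1],
    8: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    9: [1, 0, 1, 1, 0, 1, 1, 0, 1, 1],
}

# Unpivot to user -> sorted list of permissions (the role of PrivBreakout).
user_perms = {
    user: sorted(perm for perm, flags in rows.items() if flags[user - 1] == 1)
    for user in range(1, 11)
}


def count_combos(user_perms, size=3):
    """Count, for every combination of `size` permissions, how many users hold all of them."""
    counts = Counter()
    for perms in user_perms.values():
        # Each user contributes once to every combination drawn from their own permissions.
        counts.update(combinations(perms, size))
    return counts


triples = count_combos(user_perms)
for combo, n in triples.most_common(5):
    print(combo, n)
```

Only combinations held by at least one user are generated, so there is no need for a separate numbers table. Changing `size` explores smaller or larger candidate roles.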

Hope this helps!

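Answer 1 (score: 1)

Rather than clustering (which works really badly on binary data), use:

  • Link prediction / recommender systems: if user A has permissions b and c, which other permissions could be suggested?
  • Frequent itemset mining / association rules: if a user has permissions a and b, then he should also have permission c, written a, b -> c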
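To make the second suggestion concrete, here is a minimal brute-force sketch of frequent itemsets and rule confidence in Python. It enumerates every combination up to a size limit rather than using a pruning algorithm such as Apriori, so it only suits small permission lists. The data, the thresholds, and the function names are all hypothetical.

```python
from itertools import combinations

# Hypothetical user -> permissions mapping, for illustration only.
user_perms = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b", "c"},
    "u3": {"a", "b"},
    "u4": {"a", "c"},
    "u5": {"d"},
}


def support(itemset, user_perms):
    """Number of users who hold every permission in `itemset`."""
    return sum(1 for perms in user_perms.values() if itemset <= perms)


def frequent_itemsets(user_perms, min_support=2, max_size=3):
    """Brute force: test every combination of up to `max_size` permissions."""
    all_perms = sorted(set().union(*user_perms.values()))
    found = {}
    for size in range(1, max_size + 1):
        for combo in combinations(all_perms, size):
            n = support(set(combo), user_perms)
            if n >= min_support:
                found[combo] = n
    return found


def confidence(antecedent, consequent, user_perms):
    """Share of users holding `antecedent` who also hold `consequent`."""
    base = support(set(antecedent), user_perms)
    if base == 0:
        return 0.0
    return support(set(antecedent) | {consequent}, user_perms) / base


frequent = frequent_itemsets(user_perms)
print(frequent)
print(confidence(("a", "b"), "c", user_perms))
```

Frequent itemsets are candidate roles, and a rule such as a, b -> c with high confidence hints that c probably belongs in the same role as a and b.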