Question

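I'm stuck on this one. I'm trying to get a set of unique groups based on overlapping IDs and how the data is associated. An example will make it clearer: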

with src as (
    select [hash], id, 
        -- determine uniqueness of the hash by rank()
        rank() over (partition by [hash] order by id) rnk
    from ( 
        -- mocked data
        values
        ('0x00', '1000'),
        ('0x0A', '1001'), 
        ('0x0A', '1002'),
        ('0x0B', '1001'), 
        ('0x0B', '1002'),
        ('0x0B', '1003'),
        ('0x0C', '3001'),
        ('0x0C', '3002'),
        ('0x0C', '3003'),
        ('0x0D', '3001'),
        ('0x0D', '3002'),
        ('0x0D', '3003')
    ) as t([hash], id)
),
filter as (
    -- keep only hashes shared by more than one id (drops e.g. 0x00)
    select distinct [hash], id
    from src s
    where exists (
        select 1 from src t
        where s.[hash] = t.[hash]
        and t.rnk > 1
    )
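)
-- final statement so the CTE chain runs on its own
select [hash], id
from filter
order by [hash], id;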

Next, I need to find all of the ids grouped by hash. So I'd expect that set of data to look something like this:

0x0A: 1001, 1002
0x0B: 1001, 1002, 1003
0x0C: 3001, 3002, 3003
0x0D: 3001, 3002, 3003
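A minimal sketch of that per-hash listing, assuming SQL Server 2017 or later (for `STRING_AGG ... WITHIN GROUP`). The mocked values are repeated so the query runs on its own, and the `HAVING` clause plays the same role as the `filter` CTE by keeping only hashes that map to more than one id:

```sql
with src as (
    select [hash], id
    from (values
        ('0x00', '1000'),
        ('0x0A', '1001'), ('0x0A', '1002'),
        ('0x0B', '1001'), ('0x0B', '1002'), ('0x0B', '1003'),
        ('0x0C', '3001'), ('0x0C', '3002'), ('0x0C', '3003'),
        ('0x0D', '3001'), ('0x0D', '3002'), ('0x0D', '3003')
    ) as t([hash], id)
)
-- one row per hash, with its ids as a sorted comma-separated list
select [hash],
       string_agg(id, ', ') within group (order by id) as ids
from src
group by [hash]
having count(distinct id) > 1   -- drop hashes used by a single id, e.g. 0x00
order by [hash];
```

On the mocked data this yields the four rows listed above (0x0A through 0x0D).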

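From that set, I then want to find the unique combinations by association. By "association" I mean that because the set 1001, 1002 is part of the set 1001, 1002, 1003, I want to merge them into a single unique set: 1001, 1002, 1003. The hashes are discarded at this point.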
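One way to read "association" is as connected components: each id is a node, and two ids are linked whenever they share a hash. The sketch below is a swapped-in graph technique, not the approach from the answer; the CTE names `edges`, `walk` and `components` are hypothetical. A recursive CTE walks from every linked id to everything reachable, labels each id with the smallest id it can reach, and `DENSE_RANK` turns those labels into 1, 2, .... It assumes ids never contain commas (the visited path is a comma-delimited string). Because it enumerates simple paths, it can get expensive on large, densely connected groups, and components with more than about 100 ids would need `OPTION (MAXRECURSION 0)`; for big data, an iterative label-propagation loop is likely the safer choice.

```sql
with src as (
    select [hash], id
    from (values
        ('0x00', '1000'),
        ('0x0A', '1001'), ('0x0A', '1002'),
        ('0x0B', '1001'), ('0x0B', '1002'), ('0x0B', '1003'),
        ('0x0C', '3001'), ('0x0C', '3002'), ('0x0C', '3003'),
        ('0x0D', '3001'), ('0x0D', '3002'), ('0x0D', '3003')
    ) as t([hash], id)
),
-- two ids are linked when they share at least one hash
edges as (
    select distinct a.id as id1, b.id as id2
    from src a
    join src b on b.[hash] = a.[hash] and b.id <> a.id
),
-- walk from every linked id, never revisiting an id already on the path
walk as (
    select distinct id1 as start_id, id1 as cur_id,
           cast(',' + id1 + ',' as varchar(max)) as visited
    from edges
    union all
    select w.start_id, e.id2,
           cast(w.visited + e.id2 + ',' as varchar(max))
    from walk w
    join edges e on e.id1 = w.cur_id
    where charindex(',' + e.id2 + ',', w.visited) = 0
),
-- label each id with the smallest id reachable from it
components as (
    select start_id as id, min(cur_id) as root_id
    from walk
    group by start_id
)
select dense_rank() over (order by root_id) as groupsetid, id
from components
order by groupsetid, id;
```

On the mocked data, 1000 has no shared hash and drops out, 1001, 1002 and 1003 share root 1001 (groupsetid 1), and 3001, 3002 and 3003 share root 3001 (groupsetid 2).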

I'm looking for final output like this:

groupsetid  id
1           1001
1           1002
1           1003
2           3001
2           3002
2           3003

Or, if it's easier:

groupsetid  ids
1           1001, 1002, 1003
2           3001, 3002, 3003
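If the (groupsetid, id) rows are already available, the comma-separated form is one more aggregation. A sketch assuming SQL Server 2017 or later for `STRING_AGG`; the hypothetical `grouped_ids` CTE simply mocks the expected rows from the first table:

```sql
with grouped_ids as (
    select groupsetid, id
    from (values
        (1, '1001'), (1, '1002'), (1, '1003'),
        (2, '3001'), (2, '3002'), (2, '3003')
    ) as t(groupsetid, id)
)
-- collapse each group's ids into one sorted, comma-separated string
select groupsetid,
       string_agg(id, ', ') within group (order by id) as ids
from grouped_ids
group by groupsetid
order by groupsetid;
```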

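Ultimately this is a report that tells us which ids collide with each other for a given number x of overlapping hashes. Many thanks to anyone willing to take this on; I'm going crazy trying to figure it out!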

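Edit: This is just a test case. These aren't the actual values I'm working with, only a representation of the data and some of the combinations I see. So I can't use logic such as the LIKE operator to filter on the values themselves.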

Answer 1

-- buckets each id by its leading digit
select id,
    GroupIds = case
        when id like '1%' then '1'
        when id like '3%' then '3'
        else 'N/A'
    end
from src
group by id

This does it for you. Add the lines above after the src CTE.

Get distinct groups of IDs with overlapping values
