I have the following data:
+-----------------+-------------+----------+-----------------+-------------+----------+
| firstgroupId | firstCount | firstId | secondclusterId | secondCount | secondId |
+-----------------+-------------+----------+-----------------+-------------+----------+
| 100001 | 3 | 3000001 | 100003 | 4 | 3000001 |
| 100001 | 3 | 3000002 | 100003 | 4 | 3000002 |
| 100001 | 3 | 3000003 | 100003 | 4 | 3000003 |
| 100002 | 2 | 3000004 | 100003 | 4 | 3000004 |
| 100002 | 2 | 3000005 | 100002 | 4 | 3000005 |
| 100003 | 3 | 3000006 | 100002 | 4 | 3000006 |
| 100003 | 3 | 3000007 | 100002 | 4 | 3000007 |
| 100003 | 3 | 3000008 | 100002 | 4 | 3000008 |
| 100004 | 2 | 3000009 | 100005 | 2 | 3000009 |
| 100004 | 2 | 3000010 | 100005 | 2 | 3000010 |
+-----------------+-------------+----------+-----------------+-------------+----------+
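Here we can see each Id together with its group (firstgroupId) and its cluster (secondclusterId).
How can I find the odd ones out by comparing the two groupings of Ids?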
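Answer 0 (score: 0)
You seem to want the Ids where the overlap between the group and the cluster is not the "maximum" overlap. If I understand correctly, I think this does what you want: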
select t.*
from (select t.*,
             -- rank() gives every row tied for the largest overlap in its group rank 1
             rank() over (partition by firstgroupId order by overlap_count desc) as overlap_rank
      from (select t.*,
                   -- number of rows sharing this (group, cluster) pair
                   count(*) over (partition by firstgroupId, secondclusterId) as overlap_count
            from t
           ) t
     ) t
where overlap_rank > 1;
If you know you are looking for singleton outliers, you can use just the innermost subquery and filter with where overlap_count = 1.
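To check the idea against the sample rows, here is a minimal sketch in plain Python rather than SQL. It mirrors the two filters described above: the per-pair count stands in for count(*) over (partition by firstgroupId, secondclusterId), and the second list keeps rows whose overlap is below the largest overlap in their group. The variable names (rows, overlap_count, best_overlap, and so on) are made up for illustration, and only the three columns needed for the comparison are copied from the table.

```python
from collections import Counter

# (firstgroupId, firstId, secondclusterId) copied from the sample table
rows = [
    (100001, 3000001, 100003),
    (100001, 3000002, 100003),
    (100001, 3000003, 100003),
    (100002, 3000004, 100003),
    (100002, 3000005, 100002),
    (100003, 3000006, 100002),
    (100003, 3000007, 100002),
    (100003, 3000008, 100002),
    (100004, 3000009, 100005),
    (100004, 3000010, 100005),
]

# Equivalent of count(*) over (partition by firstgroupId, secondclusterId)
overlap_count = Counter((group, cluster) for group, _id, cluster in rows)

# Equivalent of: where overlap_count = 1
singleton_outliers = [
    _id for group, _id, cluster in rows
    if overlap_count[(group, cluster)] == 1
]

# Largest overlap seen for each group
best_overlap = {}
for (group, cluster), n in overlap_count.items():
    best_overlap[group] = max(best_overlap.get(group, 0), n)

# Rows whose (group, cluster) overlap is smaller than the group's largest overlap
below_max = [
    _id for group, _id, cluster in rows
    if overlap_count[(group, cluster)] < best_overlap[group]
]

print(singleton_outliers)  # [3000004, 3000005]
print(below_max)           # []
```

On this data the singleton filter picks out 3000004 and 3000005. The "below the maximum" filter returns nothing, because the two rows of group 100002 fall into different clusters and tie at an overlap of 1, so which filter fits depends on how ties should be treated.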