Hey,
I am currently struggling with a problem in Oracle that amounts to a custom clustering algorithm. I have a table like this:
Products
ID Prop1 Prop2 Prop3
------------------------------
1 1001 1002 1003
2 1001 2002 2003
3 3001 1002 3003
4 4001 4002 2003
5 5001 5002 5003
6 6001 6002 6003
7 7001 7002 7003
8 8001 7002 8003
9 9001 1002 4003
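Each product has several properties (Prop1, Prop2, Prop3). All products that share a common property value should end up in the same cluster. In this example, we would get the following clusters: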
{1,2,3,4,9},{5},{6},{7,8}
Products 1 and 2 share Prop1; products 1, 3 and 9 share Prop2; products 2 and 4 share Prop3. The union of all these products gives us {1,2,3,4,9}. And so on.
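To make the clustering rule concrete, here is a minimal sketch outside the database, assuming the rows fit in memory. It uses a union-find (disjoint-set) structure, which is a swapped-in technique and not something from the SQL in this thread. The names `ROWS` and `cluster_products` are hypothetical, and two products count as linked only when they have the same value in the same property column.

```python
from collections import defaultdict

# Sample rows copied from the Products table above: (id, prop1, prop2, prop3).
ROWS = [
    (1, 1001, 1002, 1003),
    (2, 1001, 2002, 2003),
    (3, 3001, 1002, 3003),
    (4, 4001, 4002, 2003),
    (5, 5001, 5002, 5003),
    (6, 6001, 6002, 6003),
    (7, 7001, 7002, 7003),
    (8, 8001, 7002, 8003),
    (9, 9001, 1002, 4003),
]


def cluster_products(rows):
    """Group product ids linked, directly or transitively, by a shared property value."""
    parent = {row[0]: row[0] for row in rows}

    def find(x):
        # Walk up to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller id as the root so the result is deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    # Remember the first product seen for each (column, value) pair and
    # union every later product carrying the same pair with it.
    first_seen = {}
    for product_id, *props in rows:
        for column, value in enumerate(props):
            key = (column, value)
            if key in first_seen:
                union(first_seen[key], product_id)
            else:
                first_seen[key] = product_id

    clusters = defaultdict(list)
    for product_id in parent:
        clusters[find(product_id)].append(product_id)
    return sorted(sorted(members) for members in clusters.values())


print(cluster_products(ROWS))  # [[1, 2, 3, 4, 9], [5], [6], [7, 8]]
```

Each row is visited once per property, so the work grows roughly linearly with the number of rows instead of comparing every pair of products.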
What I have done so far:
select listagg(id, ',') within group (order by id) "clusterIdsProp1"
from clustertest
group by prop1
having count(prop1) > 1;
This gives me the product IDs that form a cluster with more than one item.
Table A
clusterIdsProp1
1,2
Table B
clusterIdsProp2
1,3,9
7,8
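Now I am trying to merge these intermediate results to extend the current clusters. Is there a way in Oracle to merge lists if they share a common item?
My goal is to merge the two tables so that my result table looks like this:
Merged table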
clusterIds
1,2,3,9
7,8
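For illustration only, the merge described above (combine lists whenever they share an item) can be sketched in a few lines of Python. This is not an Oracle solution. `merge_overlapping`, `TABLE_A` and `TABLE_B` are hypothetical names, and the lists are assumed to be already parsed from the comma-separated strings.

```python
def merge_overlapping(lists):
    """Merge id lists until no two remaining lists share an item."""
    merged = []  # invariant: the sets in here are pairwise disjoint
    for current in map(set, lists):
        # Pull out every already-merged group that overlaps the current list.
        overlapping = [group for group in merged if group & current]
        for group in overlapping:
            current |= group
            merged.remove(group)
        merged.append(current)
    return sorted(sorted(group) for group in merged)


TABLE_A = [[1, 2]]             # clusterIdsProp1
TABLE_B = [[1, 3, 9], [7, 8]]  # clusterIdsProp2

print(merge_overlapping(TABLE_A + TABLE_B))  # [[1, 2, 3, 9], [7, 8]]
```

Adding the lists for Prop3 to the same call would extend the first cluster further, since each new list is merged with every group it touches.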
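I am using Oracle 11g. Thanks in advance.
If you have any other suggestions for solving the whole "clustering" problem, please let me know.
Cheers.
Answer 0 (score: 0)
I finally found an answer to my problem. Unfortunately, performance on large data sets is very poor, so if anyone has a better approach, please let me know.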
--step 1
--if there are more properties just adapt 'or-clauses'
with edges as (
--determines all (non transitive) edges
select ct1.id as edge1, ct2.id as edge2
from clustertest ct1, clustertest ct2
where ct1.id < ct2.id
and (ct1.prop1 = ct2.prop1 or ct1.prop2 = ct2.prop2 or ct1.prop3 = ct2.prop3 /* add further properties here */)),
-- step 2
tc(edge1, edge2) as (
-- calculates the transitive closure
-- (note: the second branch below reads from edges, not from tc, so it only adds two-hop paths)
-- Anchor member.
select edge1, edge2
from edges
union
-- Recursive member.
select e1.edge1, e2.edge2
from edges e1, edges e2
where e2.edge1 = e1.edge2),
-- still step 2
correctEdges as(
--removes unnecessary edges
select edge1, edge2 from tc
where edge1 not in (select tc1.edge1
from tc tc1, tc tc2
where tc1.edge1 = tc2.edge2
group by tc1.edge1, tc1.edge2)
or edge2 not in (select tc1.edge2
from tc tc1, tc tc2
where tc1.edge1 = tc2.edge2
group by tc1.edge1, tc1.edge2))
-- step 4
-- group by edge1 and build final clusters
select concat(edge1, concat(',', listagg(edge2, ',') within group (order by edge1, edge2))) as clusterIds
from correctEdges
group by edge1
-- step 5
union
--add 'single' clusters (only one item)
select to_char(id) as clusterIds
from clusterTest
minus
-- step 3
(select to_char(edge1)
from correctEdges
group by edge1
union
select to_char(edge2) as edge1
from correctEdges
group by edge2);
A short explanation of each step: