Suppose we have a table like this:
c1  c2
==  ==
a   1
a   2
b   1
b   2
b   3
c   4
c   2
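We group this table by c1, which gives three groups: a, b, and c. I need to compute the similarity of column c2 between each pair of groups, as follows:
sim(a,b) = 2 (the common c2 values are 1 and 2) / 3 (all distinct c2 values in either group) = 2/3
sim(b,c) = 1 (b and c have only the value 2 in common) / 4 = 1/4
sim(a,c) = 1/3
Can we build the expression above in SQL (preferably in Oracle 11g syntax)?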
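For reference, the similarity described here is the number of c2 values two groups share divided by the number of distinct c2 values in either group (commonly called the Jaccard similarity). The following is a minimal Python sketch of that definition, not part of the original question; the names `rows`, `groups`, `sim`, and `similarities` are made up for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Sample rows (c1, c2) copied from the table in the question.
rows = [("a", 1), ("a", 2), ("b", 1), ("b", 2), ("b", 3), ("c", 4), ("c", 2)]

# Collect the distinct c2 values of each c1 group.
groups = {}
for c1, c2 in rows:
    groups.setdefault(c1, set()).add(c2)

def sim(x, y):
    """Shared c2 values divided by all distinct c2 values of both groups."""
    common = groups[x] & groups[y]
    total = groups[x] | groups[y]
    return Fraction(len(common), len(total))

# One entry per unordered pair of groups.
similarities = {(x, y): sim(x, y) for x, y in combinations(sorted(groups), 2)}
for (x, y), value in similarities.items():
    print(f"sim({x},{y}) = {value}")
```

On the sample data this reproduces the values given above: 2/3 for (a,b), 1/3 for (a,c), and 1/4 for (b,c).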
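Answer 0 (score: 0)
I believe this query does what you want. It returns, for each pair of groups, the number of c2 values they have in common (NumInCommon) and the number of distinct c2 values across both groups (NumInTotal); the similarity is NumInCommon divided by NumInTotal: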
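-- One row per unordered pair of groups that share at least one c2 value
select t1.c1 as group1, t2.c1 as group2,
       -- shared c2 values; distinct guards against duplicate rows in t
       count(distinct t1.c2) as NumInCommon,
       -- distinct c2 values found in either group
       (select count(distinct t.c2)
        from t
        where t.c1 in (t1.c1, t2.c1)
       ) as NumInTotal
from t t1 join
     t t2
     on t1.c2 = t2.c2
-- skip self-pairs (a,a) and mirrored pairs (b,a)
where t1.c1 < t2.c1
group by t1.c1, t2.c1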
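One way to try the query without an Oracle instance is to run it against an in-memory SQLite database from Python. This is only a sketch under some assumptions: SQLite stands in for Oracle 11g, so syntax and behavior may differ in places; the table name `t` is taken from the answer; and the `WHERE t1.c1 < t2.c1` filter, the `COUNT(DISTINCT ...)`, and the column aliases `g1`, `g2`, `num_in_common`, `num_in_total` are illustrative choices that keep one row per unordered pair.

```python
import sqlite3

# Sample rows (c1, c2) copied from the table in the question.
rows = [("a", 1), ("a", 2), ("b", 1), ("b", 2), ("b", 3), ("c", 4), ("c", 2)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c1 TEXT, c2 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

# Self-join on c2 to find shared values; the correlated subquery
# counts the distinct c2 values present in either group of the pair.
query = """
SELECT t1.c1 AS g1, t2.c1 AS g2,
       COUNT(DISTINCT t1.c2) AS num_in_common,
       (SELECT COUNT(DISTINCT t.c2)
        FROM t
        WHERE t.c1 IN (t1.c1, t2.c1)) AS num_in_total
FROM t t1
JOIN t t2 ON t1.c2 = t2.c2
WHERE t1.c1 < t2.c1
GROUP BY t1.c1, t2.c1
ORDER BY g1, g2
"""
results = conn.execute(query).fetchall()
conn.close()

for g1, g2, common, total in results:
    print(f"sim({g1},{g2}) = {common}/{total}")
```

Because the query uses an inner join, a pair of groups with no c2 value in common produces no row at all instead of a similarity of 0; an outer join or a separate list of pairs would be needed if those rows matter.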