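I have a relationship table (one-to-many), and I need to efficiently compute the similarity between IDs based on the items they share. The table looks like this: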
id item
1 A2231
1 A2134
2 A2134
2 B2313
...
What I need is the number of items that each pair of IDs has in common:
a_id b_id count_items
1 2 1
1 3 0
2 1 1
...
I already have a query, but it is O(n^2) and fails because it runs out of spool space.
SELECT A.ID AS a_id, B.ID AS b_id, COUNT(B.item) AS count_items
FROM Tab AS A LEFT JOIN Tab AS B --same table
ON (A.item = B.item)
GROUP BY A.ID, B.ID
Edit:
n_rows ~ 50MM
n_items ~ 100K
n_ids ~ 170K
id/item combinations are unique
Is there a way to do this efficiently? Thanks in advance!
Answer 0 (score: 0)
I would start by using an inner join, so that only IDs that actually share an item produce rows:
SELECT A.ID AS a_id, B.ID AS b_id, COUNT(*) AS count_items
FROM Tab A JOIN
Tab B --same table
ON A.item = B.item
GROUP BY A.ID, B.ID;
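To see what a plain inner self-join returns, here is a minimal sketch that runs it on the four sample rows from the question. It uses Python's built-in sqlite3 module purely for illustration; SQLite is an assumption on my part (the spool space error suggests the real database is something else), so treat this as a way to check the logic, not the performance.

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?)",
    [(1, "A2231"), (1, "A2134"), (2, "A2134"), (2, "B2313")],
)

# Inner self-join on item, then count the shared items per pair of ids.
rows = conn.execute(
    """
    SELECT a.id AS a_id, b.id AS b_id, COUNT(*) AS count_items
    FROM tab a JOIN tab b ON a.item = b.item
    GROUP BY a.id, b.id
    ORDER BY a.id, b.id
    """
).fetchall()

print(rows)  # [(1, 1, 2), (1, 2, 1), (2, 1, 1), (2, 2, 2)]
```

Note that each ID is also paired with itself, and every pair shows up in both orders. If the similarity is symmetric, adding a condition such as a.id < b.id to the join would drop the self-pairs and roughly halve the output, though whether that is enough to fit in spool is something you would have to test.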
Next, if your table has duplicate id/item rows, deduplicating it first may help:
with t as (
select distinct id, item
from tab
)
select a.id, b.id, count(*)
from t a join
t b
on a.item = b.item
group by a.id, b.id;
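Whether these joins fit in spool depends on how many rows the self-join generates before grouping: an item shared by f distinct ids contributes f * f joined rows. The sketch below estimates that number from the id/item pairs; the function name self_join_rows and the duplicated sample row are hypothetical, added only for illustration.

```python
from collections import Counter


def self_join_rows(pairs):
    """Estimate the rows produced by joining the table to itself on item."""
    distinct_pairs = set(pairs)  # mirrors SELECT DISTINCT id, item
    ids_per_item = Counter(item for _, item in distinct_pairs)
    # An item held by f ids matches itself f * f times in the self-join.
    return sum(f * f for f in ids_per_item.values())


# Sample rows from the question, plus one duplicate row (hypothetical).
sample = [(1, "A2231"), (1, "A2134"), (2, "A2134"), (2, "B2313"), (2, "B2313")]
print(self_join_rows(sample))  # 6
```

As a rough guide, if the 50MM rows were spread evenly over 100K items (about 500 ids per item), the join would produce around 100,000 * 500 * 500 = 2.5e10 rows before aggregation, and a skewed distribution would produce more.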
Finally, if you want all pairs of IDs, including those with no items in common, then:
with t as (
select distinct id, item
from tab
)
select i1.id, i2.id, count(b.id)
from (select distinct id from tab) i1 cross join
(select distinct id from tab) i2 left join
t a
on a.id = i1.id left join
t b
on b.id = i2.id and a.item = b.item
group by i1.id, i2.id;
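As a quick check of the all-pairs logic, the sketch below runs the same shape of query in SQLite through Python's sqlite3 module (again an assumption, used only for illustration). The row for id 3 is hypothetical, added so that some pairs have nothing in common. The joined copy of t is referenced through its alias a in the first join condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?)",
    [
        (1, "A2231"), (1, "A2134"),
        (2, "A2134"), (2, "B2313"),
        (3, "C0001"),  # hypothetical id that shares no item with the others
    ],
)

# Cross join all ids, then left join the items so empty pairs survive.
pair_counts = {
    (a_id, b_id): n
    for a_id, b_id, n in conn.execute(
        """
        WITH t AS (SELECT DISTINCT id, item FROM tab)
        SELECT i1.id, i2.id, COUNT(b.id)
        FROM (SELECT DISTINCT id FROM tab) i1
        CROSS JOIN (SELECT DISTINCT id FROM tab) i2
        LEFT JOIN t a ON a.id = i1.id
        LEFT JOIN t b ON b.id = i2.id AND a.item = b.item
        GROUP BY i1.id, i2.id
        """
    )
}

print(pair_counts[(1, 2)], pair_counts[(1, 3)])  # 1 0
```

Keep in mind that this version materializes every pair of IDs, so with about 170K ids the result alone has on the order of 170,000 * 170,000 rows. It is likely only practical if you restrict the set of IDs first.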