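Efficiently get the nearest ids based on a count over a column in a one-to-many table

Asked: 2017-07-25 16:41:39

Tags: sql join count teradata relational

I have a relational table (one-to-many), and I need to efficiently compute the similarity between ids based on the items they have in common. The table looks like this: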

id  item 
1   A2231
1   A2134
2   A2134
2   B2313
... 

What I need is the number of items that each pair of ids has in common:

a_id  b_id  count_items
1     2     1 
1     3     0
2     1     1 
...

I have already written a query, but it is O(n^2) and fails because it runs out of spool space.

SELECT A.ID AS a_id, B.ID AS b_id, COUNT(B.item) AS count_items
FROM Tab AS A LEFT JOIN  Tab AS B --same table
ON (A.item = B.item)
GROUP BY A.ID, B.ID
  

Edit:

n_rows ~ 50MM 
n_items ~ 100K 
n_ids ~ 170K
id/item combinations are unique

Is there a way to achieve this efficiently? Thanks in advance!
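To put the spool problem in numbers, here is a rough back-of-the-envelope estimate of how many rows the self-join on item produces before the GROUP BY. It assumes that rows are spread evenly across items, which the post does not state. A skewed distribution would only make the joined row count larger, since the sum of squares is smallest when all groups are equal.

```python
# Rough size of the self-join on item, before aggregation.
# Assumption (not from the post): rows are spread evenly across items.
n_rows = 50_000_000
n_items = 100_000
n_ids = 170_000

rows_per_item = n_rows // n_items            # average number of ids per item
joined_rows = n_items * rows_per_item ** 2   # every id of an item pairs with every id of that item
id_pairs = n_ids ** 2                        # upper bound on the number of (a_id, b_id) groups

print(f"rows per item: {rows_per_item:,}")
print(f"joined rows:   {joined_rows:,}")
print(f"id pairs:      {id_pairs:,}")
```

Under that assumption the join materializes about 25 billion rows from a 50MM-row table, which is consistent with the query running out of spool.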

1 Answer:

Answer 0: (score: 0)

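I would start by using an inner join, so that only pairs of ids with at least one item in common are produced:

SELECT A.ID AS a_id, B.ID AS b_id, COUNT(*) AS count_items
FROM Tab A JOIN
     Tab B --same table
     ON A.item = B.item
GROUP BY A.ID, B.ID;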
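As a minimal sketch of what the inner-join version returns, the query can be run against an in-memory SQLite database using the sample rows from the question. SQLite stands in for Teradata here purely for illustration, and the row for id 3 (item `C0001`) is made up so that one id shares nothing with the others.

```python
import sqlite3

# In-memory stand-in for the Teradata table; SQLite is an assumption for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tab (id INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO Tab VALUES (?, ?)",
    [
        (1, "A2231"), (1, "A2134"),
        (2, "A2134"), (2, "B2313"),
        (3, "C0001"),  # hypothetical row, not in the question
    ],
)

# Same shape as the inner-join query above; ORDER BY added only for stable output.
rows = conn.execute(
    """
    SELECT A.id AS a_id, B.id AS b_id, COUNT(*) AS count_items
    FROM Tab A JOIN
         Tab B
         ON A.item = B.item
    GROUP BY A.id, B.id
    ORDER BY a_id, b_id
    """
).fetchall()

for row in rows:
    print(row)
```

On this data the result contains (1, 2, 1) and (2, 1, 1) plus the self-pairs (1, 1, 2), (2, 2, 2) and (3, 3, 1). Pairs with no common item, such as (1, 3), do not appear at all. If self-pairs and mirrored pairs are not needed, adding `A.id < B.id` to the join condition would drop them.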

Next, if your table has duplicates, then this might work:

with t as (
      select distinct id, item
      from tab
     )
select a.id, b.id, count(*)
from t a join
     t b
     on a.item = b.item
group by a.id, b.id;

Finally, if you want all pairs of ids, including those with no items in common, then:

with t as (
      select distinct id, item
      from tab
     )
select i1.id as a_id, i2.id as b_id, count(b.id) as count_items
from (select distinct id from tab) i1 cross join
     (select distinct id from tab) i2 left join
     t a
     on a.id = i1.id left join  -- join on alias a, not on the CTE name t
     t b
     on b.id = i2.id and a.item = b.item
group by i1.id, i2.id;
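The all-pairs query can be checked the same way. This is a sketch under the same assumptions as before: SQLite instead of Teradata, the question's sample rows, a made-up item for id 3, and one deliberately duplicated row to exercise the `distinct`. It joins `a` on `a.id = i1.id`.

```python
import sqlite3

# In-memory stand-in for the Teradata table; SQLite is an assumption for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?)",
    [
        (1, "A2231"), (1, "A2134"),
        (1, "A2134"),  # deliberate duplicate, removed by the DISTINCT in the CTE
        (2, "A2134"), (2, "B2313"),
        (3, "C0001"),  # hypothetical row, not in the question
    ],
)

# Every (a_id, b_id) pair is generated first, then shared items are counted.
rows = conn.execute(
    """
    WITH t AS (
          SELECT DISTINCT id, item
          FROM tab
         )
    SELECT i1.id AS a_id, i2.id AS b_id, COUNT(b.id) AS count_items
    FROM (SELECT DISTINCT id FROM tab) i1 CROSS JOIN
         (SELECT DISTINCT id FROM tab) i2 LEFT JOIN
         t a
         ON a.id = i1.id LEFT JOIN
         t b
         ON b.id = i2.id AND a.item = b.item
    GROUP BY i1.id, i2.id
    ORDER BY a_id, b_id
    """
).fetchall()

for row in rows:
    print(row)
```

Here pairs such as (1, 3) show up with a count of 0, because `COUNT(b.id)` ignores the NULLs produced by the left join. Note that this form always produces one output row per pair of ids, so with about 170K ids it yields roughly 28.9 billion rows no matter how sparse the overlaps are.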