Nested intersect

Time: 2018-07-18 15:31:56

Tags: sql oracle plsql

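I have a table of sold items that contains customer_id and item_name. I need to create a new table that records, for every pair of customers, how many items they have in common, with the columns customer_a_id, customer_b_id, intersected_items_count.

I wrote a PL/SQL procedure that does this with cursors and nested for loops, but what if I have a million customers? That means 1M * 1M loop iterations.

My question is: is there a pure SQL way to do a nested intersect (intersecting every row in the table with every other row)?

My table looks like this: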

customer_id   item
1              Meat 
1              Rice 
2              Meat
2              Soups 
3              Pasta 

Required output:

customer_a_id customer_b_id intersected_items
1              2             1
1              3             0
2              1             1
2              3             0
3              1             0
3              2             0

2 Answers:

Answer 0: (score: 1)

I would do this using cross join and left join:

select c1.customer_id as customer_a_id, c2.customer_id as customer_b_id,
       count(t2.item) as num_intersected_items
from (select distinct customer_id from t) c1 cross join
     (select distinct customer_id from t) c2 left join
     t t1
     on t1.customer_id = c1.customer_id left join
     t t2
     on t2.customer_id = c2.customer_id and t2.item = t1.item
where c1.customer_id <> c2.customer_id
group by c1.customer_id, c2.customer_id;

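This version gives you control over the customer IDs: they could come from a different table, which would then include customers who have no items.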
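To illustrate the point about taking the IDs from a different table, here is a minimal sketch. The `customers` table, the column types, and customer 4 (who has bought nothing) are assumptions added for illustration; only `t` and its sample rows come from the question.

```sql
-- Hypothetical setup: the customers table and customer 4 are assumptions.
create table customers (customer_id integer primary key);
create table t (customer_id integer, item varchar(20));

insert into customers values (1);
insert into customers values (2);
insert into customers values (3);
insert into customers values (4);  -- customer 4 has no rows in t

insert into t values (1, 'Meat');
insert into t values (1, 'Rice');
insert into t values (2, 'Meat');
insert into t values (2, 'Soups');
insert into t values (3, 'Pasta');

-- Same shape as the query above, but the customer pairs come from customers,
-- so pairs involving customer 4 still appear (with a count of 0).
select c1.customer_id as customer_a_id,
       c2.customer_id as customer_b_id,
       count(t2.item) as num_intersected_items
from customers c1 cross join
     customers c2 left join
     t t1
     on t1.customer_id = c1.customer_id left join
     t t2
     on t2.customer_id = c2.customer_id and t2.item = t1.item
where c1.customer_id <> c2.customer_id
group by c1.customer_id, c2.customer_id
order by c1.customer_id, c2.customer_id;
```

With this data the query should return 12 rows: the pairs (1, 2) and (2, 1) have a count of 1, and every other pair, including all those involving customer 4, has a count of 0.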

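If all the items come from the same table, a plain self-join with left join gives an equivalent result for the pairs it returns. Note that the where filter on t2.customer_id discards unmatched rows, so this form only returns pairs that share at least one item: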

select t1.customer_id as customer_a_id, t2.customer_id as customer_b_id,
       count(t2.item) as num_intersected_items
from t t1 left join
     t t2
     on t1.item = t2.item
where t1.customer_id <> t2.customer_id
group by t1.customer_id, t2.customer_id;
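Since the question asks for the result to land in a new table, one option is to wrap the cross join version in a CREATE TABLE ... AS SELECT. This is only a sketch: the target name `customer_intersections` and the column types of `t` are assumptions, not something given in the question.

```sql
-- Assumed setup matching the sample data in the question.
create table t (customer_id integer, item varchar(20));

insert into t values (1, 'Meat');
insert into t values (1, 'Rice');
insert into t values (2, 'Meat');
insert into t values (2, 'Soups');
insert into t values (3, 'Pasta');

-- Materialize every ordered customer pair with its shared-item count.
-- customer_intersections is a hypothetical table name.
create table customer_intersections as
select c1.customer_id as customer_a_id,
       c2.customer_id as customer_b_id,
       count(t2.item) as intersected_items_count
from (select distinct customer_id from t) c1 cross join
     (select distinct customer_id from t) c2 left join
     t t1
     on t1.customer_id = c1.customer_id left join
     t t2
     on t2.customer_id = c2.customer_id and t2.item = t1.item
where c1.customer_id <> c2.customer_id
group by c1.customer_id, c2.customer_id;
```

Bear in mind that materializing every pair means on the order of n * n rows for n customers, so with a million customers it may be more practical to store only the pairs that actually share an item.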

Answer 1: (score: -1)

A self-join of the customer table produces the required result set:

    SELECT c1.id        customer_a_id 
         , c2.id        customer_b_id
         , COUNT(*)     intersected_items
      FROM customer c1
      JOIN customer c2
        ON (
                 c1.id <> c2.id
             AND c1.item = c2.item
           )
  GROUP BY c1.id
         , c2.id
         ;

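There is an obvious optimization: joining on c1.id < c2.id instead of c1.id <> c2.id returns each unordered pair of customers only once, which halves the join output.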
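The c1.id < c2.id optimization can be sketched as follows. The `customer` table definition and its rows are assumed here, mirroring the sample data from the question with the column named `id` as in this answer.

```sql
-- Assumed setup: customer(id, item) holding the question's sample rows.
create table customer (id integer, item varchar(20));

insert into customer values (1, 'Meat');
insert into customer values (1, 'Rice');
insert into customer values (2, 'Meat');
insert into customer values (2, 'Soups');
insert into customer values (3, 'Pasta');

-- Using < instead of <> keeps one row per unordered pair:
-- (1, 2) is reported, its mirror image (2, 1) is not.
SELECT c1.id        customer_a_id
     , c2.id        customer_b_id
     , COUNT(*)     intersected_items
  FROM customer c1
  JOIN customer c2
    ON (
             c1.id < c2.id
         AND c1.item = c2.item
       )
 GROUP BY c1.id
     , c2.id
     ;
```

On the sample data this should return a single row, (1, 2, 1). If the mirrored pairs are needed later, they can be derived by swapping the two ID columns rather than computed again.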

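Addendum

As @JuanCarlosOropeza noted, the solution above does not include ID pairs whose item sets are disjoint. Given the quoted table size of 10^6, this is by design.

However, for completeness, and acknowledging that the OP did not ask for these pairs to be skipped, the following query also returns the disjoint pairs: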

    SELECT x.customer_a_id
         , x.customer_b_id
         , COALESCE(matches.intersected_items, 0)   intersected_items
      FROM (
                SELECT c_all_1.id        customer_a_id 
                     , c_all_2.id        customer_b_id
                  FROM customer c_all_1
            CROSS JOIN customer c_all_2
                 WHERE c_all_1.id < c_all_2.id
              GROUP BY c_all_1.id
                     , c_all_2.id
           ) x
 LEFT JOIN (
                SELECT c1.id        customer_a_id 
                     , c2.id        customer_b_id
                     , COUNT(*)     intersected_items
                  FROM customer c1
                  JOIN customer c2
                    ON (
                             c1.id < c2.id
                         AND c1.item = c2.item
                       )
              GROUP BY c1.id
                     , c2.id
           ) matches
        ON (
                matches.customer_a_id = x.customer_a_id
            AND matches.customer_b_id = x.customer_b_id
           )
  ORDER BY intersected_items desc
         , customer_a_id 
         , customer_b_id 
         ;