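I have a table of sold items containing customer_id and item_name. I need to create a new table of customer pairs and the number of items each pair has in common, with the columns customer_a_id, customer_b_id, intersected_items_count.
I wrote a PL/SQL procedure that does this with cursors and nested for loops, but what if I have a million customers? That would mean 1m * 1m loop iterations.
My question is: is there a pure SQL way to do this nested intersection (intersecting every row in the table with every other row)?
My table looks like this: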
customer_id item
1 Meat
1 Rice
2 Meat
2 Soups
3 Pasta
Required output:
customer_a_id customer_b_id intersected_items
1 2 1
1 3 0
2 1 1
2 3 0
3 1 0
3 2 0
Answer 0 (score: 1)
I would do this using a cross join and left join:
select c1.customer_id, c2.customer_id, count(t2.item) as num_intersected_items
from (select distinct customer_id from t) c1 cross join
(select distinct customer_id from t) c2 left join
t t1
on t1.customer_id = c1.customer_id left join
t t2
on t2.customer_id = c2.customer_id and t2.item = t1.item
where c1.customer_id <> c2.customer_id
group by c1.customer_id, c2.customer_id;
This version lets you control the customer IDs: they could come from a different table, which would include customers who have no items.
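To check the cross join / left join query against the sample data from the question, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. The function name intersect_counts and the ORDER BY clause are additions for illustration, and SQLite stands in for the asker's Oracle database, so treat this as a sanity check rather than a performance test.

```python
import sqlite3

def intersect_counts(rows):
    """Run the cross join / left join query on (customer_id, item) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (customer_id INTEGER, item TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    query = """
        SELECT c1.customer_id, c2.customer_id, COUNT(t2.item)
        FROM (SELECT DISTINCT customer_id FROM t) c1
        CROSS JOIN (SELECT DISTINCT customer_id FROM t) c2
        LEFT JOIN t t1
            ON t1.customer_id = c1.customer_id
        LEFT JOIN t t2
            ON t2.customer_id = c2.customer_id AND t2.item = t1.item
        WHERE c1.customer_id <> c2.customer_id
        GROUP BY c1.customer_id, c2.customer_id
        ORDER BY c1.customer_id, c2.customer_id  -- added for stable output
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Sample data copied from the question.
SAMPLE = [(1, "Meat"), (1, "Rice"), (2, "Meat"), (2, "Soups"), (3, "Pasta")]
all_pairs = intersect_counts(SAMPLE)
for customer_a, customer_b, shared in all_pairs:
    print(customer_a, customer_b, shared)
```

COUNT(t2.item) ignores the NULLs produced by the unmatched left join rows, which is why pairs with nothing in common come out as 0 instead of disappearing.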
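If all the customers come from the same table, this can be simplified to a single self-join. Note that the where clause filters on t2.customer_id, so the left join effectively behaves like an inner join and pairs with no items in common are left out: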
select t1.customer_id, t2.customer_id, count(t2.item) as num_intersected_items
from t t1 left join
t t2
on t1.item = t2.item
where t1.customer_id <> t2.customer_id
group by t1.customer_id, t2.customer_id;
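A quick way to see what the simplified self-join returns is to run it on the question's sample data. The sketch below does that with Python's sqlite3 module; the function name simplified_counts and the ORDER BY clause are illustrative additions, the GROUP BY uses the t1/t2 aliases that the FROM clause defines, and SQLite is only a stand-in for the real database.

```python
import sqlite3

def simplified_counts(rows):
    """Run the single self-join query on (customer_id, item) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (customer_id INTEGER, item TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    query = """
        SELECT t1.customer_id, t2.customer_id, COUNT(t2.item)
        FROM t t1
        LEFT JOIN t t2
            ON t1.item = t2.item
        WHERE t1.customer_id <> t2.customer_id
        GROUP BY t1.customer_id, t2.customer_id
        ORDER BY t1.customer_id, t2.customer_id  -- added for stable output
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Sample data copied from the question.
SAMPLE_ROWS = [(1, "Meat"), (1, "Rice"), (2, "Meat"), (2, "Soups"), (3, "Pasta")]
matching_pairs = simplified_counts(SAMPLE_ROWS)
print(matching_pairs)
```

On this data only customers 1 and 2 appear, once in each direction. Customer 3 shares no item with anyone, so no row mentions it, which matters if the zero-count rows from the required output are needed.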
Answer 1 (score: -1)
A self-join on the customer table produces the desired result set:
SELECT c1.id customer_a_id
, c2.id customer_b_id
, COUNT(*) intersected_items
FROM customer c1
JOIN customer c2
ON (
c1.id <> c2.id
AND c1.item = c2.item
)
GROUP BY c1.id
, c2.id
;
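There is an obvious optimization: replace c1.id <> c2.id with c1.id < c2.id, so that each pair of customers is computed only once.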
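As a sketch of that optimization, the snippet below runs the self-join with c1.id < c2.id in an in-memory SQLite database and then mirrors the rows in Python to recover both orderings. The helper names shared_item_pairs and mirror are hypothetical, and the table layout customer(id, item) is taken from this answer's query, not from the question.

```python
import sqlite3

def shared_item_pairs(rows):
    """Count shared items once per unordered customer pair (id_a < id_b)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER, item TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)
    query = """
        SELECT c1.id, c2.id, COUNT(*)
        FROM customer c1
        JOIN customer c2
            ON c1.id < c2.id AND c1.item = c2.item
        GROUP BY c1.id, c2.id
        ORDER BY c1.id, c2.id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

def mirror(half):
    """Add the (b, a) row for every (a, b) row, since the count is symmetric."""
    return sorted(half + [(b, a, n) for a, b, n in half])

# Sample data copied from the question.
CUSTOMER_ROWS = [(1, "Meat"), (1, "Rice"), (2, "Meat"), (2, "Soups"), (3, "Pasta")]
half_pairs = shared_item_pairs(CUSTOMER_ROWS)
both_directions = mirror(half_pairs)
print(half_pairs, both_directions)
```

Because the intersection count is the same for (a, b) and (b, a), the join produces roughly half as many rows and the mirrored half can be generated afterwards if both orderings are required.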
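Addendum
As @JuanCarlosOropeza pointed out, the solution above does not include ID pairs whose item sets are disjoint. Given the quoted table size of 10^6, this is by design.
However, for the sake of completeness, and acknowledging that the OP did not ask for those pairs to be skipped, the following query also produces the pairs with disjoint item sets: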
SELECT x.customer_a_id
, x.customer_b_id
, COALESCE(matches.intersected_items, 0) intersected_items
FROM (
SELECT c_all_1.id customer_a_id
, c_all_2.id customer_b_id
FROM customer c_all_1
CROSS JOIN customer c_all_2
WHERE c_all_1.id < c_all_2.id
GROUP BY c_all_1.id
, c_all_2.id
) x
LEFT JOIN (
SELECT c1.id customer_a_id
, c2.id customer_b_id
, COUNT(*) intersected_items
FROM customer c1
JOIN customer c2
ON (
c1.id < c2.id
AND c1.item = c2.item
)
GROUP BY c1.id
, c2.id
) matches
ON (
matches.customer_a_id = x.customer_a_id
AND matches.customer_b_id = x.customer_b_id
)
ORDER BY intersected_items desc
, customer_a_id
, customer_b_id
;