Here is a sample dataset:
| user_id | product_id | dt         | quantity | price |
| 1       | a          | 2017-05-20 | 2        | 3.95  |
| 1       | b          | 2017-06-02 | 7        | 19.95 |
| 2       | a          | 2017-06-23 | 4        | 5.99  |
| 2       | b          | 2017-04-03 | 2        | 19.95 |
| 2       | c          | 2017-06-08 | 1        | 9.99  |
| 3       | a          | 2017-07-02 | 4        | 4.98  |
| 3       | c          | 2017-06-05 | 3        | 18.95 |
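Write a SQL query that returns pairs of items (i.e., pairs of item_ids) together with the number of users who have ordered both items at least once. For simplicity, ignore how often an item was ordered and how many units were bought; all that matters is whether a user purchased a particular item. For the sample data above, the output should be: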
| item_id_1 | item_id_2 | num_users |
| a | b | 2 |
| a | c | 2 |
| b | c | 1 |
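Before looking at SQL, the counting rule can be pinned down with a small reference implementation. This is a sketch in Python, not part of the original question: it reduces each user to the set of distinct products they bought, then counts every unordered pair once per user, which is what the expected output above reflects.

```python
from collections import Counter
from itertools import combinations

# Sample rows from the question: (user_id, product_id, dt, quantity, price)
ROWS = [
    (1, "a", "2017-05-20", 2, 3.95),
    (1, "b", "2017-06-02", 7, 19.95),
    (2, "a", "2017-06-23", 4, 5.99),
    (2, "b", "2017-04-03", 2, 19.95),
    (2, "c", "2017-06-08", 1, 9.99),
    (3, "a", "2017-07-02", 4, 4.98),
    (3, "c", "2017-06-05", 3, 18.95),
]


def pair_counts(rows):
    """Count, for each unordered product pair, how many users bought both."""
    # Collapse to the set of distinct products per user; quantity and
    # repeat orders are ignored, as the question specifies.
    products_by_user = {}
    for user_id, product_id, *_ in rows:
        products_by_user.setdefault(user_id, set()).add(product_id)

    counts = Counter()
    for products in products_by_user.values():
        # sorted() + combinations() yields each pair once, smaller id first,
        # mirroring an SQL condition such as p1.product_id < p2.product_id.
        for pair in combinations(sorted(products), 2):
            counts[pair] += 1

    return sorted((p1, p2, n) for (p1, p2), n in counts.items())


for item_1, item_2, num_users in pair_counts(ROWS):
    print(item_1, item_2, num_users)
```

Each of the SQL answers below can be checked against this result.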
Answer 0 (score: 0)
You can do this with a self-join:
select e.product_id as item_id_1, e2.product_id as item_id_2,
       count(distinct e.user_id) as num_users
from example e join
     example e2
     on e.user_id = e2.user_id and
        e.product_id < e2.product_id  -- drop self-pairs (a, a) and reversed duplicates (b, a)
group by e.product_id, e2.product_id
order by num_users desc;
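A minimal sketch that loads the sample data into an in-memory SQLite database (used here only as a convenient stand-in; the column types are assumed) and runs this self-join twice: once joining on `user_id` alone and once with the extra `e.product_id < e2.product_id` condition. Without that condition, every product is paired with itself and each pair appears in both orders.

```python
import sqlite3

# Sample rows from the question: (user_id, product_id, dt, quantity, price)
ROWS = [
    (1, "a", "2017-05-20", 2, 3.95),
    (1, "b", "2017-06-02", 7, 19.95),
    (2, "a", "2017-06-23", 4, 5.99),
    (2, "b", "2017-04-03", 2, 19.95),
    (2, "c", "2017-06-08", 1, 9.99),
    (3, "a", "2017-07-02", 4, 4.98),
    (3, "c", "2017-06-05", 3, 18.95),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table example (user_id integer, product_id text, dt text, "
    "quantity integer, price real)"
)
conn.executemany("insert into example values (?, ?, ?, ?, ?)", ROWS)

# {extra} is filled with a fixed string below, never with user input.
BASE = """
select e.product_id as item_id_1, e2.product_id as item_id_2,
       count(distinct e.user_id) as num_users
from example e
join example e2
  on e.user_id = e2.user_id {extra}
group by e.product_id, e2.product_id
order by item_id_1, item_id_2
"""

unfiltered = conn.execute(BASE.format(extra="")).fetchall()
filtered = conn.execute(
    BASE.format(extra="and e.product_id < e2.product_id")
).fetchall()

print(len(unfiltered), "rows without the filter, e.g.", unfiltered[:2])
print(filtered)
```

The unfiltered join returns 9 rows, including ('a', 'a', 3); the filtered one returns only the three expected pairs. Ordering by the item ids instead of `num_users desc` just makes the tie between (a, b) and (a, c) deterministic.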
Answer 1 (score: 0)
select a.product_id as item_id_1, b.product_id as item_id_2, COUNT(*) num_users
from orders a
join orders b
on a.user_id = b.user_id and a.product_id < b.product_id
group by a.product_id, b.product_id
order by num_users desc;
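`COUNT(*)` counts joined rows, not users, so this query matches the expected output only because no user in the sample ordered the same product twice. The sketch below uses SQLite as a stand-in and adds one hypothetical row (user 1 orders "a" a second time) to compare `count(*)` with `count(distinct a.user_id)`.

```python
import sqlite3

# Sample rows from the question, plus one HYPOTHETICAL repeat order.
ROWS = [
    (1, "a", "2017-05-20", 2, 3.95),
    (1, "b", "2017-06-02", 7, 19.95),
    (2, "a", "2017-06-23", 4, 5.99),
    (2, "b", "2017-04-03", 2, 19.95),
    (2, "c", "2017-06-08", 1, 9.99),
    (3, "a", "2017-07-02", 4, 4.98),
    (3, "c", "2017-06-05", 3, 18.95),
    (1, "a", "2017-07-10", 1, 3.95),  # hypothetical: user 1 buys "a" again
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table orders (user_id integer, product_id text, dt text, "
    "quantity integer, price real)"
)
conn.executemany("insert into orders values (?, ?, ?, ?, ?)", ROWS)


def run(count_expr):
    # count_expr is one of two fixed strings below, never user input.
    sql = f"""
        select a.product_id as item_id_1, b.product_id as item_id_2,
               {count_expr} as num_users
        from orders a
        join orders b
          on a.user_id = b.user_id and a.product_id < b.product_id
        group by a.product_id, b.product_id
        order by item_id_1, item_id_2
    """
    return conn.execute(sql).fetchall()


by_rows = run("count(*)")                    # counts joined rows
by_users = run("count(distinct a.user_id)")  # counts users
print(by_rows)   # [('a', 'b', 3), ('a', 'c', 2), ('b', 'c', 1)]
print(by_users)  # [('a', 'b', 2), ('a', 'c', 2), ('b', 'c', 1)]
```

With the repeat order, user 1 contributes two joined rows to (a, b), so `count(*)` reports 3 while only 2 users bought both items.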
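Answer 2 (score: 0)
Assuming a user can order the same product more than once, it is best to group by user and product first.
Then join the two grouped results on the same user_id and a different product_id. Here the first product_id must be the lower one, because we only want, for example, the combination 'a' & 'b' and not its reverse 'b' & 'a'.
After that, all that remains is to group the result and count.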
select
t1.product_id as item_id_1,
t2.product_id as item_id_2,
count(t1.user_id) as num_users
from
(
select user_id, product_id
from YourTable
group by user_id, product_id
) t1
join (
select user_id, product_id
from YourTable
group by user_id, product_id
) t2 on (t1.user_id = t2.user_id and t1.product_id < t2.product_id)
group by t1.product_id, t2.product_id
order by t1.product_id, t2.product_id
If your database supports the WITH clause, you can put that subquery in a common table expression and reuse it:
WITH CTE as (
select user_id, product_id
from YourTable
group by user_id, product_id
)
select
t1.product_id as item_id_1,
t2.product_id as item_id_2,
count(t1.user_id) as num_users
from CTE t1
join CTE t2 on (t1.user_id = t2.user_id and t1.product_id < t2.product_id)
group by t1.product_id, t2.product_id
order by t1.product_id, t2.product_id
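A sketch that checks the two forms of this answer (derived tables and CTE) against each other, using SQLite as a stand-in database (it supports WITH; column types are assumed). The data is the sample plus one hypothetical repeat order, so it also shows that grouping by user and product first keeps the counts per user.

```python
import sqlite3

# Sample rows from the question, plus one HYPOTHETICAL repeat order.
ROWS = [
    (1, "a", "2017-05-20", 2, 3.95),
    (1, "b", "2017-06-02", 7, 19.95),
    (2, "a", "2017-06-23", 4, 5.99),
    (2, "b", "2017-04-03", 2, 19.95),
    (2, "c", "2017-06-08", 1, 9.99),
    (3, "a", "2017-07-02", 4, 4.98),
    (3, "c", "2017-06-05", 3, 18.95),
    (1, "a", "2017-07-10", 1, 3.95),  # hypothetical: user 1 buys "a" again
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table YourTable (user_id integer, product_id text, dt text, "
    "quantity integer, price real)"
)
conn.executemany("insert into YourTable values (?, ?, ?, ?, ?)", ROWS)

# Derived-table form: deduplicate (user_id, product_id) in each subquery.
DERIVED = """
select t1.product_id as item_id_1, t2.product_id as item_id_2,
       count(t1.user_id) as num_users
from (select user_id, product_id from YourTable group by user_id, product_id) t1
join (select user_id, product_id from YourTable group by user_id, product_id) t2
  on (t1.user_id = t2.user_id and t1.product_id < t2.product_id)
group by t1.product_id, t2.product_id
order by t1.product_id, t2.product_id
"""

# CTE form: the same deduplicating subquery, written once and reused.
WITH_CTE = """
with CTE as (
    select user_id, product_id from YourTable group by user_id, product_id
)
select t1.product_id as item_id_1, t2.product_id as item_id_2,
       count(t1.user_id) as num_users
from CTE t1
join CTE t2 on (t1.user_id = t2.user_id and t1.product_id < t2.product_id)
group by t1.product_id, t2.product_id
order by t1.product_id, t2.product_id
"""

derived = conn.execute(DERIVED).fetchall()
cte = conn.execute(WITH_CTE).fetchall()
print(derived == cte)  # True
print(cte)             # [('a', 'b', 2), ('a', 'c', 2), ('b', 'c', 1)]
```

Because each (user_id, product_id) pair survives the grouping only once, the plain `count(t1.user_id)` already counts users, and the repeat order does not inflate the (a, b) count.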