Question

Here is a sample dataset:

    | user_id | product_id | dt         | quantity | price |
    | 1       | a          | 2017-05-20 | 2        | 3.95  |
    | 1       | b          | 2017-06-02 | 7        | 19.95 |
    | 2       | a          | 2017-06-23 | 4        | 5.99  |
    | 2       | b          | 2017-04-03 | 2        | 19.95 |
    | 2       | c          | 2017-06-08 | 1        | 9.99  |
    | 3       | a          | 2017-07-02 | 4        | 4.98  |
    | 3       | c          | 2017-06-05 | 3        | 18.95 |

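Provide a SQL query that returns pairs of items (i.e. pairs of item_ids, which are the product_id values in the data) and counts the number of users who have ordered both items at least once. For simplicity, we won't take into account how often an item was ordered or the quantity purchased, only whether a user bought a particular item. For the sample data above, the output should be: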

    | item_id_1 | item_id_2 | num_users |
    | a         | b         | 2         |
    | a         | c         | 2         |
    | b         | c         | 1         |
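
To pin down exactly what is being counted, here is a minimal reference computation in plain Python (not SQL, and not part of the question) that reproduces the expected output from the sample rows. Each user's orders are collapsed to a set of distinct products, and every unordered pair from that set adds one user to that pair's count. The function name `pair_counts` is hypothetical.

```python
from collections import Counter
from itertools import combinations

# Sample rows from the question: (user_id, product_id, dt, quantity, price)
rows = [
    (1, "a", "2017-05-20", 2, 3.95),
    (1, "b", "2017-06-02", 7, 19.95),
    (2, "a", "2017-06-23", 4, 5.99),
    (2, "b", "2017-04-03", 2, 19.95),
    (2, "c", "2017-06-08", 1, 9.99),
    (3, "a", "2017-07-02", 4, 4.98),
    (3, "c", "2017-06-05", 3, 18.95),
]


def pair_counts(rows):
    """Count, for every unordered product pair, the users who bought both."""
    # Collapse to the set of distinct products per user; quantity and
    # order frequency are ignored, as the question specifies.
    products_by_user = {}
    for user_id, product_id, *_ in rows:
        products_by_user.setdefault(user_id, set()).add(product_id)

    counts = Counter()
    for products in products_by_user.values():
        # sorted() + combinations() yields each pair once, smaller id first
        for pair in combinations(sorted(products), 2):
            counts[pair] += 1
    return dict(sorted(counts.items()))


for (item_1, item_2), n in pair_counts(rows).items():
    print(item_1, item_2, n)
```

Taking combinations of the sorted product set is the Python counterpart of the `<` join condition used in the SQL answers: it never pairs a product with itself and never emits both (a, b) and (b, a).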

Answer 1

You can do this with a self-join:

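select e.product_id as item_id_1, e2.product_id as item_id_2,
       count(distinct e.user_id) as num_users
from example e join
     example e2
     -- "<" keeps one ordering of each pair and excludes a product paired with itself
     on e.user_id = e2.user_id and e.product_id < e2.product_id
group by e.product_id, e2.product_id
order by num_users desc;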
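
A quick way to check the self-join is to run it against the sample data. The sketch below uses Python's built-in sqlite3 module with an in-memory database as an assumed test harness (the answer does not name a database); the table name `example` comes from the answer. The query includes the condition `e.product_id < e2.product_id` so that each unordered pair appears once, and sorts ties by item ids so the output order is deterministic.

```python
import sqlite3

# In-memory SQLite database used only as a test harness (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example "
    "(user_id INTEGER, product_id TEXT, dt TEXT, quantity INTEGER, price REAL)"
)
conn.executemany(
    "INSERT INTO example VALUES (?, ?, ?, ?, ?)",
    [
        (1, "a", "2017-05-20", 2, 3.95),
        (1, "b", "2017-06-02", 7, 19.95),
        (2, "a", "2017-06-23", 4, 5.99),
        (2, "b", "2017-04-03", 2, 19.95),
        (2, "c", "2017-06-08", 1, 9.99),
        (3, "a", "2017-07-02", 4, 4.98),
        (3, "c", "2017-06-05", 3, 18.95),
    ],
)

# Self-join on user_id. The "<" condition keeps a single ordering of each
# pair and drops rows where a product is paired with itself.
query = """
select e.product_id as item_id_1, e2.product_id as item_id_2,
       count(distinct e.user_id) as num_users
from example e
join example e2
  on e.user_id = e2.user_id and e.product_id < e2.product_id
group by e.product_id, e2.product_id
order by num_users desc, item_id_1, item_id_2
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)  # ('a', 'b', 2), then ('a', 'c', 2), then ('b', 'c', 1)
```

Without the `<` condition, the same join also returns rows such as ('a', 'a', 3) and ('b', 'a', 2), which the question does not want.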

Answer 2

select a.product_id as item_id_1, b.product_id as item_id_2,
       -- distinct: a user who ordered the same product twice is still counted once
       count(distinct a.user_id) as num_users
from orders a
join orders b
  on a.user_id = b.user_id and a.product_id < b.product_id
group by a.product_id, b.product_id
order by num_users desc;
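
Whether the aggregate is count(*) or count(distinct a.user_id) only matters once a user orders the same product more than once, which never happens in the sample data. The sketch below, again using Python's built-in sqlite3 module as an assumed test harness, adds one hypothetical row (user 1 ordering product a a second time) and runs the same pair query with both count expressions. The helper `count_pairs` is illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders "
    "(user_id INTEGER, product_id TEXT, dt TEXT, quantity INTEGER, price REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "a", "2017-05-20", 2, 3.95),
        (1, "b", "2017-06-02", 7, 19.95),
        (2, "a", "2017-06-23", 4, 5.99),
        (2, "b", "2017-04-03", 2, 19.95),
        (2, "c", "2017-06-08", 1, 9.99),
        (3, "a", "2017-07-02", 4, 4.98),
        (3, "c", "2017-06-05", 3, 18.95),
        # Hypothetical repeat order, not in the question's data.
        (1, "a", "2017-07-01", 1, 3.95),
    ],
)


def count_pairs(count_expr):
    """Run the pair query with the given aggregate (fixed, trusted strings only)."""
    sql = f"""
        select a.product_id as item_id_1, b.product_id as item_id_2,
               {count_expr} as num_users
        from orders a
        join orders b
          on a.user_id = b.user_id and a.product_id < b.product_id
        group by a.product_id, b.product_id
        order by item_id_1, item_id_2
    """
    return conn.execute(sql).fetchall()


naive = count_pairs("count(*)")
distinct = count_pairs("count(distinct a.user_id)")
print("count(*):       ", naive)
print("count(distinct):", distinct)
```

Both of user 1's orders of a join to the single b order, so count(*) reports 3 for the pair a/b, while count(distinct a.user_id) reports the correct 2.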

Answer 3

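Assuming a user can order the same product multiple times, it is best to first group by user and product.

The two grouped results are then joined on the same user_id and a different product_id. Here the condition requires the first product_id to be the lower one, because we only want, for example, the combination 'a' & 'b' and not its reverse 'b' & 'a'.

After that, the result just needs to be grouped and counted.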

select 
 t1.product_id as item_id_1, 
 t2.product_id as item_id_2, 
 count(t1.user_id) as num_users
from 
(
    select user_id, product_id 
    from YourTable
    group by user_id, product_id
) t1
join (
    select user_id, product_id 
    from YourTable 
    group by user_id, product_id
) t2 on (t1.user_id = t2.user_id and t1.product_id < t2.product_id)
group by t1.product_id, t2.product_id
order by t1.product_id, t2.product_id

If your database supports the WITH clause, you can put the same subquery in a common table expression (CTE) and reuse it.

WITH CTE as (
    select user_id, product_id 
    from YourTable
    group by user_id, product_id
)
select 
 t1.product_id as item_id_1, 
 t2.product_id as item_id_2, 
 count(t1.user_id) as num_users
from CTE t1
join CTE t2 on (t1.user_id = t2.user_id and t1.product_id < t2.product_id)
group by t1.product_id, t2.product_id
order by t1.product_id, t2.product_id
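
To check the point about repeated orders, the sketch below loads the sample data into an in-memory SQLite database (via Python's built-in sqlite3 module, an assumed test harness; SQLite has supported WITH since version 3.8.3) together with one hypothetical extra row in which user 1 orders product a a second time, then runs the CTE query unchanged. Because the CTE first collapses rows to distinct (user_id, product_id) pairs, a plain count(t1.user_id) still gives 2 for a/b.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable "
    "(user_id INTEGER, product_id TEXT, dt TEXT, quantity INTEGER, price REAL)"
)
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?, ?)",
    [
        (1, "a", "2017-05-20", 2, 3.95),
        (1, "b", "2017-06-02", 7, 19.95),
        (2, "a", "2017-06-23", 4, 5.99),
        (2, "b", "2017-04-03", 2, 19.95),
        (2, "c", "2017-06-08", 1, 9.99),
        (3, "a", "2017-07-02", 4, 4.98),
        (3, "c", "2017-06-05", 3, 18.95),
        # Hypothetical repeat order, not in the question's data.
        (1, "a", "2017-07-01", 1, 3.95),
    ],
)

# The answer's CTE query, unchanged apart from whitespace.
query = """
WITH CTE as (
    select user_id, product_id
    from YourTable
    group by user_id, product_id
)
select
 t1.product_id as item_id_1,
 t2.product_id as item_id_2,
 count(t1.user_id) as num_users
from CTE t1
join CTE t2 on (t1.user_id = t2.user_id and t1.product_id < t2.product_id)
group by t1.product_id, t2.product_id
order by t1.product_id, t2.product_id
"""
result = conn.execute(query).fetchall()
print(result)  # [('a', 'b', 2), ('a', 'c', 2), ('b', 'c', 1)]
```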
