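SQL to list the other products purchased and count buyers by the product originally purchased

Time: 2018-09-01 23:30:09

Tags: sql netezza market-basket-analysis

After years of reading answers, I finally get to ask a question of my own.

I have a list of purchased products and unique customer IDs: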

+---------+--------+
| Product | Buyer  |
+---------+--------+
| Apples  | Rod    |
| Apples  | Jane   |
| Apples  | Freddy |
| Bananas | Rod    |
| Bananas | Jane   |
| Bananas | Freddy |
| Bananas | Zippy  |
| Pears   | Rod    |
| Pears   | Zippy  |
+---------+--------+

I would like to produce the following output in Netezza SQL:

+-----------+-------------+------------------------+---------------------+
| Product A | Buyers of A | A Buyers Also Bought B | No of A Buyers of B |
+-----------+-------------+------------------------+---------------------+
| Apples    |           3 | Bananas                |                   3 |
| Apples    |           3 | Pears                  |                   1 |
| Bananas   |           4 | Apples                 |                   3 |
| Bananas   |           4 | Pears                  |                   2 |
| Pears     |           2 | Apples                 |                   1 |
| Pears     |           2 | Bananas                |                   2 |
+-----------+-------------+------------------------+---------------------+

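..so that I can see the total number of buyers for each product. Crucially, I also want to see, for the buyers of each product, how many of them bought each of the other products in the same list. Edit: it is important to reiterate that no buyer should be counted in column B unless they also bought Product A.

What is the most efficient way to do this, please?

(I will then calculate B buyers as a percentage of A buyers, but that part is easy.)

Thanks!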

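2 Answers:

Answer 0: (score: 0)

You can build a summary of buyer counts per product and then cross join it with itself, excluding rows where both sides are the same product.

Like this: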

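SELECT
    A.Product,
    A.Buyers,
    B.Product,
    B.Buyers  -- total buyers of B, not only those who also bought A
FROM (
    -- one row per product with its buyer count
    SELECT
        Product,
        count(*) AS Buyers
    FROM
        ProductBuyers
    GROUP BY
        Product
) AS A
CROSS JOIN (
    SELECT
        Product,
        count(*) AS Buyers
    FROM
        ProductBuyers
    GROUP BY
        Product
) AS B
WHERE
    A.Product != B.Product  -- skip pairing a product with itself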
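
As a quick check of what the cross-join approach returns, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database through Python's `sqlite3` module and runs the query with its commas and `GROUP BY Product` filled in. SQLite is only a stand-in for Netezza, and the names `ROWS`, `CROSS_JOIN_SQL`, and `cross_rows` are made up for this illustration.

```python
import sqlite3

# Sample rows from the question; SQLite is an assumed stand-in for Netezza.
ROWS = [
    ("Apples", "Rod"), ("Apples", "Jane"), ("Apples", "Freddy"),
    ("Bananas", "Rod"), ("Bananas", "Jane"), ("Bananas", "Freddy"),
    ("Bananas", "Zippy"), ("Pears", "Rod"), ("Pears", "Zippy"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ProductBuyers (Product TEXT, Buyer TEXT)")
conn.executemany("INSERT INTO ProductBuyers VALUES (?, ?)", ROWS)

# Per-product buyer counts, cross joined with themselves.
CROSS_JOIN_SQL = """
SELECT A.Product, A.Buyers, B.Product, B.Buyers
FROM (SELECT Product, count(*) AS Buyers
      FROM ProductBuyers GROUP BY Product) AS A
CROSS JOIN
     (SELECT Product, count(*) AS Buyers
      FROM ProductBuyers GROUP BY Product) AS B
WHERE A.Product != B.Product
ORDER BY A.Product, B.Product
"""

cross_rows = conn.execute(CROSS_JOIN_SQL).fetchall()
for row in cross_rows:
    print(row)
```

On this data the last column is each product's overall buyer count (for example Pears, 2, Apples, 3), whereas the question's table expects 1 for that pair, so this approach does not appear to restrict column B to buyers of A.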

Answer 1: (score: 0)

The basic data on co-purchases comes from a self-join and a group by:

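-- pairs every purchase with every purchase by the same buyer;
-- this also returns rows where p1.product = p2.product
select p1.product, p2.product, count(*) as in_common
from purchases p1 join
     purchases p2
     on p1.buyer = p2.buyer
group by p1.product, p2.product;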
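
To see what the self-join produces on the question's data, here is a small sketch using Python's `sqlite3` module with an in-memory SQLite database as an assumed stand-in for Netezza. The names `ROWS`, `PAIRS_SQL`, and `pair_counts` are hypothetical, and the sketch assumes one row per product and buyer, since duplicate rows would inflate the counts.

```python
import sqlite3

# Sample rows from the question; SQLite is an assumed stand-in for Netezza.
ROWS = [
    ("Apples", "Rod"), ("Apples", "Jane"), ("Apples", "Freddy"),
    ("Bananas", "Rod"), ("Bananas", "Jane"), ("Bananas", "Freddy"),
    ("Bananas", "Zippy"), ("Pears", "Rod"), ("Pears", "Zippy"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (product TEXT, buyer TEXT)")
conn.executemany("INSERT INTO purchases VALUES (?, ?)", ROWS)

# Self-join on buyer: each row pairs two purchases made by the same person.
PAIRS_SQL = """
select p1.product, p2.product, count(*) as in_common
from purchases p1 join
     purchases p2
     on p1.buyer = p2.buyer
group by p1.product, p2.product
order by p1.product, p2.product
"""

# Map (product A, product B) -> number of buyers the two have in common.
pair_counts = {(a, b): n for a, b, n in conn.execute(PAIRS_SQL)}
for pair, n in pair_counts.items():
    print(pair, n)
```

The result contains nine pairs, including each product paired with itself, where the count is simply that product's total number of buyers.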

To get the buyer count for one product (or the other), add a join:

select p1.product, p2.product, pp.cnt, count(*) as in_common
from purchases p1 join
     purchases p2
     on p1.buyer = p2.buyer join
     (select product, count(*) as cnt  -- total buyers per product
      from purchases
      group by product
     ) pp
     on pp.product = p1.product
where p1.product <> p2.product  -- drop pairs of a product with itself
group by p1.product, p2.product, pp.cnt;
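
The following sketch checks a variant of this join against the table in the question. It uses Python's `sqlite3` module with in-memory SQLite as an assumed stand-in for Netezza. Two details are my own adjustments rather than anything confirmed by the answer: the per-product subquery groups by its own `product` column, and a `where` clause drops pairs of a product with itself. The names `ROWS`, `FULL_SQL`, `EXPECTED`, and `result` are hypothetical.

```python
import sqlite3

# Sample rows from the question; SQLite is an assumed stand-in for Netezza.
ROWS = [
    ("Apples", "Rod"), ("Apples", "Jane"), ("Apples", "Freddy"),
    ("Bananas", "Rod"), ("Bananas", "Jane"), ("Bananas", "Freddy"),
    ("Bananas", "Zippy"), ("Pears", "Rod"), ("Pears", "Zippy"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (product TEXT, buyer TEXT)")
conn.executemany("INSERT INTO purchases VALUES (?, ?)", ROWS)

# Self-join for co-purchases, plus a per-product total joined on product A.
FULL_SQL = """
select p1.product, pp.cnt, p2.product, count(*) as in_common
from purchases p1
join purchases p2 on p1.buyer = p2.buyer
join (select product, count(*) as cnt
      from purchases
      group by product) pp
  on pp.product = p1.product
where p1.product <> p2.product
group by p1.product, p2.product, pp.cnt
order by p1.product, p2.product
"""

# The output table from the question, column for column.
EXPECTED = [
    ("Apples", 3, "Bananas", 3),
    ("Apples", 3, "Pears", 1),
    ("Bananas", 4, "Apples", 3),
    ("Bananas", 4, "Pears", 2),
    ("Pears", 2, "Apples", 1),
    ("Pears", 2, "Bananas", 2),
]

result = conn.execute(FULL_SQL).fetchall()
print(result == EXPECTED)
```

Because the self-join only pairs purchases made by the same buyer, nobody is counted under product B unless they also bought product A, which is the condition the question's edit asks for.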

Alternatively, you can use a window function:

select p1.product, p1.cnt, p2.product, count(*) as in_common
from (select p1.*,
             count(*) over (partition by p1.product) as cnt  -- total buyers per product
      from purchases p1
     ) p1 join
     purchases p2
     on p1.buyer = p2.buyer
where p1.product <> p2.product  -- drop pairs of a product with itself
group by p1.product, p2.product, p1.cnt;
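
As a sketch of the window-function route, the snippet below runs a similar query in in-memory SQLite through Python's `sqlite3` module and adds the percentage the question mentions as a follow-up. Several things here are assumptions: SQLite stands in for Netezza and needs version 3.25 or later for window functions, the `where` filter on self-pairs is my addition, and `pct_of_a` is read as the share of A's buyers who also bought B. The names `ROWS`, `WINDOW_SQL`, and `window_rows` are hypothetical.

```python
import sqlite3

# Sample rows from the question; SQLite is an assumed stand-in for Netezza.
ROWS = [
    ("Apples", "Rod"), ("Apples", "Jane"), ("Apples", "Freddy"),
    ("Bananas", "Rod"), ("Bananas", "Jane"), ("Bananas", "Freddy"),
    ("Bananas", "Zippy"), ("Pears", "Rod"), ("Pears", "Zippy"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (product TEXT, buyer TEXT)")
conn.executemany("INSERT INTO purchases VALUES (?, ?)", ROWS)

# count(*) over (partition by product) attaches each product's total
# buyer count to every purchase row before the self-join.
WINDOW_SQL = """
select p1.product, p1.cnt, p2.product, count(*) as in_common,
       round(100.0 * count(*) / p1.cnt, 1) as pct_of_a
from (select p.*,
             count(*) over (partition by p.product) as cnt
      from purchases p) p1
join purchases p2 on p1.buyer = p2.buyer
where p1.product <> p2.product
group by p1.product, p2.product, p1.cnt
order by p1.product, p2.product
"""

window_rows = conn.execute(WINDOW_SQL).fetchall()
for row in window_rows:
    print(row)
```

Multiplying by `100.0` rather than `100` keeps the division in floating point, which avoids integer truncation in engines that divide integers as integers.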

Here is a demo showing that it works.
