After years of reading answers, I finally get to ask a question of my own.
I have a list of purchased products and unique customer IDs:
+---------+--------+
| Product | Buyer |
+---------+--------+
| Apples | Rod |
| Apples | Jane |
| Apples | Freddy |
| Bananas | Rod |
| Bananas | Jane |
| Bananas | Freddy |
| Bananas | Zippy |
| Pears | Rod |
| Pears | Zippy |
+---------+--------+
I want to produce the following output in Netezza SQL:
+-----------+-------------+------------------------+---------------------+
| Product A | Buyers of A | A Buyers Also Bought B | No of A Buyers of B |
+-----------+-------------+------------------------+---------------------+
| Apples | 3 | Bananas | 3 |
| Apples | 3 | Pears | 1 |
| Bananas | 4 | Apples | 3 |
| Bananas | 4 | Pears | 2 |
| Pears | 2 | Apples | 1 |
| Pears | 2 | Bananas | 2 |
+-----------+-------------+------------------------+---------------------+
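...so that I can see the total number of buyers of each product. Crucially, I also want to see how many of each product's buyers also bought each of the other products on the same list. Edit: to reiterate, a buyer must not be counted in the B column unless they also bought product A.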
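To pin down the metric before writing any SQL, here is a minimal Python sketch (a plain in-memory check, not Netezza code) that computes the desired output from the sample rows. It assumes that "No of A Buyers of B" means the number of distinct buyers who bought both A and B, i.e. the size of the intersection of the two buyer sets, which is what the edit asks for. The function name `co_purchase_table` is hypothetical.

```python
from collections import defaultdict
from itertools import permutations

# Sample rows from the question: (product, buyer)
ROWS = [
    ("Apples", "Rod"), ("Apples", "Jane"), ("Apples", "Freddy"),
    ("Bananas", "Rod"), ("Bananas", "Jane"), ("Bananas", "Freddy"),
    ("Bananas", "Zippy"),
    ("Pears", "Rod"), ("Pears", "Zippy"),
]


def co_purchase_table(rows):
    """Return (product_a, buyers_of_a, product_b, a_buyers_of_b) tuples."""
    buyers = defaultdict(set)
    for product, buyer in rows:
        buyers[product].add(buyer)
    table = []
    # Every ordered pair of distinct products, in alphabetical order
    for a, b in permutations(sorted(buyers), 2):
        # Count only the buyers of A who also bought B
        table.append((a, len(buyers[a]), b, len(buyers[a] & buyers[b])))
    return table


for row in co_purchase_table(ROWS):
    print(row)
```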
What is the most efficient way to do this?
(I'll then work out what percentage of A's buyers also bought B, but that part is easy.)
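For the follow-up percentage, a small sketch assuming the intended figure is the share of A's buyers who also bought B (column 4 divided by column 2). The rows are copied from the desired output above, and the helper name `pct_of_a_buyers` is hypothetical.

```python
# Rows copied from the desired output:
# (product_a, buyers_of_a, product_b, a_buyers_of_b)
DESIRED = [
    ("Apples", 3, "Bananas", 3),
    ("Apples", 3, "Pears", 1),
    ("Bananas", 4, "Apples", 3),
    ("Bananas", 4, "Pears", 2),
    ("Pears", 2, "Apples", 1),
    ("Pears", 2, "Bananas", 2),
]


def pct_of_a_buyers(rows):
    """Map (A, B) to the percentage of A's buyers who also bought B."""
    # 100.0 forces float division; rounded to one decimal place
    return {(a, b): round(100.0 * n_ab / n_a, 1) for a, n_a, b, n_ab in rows}


print(pct_of_a_buyers(DESIRED)[("Bananas", "Pears")])  # 50.0
```

In SQL the same idea applies: multiply by 100.0 rather than 100 so the division is not done on integers.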
Thanks!
Answer 0 (score: 0)
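You can create a summary of the buyer counts per product and then cross join it with itself, excluding pairs where both products are the same.
Like this: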
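SELECT
    A.Product AS ProductA,
    A.Buyers AS BuyersOfA,
    B.Product AS ProductB,
    B.Buyers AS BuyersOfB  -- total buyers of B, not only those who also bought A
FROM (
    -- one row per product with its number of buyers
    SELECT
        Product,
        count(*) AS Buyers
    FROM
        ProductBuyers
    GROUP BY
        Product
) AS A
CROSS JOIN (
    SELECT
        Product,
        count(*) AS Buyers
    FROM
        ProductBuyers
    GROUP BY
        Product
) AS B
WHERE
    A.Product != B.Product  -- skip pairing a product with itself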
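A minimal sketch for checking this against the question's sample data, using Python's built-in sqlite3 module as a stand-in for Netezza (assuming SQLite accepts the same syntax for this query; the ORDER BY is added only to make the output deterministic). It shows that the fourth column is the total number of buyers of B, e.g. 2 for Apples/Pears, rather than the 1 Apples buyer who also bought Pears, so this approach on its own does not satisfy the requirement in the question's edit.

```python
import sqlite3

# Sample rows from the question: (product, buyer)
ROWS = [
    ("Apples", "Rod"), ("Apples", "Jane"), ("Apples", "Freddy"),
    ("Bananas", "Rod"), ("Bananas", "Jane"), ("Bananas", "Freddy"),
    ("Bananas", "Zippy"),
    ("Pears", "Rod"), ("Pears", "Zippy"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ProductBuyers (Product TEXT, Buyer TEXT)")
conn.executemany("INSERT INTO ProductBuyers VALUES (?, ?)", ROWS)

# Cross join of the per-product count summary with itself
QUERY = """
SELECT A.Product, A.Buyers, B.Product, B.Buyers
FROM (SELECT Product, COUNT(*) AS Buyers
      FROM ProductBuyers GROUP BY Product) AS A
CROSS JOIN (SELECT Product, COUNT(*) AS Buyers
            FROM ProductBuyers GROUP BY Product) AS B
WHERE A.Product != B.Product
ORDER BY A.Product, B.Product
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```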
Answer 1 (score: 0)
The basic data about co-purchases comes from a self-join and a group by:
select p1.product, p2.product, count(*) as in_common
from purchases p1 join
     purchases p2
     on p1.buyer = p2.buyer
where p1.product <> p2.product  -- skip pairing a product with itself
group by p1.product, p2.product;
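To see what the self-join produces on the sample data, here is a sketch using Python's sqlite3 module as a stand-in for Netezza (the table name `purchases` comes from the answer; the ORDER BY is only for deterministic output). It filters out pairs of a product with itself, since those rows are not part of the desired output.

```python
import sqlite3

# Sample rows from the question: (product, buyer)
ROWS = [
    ("Apples", "Rod"), ("Apples", "Jane"), ("Apples", "Freddy"),
    ("Bananas", "Rod"), ("Bananas", "Jane"), ("Bananas", "Freddy"),
    ("Bananas", "Zippy"),
    ("Pears", "Rod"), ("Pears", "Zippy"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (product TEXT, buyer TEXT)")
conn.executemany("INSERT INTO purchases VALUES (?, ?)", ROWS)

# Self-join on buyer: each row is one buyer who bought both products
QUERY = """
SELECT p1.product, p2.product, COUNT(*) AS in_common
FROM purchases p1
JOIN purchases p2 ON p1.buyer = p2.buyer
WHERE p1.product <> p2.product  -- drop pairs of a product with itself
GROUP BY p1.product, p2.product
ORDER BY p1.product, p2.product
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```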
To also get the buyer count for one of the products (or the other), add a join:
select p1.product, p2.product, pp.cnt, count(*) as in_common
from purchases p1 join
     purchases p2
     on p1.buyer = p2.buyer join
     -- total number of buyers of each product
     (select product, count(*) as cnt
      from purchases
      group by product
     ) pp
     on pp.product = p1.product
where p1.product <> p2.product  -- skip pairing a product with itself
group by p1.product, p2.product, pp.cnt;
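The sketch below runs the join version on the sample data in SQLite (a stand-in for Netezza, assumed to accept the same syntax here), with the columns reordered to match the desired output and an ORDER BY added for deterministic results. Every row matches the table in the question.

```python
import sqlite3

# Sample rows from the question: (product, buyer)
ROWS = [
    ("Apples", "Rod"), ("Apples", "Jane"), ("Apples", "Freddy"),
    ("Bananas", "Rod"), ("Bananas", "Jane"), ("Bananas", "Freddy"),
    ("Bananas", "Zippy"),
    ("Pears", "Rod"), ("Pears", "Zippy"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (product TEXT, buyer TEXT)")
conn.executemany("INSERT INTO purchases VALUES (?, ?)", ROWS)

# Self-join for co-purchases, plus a per-product count summary for column 2
QUERY = """
SELECT p1.product, pp.cnt, p2.product, COUNT(*) AS in_common
FROM purchases p1
JOIN purchases p2 ON p1.buyer = p2.buyer
JOIN (SELECT product, COUNT(*) AS cnt
      FROM purchases GROUP BY product) pp
  ON pp.product = p1.product
WHERE p1.product <> p2.product
GROUP BY p1.product, p2.product, pp.cnt
ORDER BY p1.product, p2.product
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```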
Alternatively, you can use a window function:
select p1.product, p1.cnt, p2.product, count(*) as in_common
from (select p1.*,
             -- number of buyers of this row's product
             count(*) over (partition by p1.product) as cnt
      from purchases p1
     ) p1 join
     purchases p2
     on p1.buyer = p2.buyer
where p1.product <> p2.product  -- skip pairing a product with itself
group by p1.product, p2.product, p1.cnt;
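The same check for the window-function version, again in SQLite as a stand-in for Netezza. This assumes a SQLite build of 3.25 or later, the first release with window function support; the ORDER BY is only for deterministic output.

```python
import sqlite3

# Sample rows from the question: (product, buyer)
ROWS = [
    ("Apples", "Rod"), ("Apples", "Jane"), ("Apples", "Freddy"),
    ("Bananas", "Rod"), ("Bananas", "Jane"), ("Bananas", "Freddy"),
    ("Bananas", "Zippy"),
    ("Pears", "Rod"), ("Pears", "Zippy"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (product TEXT, buyer TEXT)")
conn.executemany("INSERT INTO purchases VALUES (?, ?)", ROWS)

# The window function attaches each product's buyer count to every row
QUERY = """
SELECT p1.product, p1.cnt, p2.product, COUNT(*) AS in_common
FROM (SELECT p.*, COUNT(*) OVER (PARTITION BY p.product) AS cnt
      FROM purchases p) AS p1
JOIN purchases p2 ON p1.buyer = p2.buyer
WHERE p1.product <> p2.product
GROUP BY p1.product, p2.product, p1.cnt
ORDER BY p1.product, p2.product
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

As written, this version references `purchases` twice instead of three times as in the join version; whether that is actually faster on Netezza depends on the planner, so comparing the two with EXPLAIN is worthwhile.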
Here is a demo showing that it works.