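I'm very new to SQL (currently using PostgreSQL, but interested in knowledge of any SQL dialect), and I'm trying to figure out something that I think should be relatively simple.
I have a table with one row per customer transaction, and for each transaction I know what the customer bought. I'm interested in finding each customer's preferred product, followed by their second choice (and ultimately, in general, what the preferred second choice is when the first choice isn't available).
Here is a mock-up of what the data looks like: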
A successful result would look something like this:
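(As mentioned, the end result (most likely done in Python/R rather than in SQL) would be a general basis along the lines of "if preference #1 is DVD, then preference #2 is Blu-ray", "if preference #1 is Blu-ray, then preference #2 is a sandwich", and so on.)
Cheers
Answer 0 (score: 1)
This is a combination of a greatest-n-per-group problem and a pivot problem (sometimes also called crosstab).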
The first step is to identify each customer's two preferred products.
In your case you need to combine a group by query with window functions.
The following query counts how often each customer bought each product:

    select customer_id,
           product_bought,
           count(*) as num_products
    from sales
    group by customer_id, product_bought
    order by customer_id;

This can be extended to include a rank based on the number of times each product was bought:

    select customer_id,
           product_bought,
           count(*) as num_products,
           dense_rank() over (partition by customer_id order by count(*) desc) as rnk
    from sales
    group by customer_id, product_bought
    order by customer_id;

This returns the following result (based on your sample data):

    customer_id | product_bought | num_products | rnk
    ------------+----------------+--------------+----
              1 | DVD            |            3 |   1
              1 | Blu-ray        |            1 |   2
              2 | DVD            |            2 |   1

We can't apply a where condition directly on the rnk column, so we need a derived table:

    select customer_id, product_bought
    from (
      select customer_id,
             product_bought,
             count(*) as num_products,
             dense_rank() over (partition by customer_id order by count(*) desc) as rnk
      from sales
      group by customer_id, product_bought
    ) t
    where rnk <= 2
    order by customer_id;

Now we need to turn the (up to) two rows per customer into columns. This can be done, for example, using a common table expression:

    with preferred_products as (
      select *
      from (
        select customer_id,
               product_bought,
               count(*) as num_products,
               dense_rank() over (partition by customer_id order by count(*) desc) as rnk
        from sales
        group by customer_id, product_bought
      ) t
      where rnk <= 2
    )
    select p1.customer_id,
           p1.product_bought as "Product #1",
           p2.product_bought as "Product #2"
    from preferred_products p1
      left join preferred_products p2 on p1.customer_id = p2.customer_id and p2.rnk = 2
    where p1.rnk = 1;

This returns one row per customer (as long as there are no ties in the purchase counts), with the most frequently bought product as "Product #1" and the second most frequent as "Product #2", which is null for a customer who has only ever bought one product.
The above is standard SQL and works in any modern DBMS that supports window functions and common table expressions.
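Since the question mentions doing the last step in Python, here is a minimal sketch of how the pieces might fit together. It is an illustration under assumptions, not the asker's setup: it uses the standard-library sqlite3 module as a stand-in for PostgreSQL (this needs SQLite 3.25 or newer for window functions), the rows in SALES are hypothetical sample data, and the helper names top_two_per_customer and usual_second_choice are made up for this example.

```python
import sqlite3
from collections import Counter, defaultdict

# Hypothetical sample data (customer_id, product_bought); NOT the asker's data.
SALES = [
    (1, "DVD"), (1, "DVD"), (1, "DVD"), (1, "Blu-ray"),
    (2, "DVD"), (2, "DVD"),
    (3, "Blu-ray"), (3, "Blu-ray"), (3, "Sandwich"),
    (4, "DVD"), (4, "DVD"), (4, "Blu-ray"),
]

# The CTE query from the answer, with an ORDER BY added for stable output.
QUERY = """
with preferred_products as (
  select *
  from (
    select customer_id,
           product_bought,
           count(*) as num_products,
           dense_rank() over (partition by customer_id order by count(*) desc) as rnk
    from sales
    group by customer_id, product_bought
  ) t
  where rnk <= 2
)
select p1.customer_id,
       p1.product_bought as "Product #1",
       p2.product_bought as "Product #2"
from preferred_products p1
  left join preferred_products p2 on p1.customer_id = p2.customer_id and p2.rnk = 2
where p1.rnk = 1
order by p1.customer_id
"""


def top_two_per_customer(sales):
    """Load the rows into an in-memory SQLite table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table sales (customer_id integer, product_bought text)")
        conn.executemany("insert into sales values (?, ?)", sales)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


def usual_second_choice(rows):
    """For each first choice, return the most common second choice."""
    counts = defaultdict(Counter)
    for _customer_id, first, second in rows:
        if second is not None:  # skip customers without a second product
            counts[first][second] += 1
    return {first: c.most_common(1)[0][0] for first, c in counts.items()}


rows = top_two_per_customer(SALES)
fallbacks = usual_second_choice(rows)
print(rows)
print(fallbacks)
```

With this made-up data, customer 2 gets None as the second product, and the tally maps DVD to Blu-ray and Blu-ray to Sandwich. One thing to watch: dense_rank() gives tied products the same rank, so a customer with two equally frequent products produces more than one row; replacing it with row_number() would force exactly one row per rank, at the cost of breaking ties arbitrarily.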