PostgreSQL: each customer's preferred product and second-choice product

Date: 2017-05-16 05:09:29

Tags: sql postgresql greatest-n-per-group

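I'm very new to SQL (currently using PostgreSQL, but interested in SQL knowledge in general), and I'm trying to work out something that I think should be relatively simple.

I have a table with one row per customer transaction, covering every transaction where I know what the customer bought. I want to find each customer's preferred product and then their second choice. Ultimately, I want to know what the preferred second choice generally is when the first choice is unavailable.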

Below is a mock-up of what the data looks like:


A successful result would look like this:


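(As mentioned, the end goal is to see general rules such as "if preference #1 is DVD, then preference #2 is Blu-ray", "if preference #1 is Blu-ray, then preference #2 is sandwich", and so on. This step will most likely be done in Python/R rather than in SQL.)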
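The final step described above can also be done directly in PostgreSQL rather than in Python/R. This is a minimal sketch with several assumptions:

- The hypothetical table `customer_prefs` holds one row per customer with their first and second choice, i.e. the kind of result the answer's query produces.
- The table name, the view name `second_choice_given_first` and all sample values are illustrative only.
- `DISTINCT ON` is PostgreSQL-specific.

```sql
-- Hypothetical input: one row per customer with their first and second choice
-- (NULL means the customer has no second choice).
create temp table customer_prefs (
  customer_id   integer primary key,
  first_choice  text not null,
  second_choice text
);

insert into customer_prefs (customer_id, first_choice, second_choice) values
  (1, 'DVD',      'Blu-ray'),
  (2, 'DVD',      'Blu-ray'),
  (3, 'DVD',      'Sandwich'),
  (4, 'Blu-ray',  'Sandwich'),
  (5, 'Blu-ray',  'Sandwich'),
  (6, 'Blu-ray',  'DVD'),
  (7, 'Sandwich', null);

-- For every first choice, count how many customers picked each second choice
-- and keep only the most common one. DISTINCT ON keeps the first row per
-- first_choice in ORDER BY order; ties are broken alphabetically (arbitrary).
create temp view second_choice_given_first as
select distinct on (first_choice)
       first_choice,
       second_choice,
       count(*) as num_customers
from customer_prefs
where second_choice is not null
group by first_choice, second_choice
order by first_choice, count(*) desc, second_choice;

select * from second_choice_given_first order by first_choice;
-- expected rows: ('Blu-ray', 'Sandwich', 2), ('DVD', 'Blu-ray', 2)
```

Customers without a second choice (customer 7 here) are excluded before counting. Otherwise NULL would be reported as a "second choice".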

Cheers

1 answer:

Answer 0 (score: 1)

This is a combination of … problems (sometimes also called …

The first step is to identify the two preferred products.

In your case, you need to combine a group by query with a window function.

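The following query counts how often each customer bought each product:

select customer_id, 
       product_bought,
       count(*) as num_products
from sales
group by customer_id, product_bought
order by customer_id;

This can be extended to include a rank based on how often each product was bought:

select customer_id, 
       product_bought,
       count(*) as num_products,
       dense_rank() over (partition by customer_id order by count(*) desc) as rnk
from sales
group by customer_id, product_bought
order by customer_id;

This returns the following result (based on your sample data):

 customer_id | product_bought | num_products | rnk
-------------+----------------+--------------+-----
           1 | DVD            |            3 |   1
           1 | Blu-ray        |            1 |   2
           2 | DVD            |            2 |   1

A where condition cannot be applied directly to the rnk column, because WHERE is evaluated before window functions are computed. We therefore need a derived table:

select customer_id, product_bought
from (
  select customer_id, 
         product_bought,
         count(*) as num_products,
         dense_rank() over (partition by customer_id order by count(*) desc) as rnk
  from sales
  group by customer_id, product_bought
) t
where rnk <= 2
order by customer_id;

Now we need to turn the (up to) two rows per customer into columns. This can be done, for example, with a common table expression that is joined to itself:

with preferred_products as (
  select *
  from (
    select customer_id, 
           product_bought,
           count(*) as num_products,
           dense_rank() over (partition by customer_id order by count(*) desc) as rnk
    from sales
    group by customer_id, product_bought
  ) t
  where rnk <= 2
)
select p1.customer_id, 
       p1.product_bought as "Product #1", 
       p2.product_bought as "Product #2"
from preferred_products p1 
  left join preferred_products p2 on p1.customer_id = p2.customer_id and p2.rnk = 2
where p1.rnk = 1
order by p1.customer_id;

This returns one row per customer. "Product #1" holds the most frequently bought product and "Product #2" the second most frequently bought one. Because of the left join, "Product #2" is NULL when there is no second-ranked product, for example when the customer bought only one kind of product.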
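`dense_rank()` gives tied products the same rank. A customer who bought two products equally often (and more often than anything else) therefore gets two result rows, one per tied product.

If exactly one row per customer is wanted, a minimal variant swaps in `row_number()` with an explicit tie-breaker. This variant is not part of the original answer:

- Ties are broken alphabetically by product name, which is an arbitrary choice.
- The sample data and the view name `top_two` are hypothetical, chosen only so the example runs in PostgreSQL.

```sql
-- Hypothetical sample data; customer 2 has a tie
-- (Blu-ray and Sandwich were each bought twice).
create temp table sales (
  customer_id    integer not null,
  product_bought text    not null
);

insert into sales (customer_id, product_bought) values
  (1, 'DVD'), (1, 'DVD'), (1, 'DVD'), (1, 'Blu-ray'),
  (2, 'Blu-ray'), (2, 'Blu-ray'), (2, 'Sandwich'), (2, 'Sandwich'), (2, 'DVD'),
  (3, 'DVD');

-- row_number() never repeats a number within a partition, so there is always
-- exactly one rn = 1 row per customer. Ties on count(*) are broken by product name.
create temp view top_two as
with ranked as (
  select customer_id,
         product_bought,
         count(*) as num_products,
         row_number() over (partition by customer_id
                            order by count(*) desc, product_bought) as rn
  from sales
  group by customer_id, product_bought
)
select p1.customer_id,
       p1.product_bought as "Product #1",
       p2.product_bought as "Product #2"
from ranked p1
  left join ranked p2 on p1.customer_id = p2.customer_id and p2.rn = 2
where p1.rn = 1;

select * from top_two order by customer_id;
-- expected rows: (1, 'DVD', 'Blu-ray'), (2, 'Blu-ray', 'Sandwich'), (3, 'DVD', NULL)
```

With the original `dense_rank()` version, customer 2 in this data would appear twice: once with Blu-ray and once with Sandwich as "Product #1".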

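The answer's queries above are standard SQL. They should work in any modern DBMS that supports window functions and common table expressions.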
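The pivot step can also be written without the self-join, using conditional aggregation (`max(case ...)`). That pattern is also standard SQL. This is a swapped-in alternative, not the answer's method, and it makes the following assumptions:

- The sample data and the view name `preferred_pivot` are hypothetical.
- The `create temp ...` statements are PostgreSQL syntax, used only to make the demo runnable.
- With `dense_rank()` ties, `max()` keeps only the greatest of the tied products in the column's sort order.

```sql
-- Hypothetical sample data without ties.
create temp table sales (
  customer_id    integer not null,
  product_bought text    not null
);

insert into sales (customer_id, product_bought) values
  (1, 'DVD'), (1, 'DVD'), (1, 'DVD'), (1, 'Blu-ray'),
  (2, 'Sandwich'), (2, 'Sandwich'), (2, 'DVD'),
  (3, 'Blu-ray');

-- One pass over the ranked rows: each max(case ...) sees the product only on
-- rows with the matching rank and NULL everywhere else; max() ignores NULLs.
create temp view preferred_pivot as
select customer_id,
       max(case when rnk = 1 then product_bought end) as "Product #1",
       max(case when rnk = 2 then product_bought end) as "Product #2"
from (
  select customer_id,
         product_bought,
         dense_rank() over (partition by customer_id order by count(*) desc) as rnk
  from sales
  group by customer_id, product_bought
) t
where rnk <= 2
group by customer_id;

select * from preferred_pivot order by customer_id;
-- expected rows: (1, 'DVD', 'Blu-ray'), (2, 'Sandwich', 'DVD'), (3, 'Blu-ray', NULL)
```

The design trade-off: this scans the ranked rows once instead of joining them to themselves, and it always yields one row per customer. In exchange, tied first choices are silently collapsed rather than shown.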

Online example: http://rextester.com/VAID15638