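The following query (tested on PostgreSQL 11.1) evaluates the following elements for each customer/product combination:
A) the total sales value the customer spent on that product (sales_product in the query);
B) the total sales value the customer spent in that product's category (sales_category in the query).
It then divides A by B to produce a metric called loyalty.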
select
pp.customer, pp.product, pp.category,
pp.sales_product / pc.sales_category as loyalty
from (
select
t.household_key as customer,
t.product_id as product,
p.commodity as category,
sum(t.sales_value) as sales_product
from transaction_data t
left join product p on p.product_id = t.product_id
group by t.household_key, t.product_id, p.commodity
) pp
left join (
select
t.household_key as customer,
p.commodity as category,
sum(t.sales_value) as sales_category
from transaction_data t
left join product p on p.product_id = t.product_id
group by t.household_key, p.commodity
) pc on pp.customer = pc.customer and pp.category = pc.category
;
The result has the following form:
customer product category loyalty
---------------------------------------------
1 tomato food 0.01
1 beef food 0.02
1 toothpaste hygiene 0.04
1 toothbrush hygiene 0.03
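My question is: instead of relying on two subqueries that are then left-joined, is it possible to do this in a single query using window functions?
I tried the following, but it obviously does not work, because it fails with the error: column "t.sales_value" must appear in the GROUP BY clause or be used in an aggregate function. I can't see how to get around this.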
-- does not work
select
t.household_key as customer,
t.product_id as product,
p.commodity as category,
sum(t.sales_value) as sales_product,
sum(t.sales_value) over (partition by t.household_key, p.commodity) as sales_category
from transaction_data t
left join product p on p.product_id = t.product_id
group by t.household_key, t.product_id, p.commodity;
Answer 0 (score: 1)
I don't know of a way to do this without a join or a subquery, but here is one way to do it with a single subquery, using analytic functions:
WITH cte AS (
SELECT
t.household_key AS customer,
t.product_id AS product,
p.commodity as category,
SUM(t.sales_value) OVER (PARTITION BY t.household_key, t.product_id, p.commodity)
AS sales_product,
SUM(t.sales_value) OVER (PARTITION BY t.household_key, p.commodity)
AS sales_category
FROM transaction_data t
LEFT JOIN product p
ON p.product_id = t.product_id
)
SELECT
t.customer,
t.product,
t.category,
MAX(t.sales_product) / MAX(t.sales_category) AS loyalty
FROM cte t -- alias the CTE as t so the t.-prefixed columns resolve
GROUP BY
t.customer,
t.product,
t.category;
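The trick here is to make a single pass over the joined tables and use analytic sums to compute the two required aggregates over different partitions, one on three columns (customer, product, category) and one on two columns (customer, category). We can then group by the three columns and take the MAX of each sum within a group. The choice of MAX is arbitrary: every row in a group carries the same two values, so MIN would work equally well.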
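As a possible further simplification (not part of the answer above, and offered only as a sketch): PostgreSQL evaluates window functions after GROUP BY, so a window function may take an aggregate as its argument. The attempt in the question fails because the window SUM is applied to the raw t.sales_value column rather than to the grouped SUM. Wrapping the aggregate, as in SUM(SUM(...)) OVER (...), should avoid both the join and the subquery. The two leading CTEs below are made-up sample rows that stand in for the real transaction_data and product tables, so that the statement runs on its own; drop them to run it against the actual tables.

```sql
-- Made-up sample data (an assumption for illustration, not the real tables).
WITH transaction_data (household_key, product_id, sales_value) AS (
    VALUES (1, 'tomato',     1.0),
           (1, 'tomato',     2.0),
           (1, 'beef',       3.0),
           (1, 'toothpaste', 4.0)
),
product (product_id, commodity) AS (
    VALUES ('tomato',     'food'),
           ('beef',       'food'),
           ('toothpaste', 'hygiene')
)
SELECT
    t.household_key AS customer,
    t.product_id    AS product,
    p.commodity     AS category,
    -- Inner SUM: per customer/product total, computed by GROUP BY.
    -- Outer SUM ... OVER: adds those totals up per customer/category.
    SUM(t.sales_value)
        / SUM(SUM(t.sales_value)) OVER (PARTITION BY t.household_key, p.commodity)
        AS loyalty
FROM transaction_data t
LEFT JOIN product p ON p.product_id = t.product_id
GROUP BY t.household_key, t.product_id, p.commodity;
```

With this sample data, customer 1 spends 6 in total on food, split evenly between tomato (1 + 2) and beef (3), so both get a loyalty of 0.5, while toothpaste is the only hygiene purchase and gets 1. If a category total can be zero, the divisor could be wrapped in NULLIF(..., 0) to avoid a division-by-zero error.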