Computing a rolling count with window functions

Time: 2019-07-02 01:26:06

Tags: sql postgresql window-functions gaps-and-islands

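I have a table containing our customers' orders:

order_date: the date of the order (not unique, because an order can contain several products)
customer_id: not unique

I want to count, for each customer, the number of orders placed before the current order_date. Because order_date contains duplicates, the results do not come out as expected.

I am using window functions in Postgres 11.2.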

CREATE TABLE "public"."orders" (
    "order_date" timestamp,
    "customer_id" integer
);

Insert the data:

INSERT INTO "public"."orders"("order_date", "customer_id") VALUES('2018-12-13 20:45:24.571964', 402) RETURNING "order_date", "customer_id";
INSERT INTO "public"."orders"("order_date", "customer_id") VALUES('2018-12-13 20:45:24.571964', 402) RETURNING "order_date", "customer_id";
INSERT INTO "public"."orders"("order_date", "customer_id") VALUES('2018-10-12 20:08:39.635959', 466) RETURNING "order_date", "customer_id";
INSERT INTO "public"."orders"("order_date", "customer_id") VALUES('2018-11-04 22:15:14.905851', 483) RETURNING "order_date", "customer_id";
INSERT INTO "public"."orders"("order_date", "customer_id") VALUES('2018-11-04 22:15:14.905851', 483) RETURNING "order_date", "customer_id";
INSERT INTO "public"."orders"("order_date", "customer_id") 

I wrote the following query to produce what I want, but it does not work:

select *,
       COALESCE(COUNT(*) OVER (partition by orders.customer_id
                               order by orders.order_date
                               range between interval '100 years' PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW),
                0) AS customer_orders_count_up_to_now,
       COALESCE(COUNT(*) OVER (partition by orders.customer_id
                               order by orders.order_date asc
                               range BETWEEN interval '7 days' PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW),
                0) AS customer_orders_last_seven_days
from orders

I expect the output columns customer_orders_count_up_to_now and customer_orders_last_seven_days to be 0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0. However, because of the duplicate order_date values, the actual result is different.
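To see where the extra counts come from, here is a minimal sketch using only the two identical rows for customer 402 from the inserts above, inlined as a CTE (the CTE name sample is made up for illustration). In RANGE mode the frame end CURRENT ROW extends to the last row that shares the current row's order_date, and EXCLUDE CURRENT ROW removes only the row itself, so each duplicate still counts the other one.

```sql
-- Two rows for the same customer with the same timestamp (taken from the sample data).
WITH sample(order_date, customer_id) AS (
    VALUES (timestamp '2018-12-13 20:45:24.571964', 402),
           (timestamp '2018-12-13 20:45:24.571964', 402)
)
SELECT customer_id,
       order_date,
       COUNT(*) OVER (
           PARTITION BY customer_id
           ORDER BY order_date
           RANGE BETWEEN interval '7 days' PRECEDING AND CURRENT ROW
           EXCLUDE CURRENT ROW
       ) AS customer_orders_last_seven_days
FROM sample;
-- Both rows report 1 rather than 0: each row drops itself from the frame
-- but still counts its duplicate.
```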

2 Answers:

Answer 0: (score: 0)

If I understand correctly, you basically want count(distinct) as a window function. Postgres does not support that yet, but you can use select distinct in a subquery:

select o.*,
       COALESCE(COUNT(*) OVER (partition by o.customer_id
                               order by o.order_date
                               range between interval '100 years' PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW),
                0) AS customer_orders_count_up_to_now,
       COALESCE(COUNT(*) OVER (partition by o.customer_id
                               order by o.order_date asc
                               range BETWEEN interval '7 days' PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW),
                0) AS customer_orders_last_seven_days 
from (SELECT DISTINCT o.customer_id, o.order_date from orders o) o
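As a rough illustration of the deduplication step on its own, the sketch below inlines the five complete rows from the question as a CTE (the name sample_orders is an assumption made for this example, not part of the original schema) and runs only the inner SELECT DISTINCT.

```sql
-- The five complete sample rows from the question, inlined so the query is self-contained.
WITH sample_orders(order_date, customer_id) AS (
    VALUES (timestamp '2018-12-13 20:45:24.571964', 402),
           (timestamp '2018-12-13 20:45:24.571964', 402),
           (timestamp '2018-10-12 20:08:39.635959', 466),
           (timestamp '2018-11-04 22:15:14.905851', 483),
           (timestamp '2018-11-04 22:15:14.905851', 483)
)
SELECT DISTINCT customer_id, order_date
FROM sample_orders
ORDER BY customer_id;
-- Returns three rows (customers 402, 466, 483): the repeated timestamps collapse to one row each.
```

Note that this approach returns one row per distinct (customer_id, order_date) pair, so the result has fewer rows than the orders table whenever duplicates exist.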

Answer 1: (score: 0)

I found a solution, and I am sharing it here in case anyone else runs into the same problem:

-- Each column: rolling count minus the count of other rows sharing the current order_date.
select *,
       COALESCE(COUNT(*) OVER (partition by orders.customer_id
                               order by orders.order_date
                               range between interval '100 years' PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW),
                0)
       - COALESCE(COUNT(*) OVER (partition by orders.customer_id, orders.order_date
                                 order by orders.order_date
                                 range between interval '100 years' PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW),
                  0) AS customer_orders_count_up_to_now,
       COALESCE(COUNT(*) OVER (partition by orders.customer_id
                               order by orders.order_date asc
                               range BETWEEN interval '7 days' PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW),
                0)
       - COALESCE(COUNT(*) OVER (partition by orders.customer_id, orders.order_date
                                 order by orders.order_date asc
                                 range BETWEEN interval '7 days' PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW),
                  0) AS customer_orders_last_seven_days
from orders

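The idea is that, to remove the duplicates from the rolling count, we subtract from the computed rolling count the number of other records that share the same order_date as the current row.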
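A possibly simpler way to express the same subtraction, offered as a sketch rather than as the approach taken in this answer: since Postgres 11 the frame clause also accepts EXCLUDE GROUP, which removes the current row together with every row that shares its ORDER BY value. Under that assumption, the two-window difference could be written with a single window per column (the alias o is chosen here for brevity).

```sql
-- Count only rows with a strictly earlier order_date inside each frame.
-- EXCLUDE GROUP drops the current row and all of its ordering peers (same order_date).
SELECT o.*,
       COUNT(*) OVER (
           PARTITION BY o.customer_id
           ORDER BY o.order_date
           RANGE BETWEEN interval '100 years' PRECEDING AND CURRENT ROW
           EXCLUDE GROUP
       ) AS customer_orders_count_up_to_now,
       COUNT(*) OVER (
           PARTITION BY o.customer_id
           ORDER BY o.order_date
           RANGE BETWEEN interval '7 days' PRECEDING AND CURRENT ROW
           EXCLUDE GROUP
       ) AS customer_orders_last_seven_days
FROM orders o;
```

Like the subtraction above, this keeps every row of orders and still counts duplicate rows at earlier timestamps individually; if each earlier timestamp should count only once, the rows would need to be deduplicated first.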