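How to select the corresponding record when using an aggregate function with a HAVING clause

Time: 2018-09-21 21:44:21

Tags: postgresql greatest-n-per-group

Suppose I have an orders table with the columns customer_id, order_total, and order_date. I want to build a report that shows every customer who has not placed an order in the last 30 days, with a column showing the total amount of their last order.

This gets all the customers that should be included in the report: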

select customer, max(order_date), (select order_total from orders o2 where o2.customer = orders.customer order by order_date desc limit 1)
from orders
group by 1
having max(order_date) < NOW() - '30 days'::interval

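Is there a better way to do this that does not need the subquery, but instead uses a window function or some other more efficient way to access the total of the most recent order? The technique from How to select id with max date group by category in PostgreSQL? is related, but the additional having restriction seems to keep me from using something like DISTINCT ON.

2 Answers:

Answer 0: (score: 1)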


Solution using the row_number window function (https://www.postgresql.org/docs/current/static/tutorial-window.html):

SELECT 
    customer, order_date, order_total
FROM (
    SELECT
        *, 
        first_value(order_date) OVER w as last_order, 
        first_value(order_total) OVER w as last_total,
        row_number() OVER w as row_count
    FROM orders
    WINDOW w AS (PARTITION BY customer ORDER BY order_date DESC)
) s
WHERE row_count = 1 AND order_date < CURRENT_DATE - 30

Solution using DISTINCT ON (https://www.postgresql.org/docs/9.5/static/sql-select.html#SQL-DISTINCT):

SELECT
    customer, order_date, order_total
FROM (
    SELECT DISTINCT ON (customer)
        *, 
        first_value(order_date) OVER w as last_order, 
        first_value(order_total) OVER w as last_total
    FROM orders
    WINDOW w AS (PARTITION BY customer ORDER BY order_date DESC)
    ORDER BY customer, order_date DESC
) s
WHERE order_date < CURRENT_DATE - 30

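Explanation:

In both solutions I use the first_value window function. The window is partitioned by customer, and the rows within each customer group are ordered by date descending, which puts the most recent row first. (last_value does not always behave as expected here, because the default window frame ends at the current row.) This makes it possible to get the order_date and the order_total of the customer's last order.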
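
The remark about last_value can be checked with a small experiment. The following is only a sketch: the rows in the VALUES list are made-up sample data, and the column names simply follow the queries above. When a window has an ORDER BY, PostgreSQL's default frame runs from the start of the partition to the current row, so last_value needs an explicit frame to see the whole partition.

```sql
-- Hypothetical sample rows; column names follow the queries above.
WITH orders(customer, order_date, order_total) AS (
    VALUES
        ('alice', DATE '2018-06-01', 10.00),
        ('alice', DATE '2018-07-15', 25.00),
        ('alice', DATE '2018-08-20', 40.00)
)
SELECT
    customer,
    order_date,
    -- Default frame ends at the current row, so this echoes order_total.
    last_value(order_total) OVER (
        PARTITION BY customer ORDER BY order_date
    ) AS last_default_frame,
    -- Explicit frame over the whole partition: total of the latest order.
    last_value(order_total) OVER (
        PARTITION BY customer ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_full_frame
FROM orders
ORDER BY order_date;
```

With this data, last_default_frame changes from row to row, while last_full_frame is 40.00 on every row. Ordering descending and using first_value, as the answer does, avoids the frame clause altogether.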

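The difference between the two solutions is the filtering. I show both versions because sometimes one of them is significantly faster than the other.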
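
Which variant is faster depends on the data and the available indexes, so it is worth measuring. One way to do so, assuming the orders table from the question exists and holds realistic data, is to prefix each candidate query with EXPLAIN (ANALYZE, BUFFERS) and compare the reported plans and timings. The query below is a simplified DISTINCT ON variant used only for illustration.

```sql
-- Assumes the orders table from the question exists.
-- Note: EXPLAIN ANALYZE actually executes the query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer, order_date, order_total
FROM (
    SELECT DISTINCT ON (customer) *
    FROM orders
    ORDER BY customer, order_date DESC
) s
WHERE order_date < CURRENT_DATE - 30;
```

Running the same command on the row_number version gives a second plan to compare against.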

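The window function style numbers the rows within each partition, so that the first row of each partition can be filtered out afterwards. This is done by adding the row_number window function. The benefit of this solution shows when you want the first two or three rows per group instead: you only need to change the filter from WHERE row_count = 1 to WHERE row_count <= 2.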
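
As a sketch of that top-N variant, the query below keeps the two most recent orders per customer with a filter of the form row_count <= 2. The rows in the VALUES list are made-up sample data, not taken from the question.

```sql
-- Hypothetical sample rows; column names follow the queries above.
WITH orders(customer, order_date, order_total) AS (
    VALUES
        ('alice', DATE '2018-06-01', 10.00),
        ('alice', DATE '2018-07-15', 25.00),
        ('alice', DATE '2018-08-20', 40.00),
        ('bob',   DATE '2018-08-01', 99.00)
)
SELECT customer, order_date, order_total
FROM (
    SELECT
        *,
        -- 1 = most recent order of the customer, 2 = the one before, ...
        row_number() OVER (PARTITION BY customer ORDER BY order_date DESC) AS row_count
    FROM orders
) s
WHERE row_count <= 2
ORDER BY customer, order_date DESC;
```

This returns alice's orders from 2018-08-20 and 2018-07-15 and bob's single order; alice's oldest order is dropped.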

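However, if you only need one row per group, you just have to make sure that the expected row is ordered first within its group. The DISTINCT ON clause can then discard all the following rows: DISTINCT ON (customer) returns the first row (in sort order) of each customer group.

Answer 1: (score: 0)

Try a self-join on the table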
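
A minimal sketch of this idea, using made-up sample rows: DISTINCT ON picks each customer's latest order in the inner query, and the 30-day filter is applied outside. The filter has to come after the row is picked; applied inside, it would hide recent orders and make an active customer look inactive.

```sql
-- Hypothetical sample rows; bob has one old order and one recent order.
WITH orders(customer, order_date, order_total) AS (
    VALUES
        ('alice', DATE '2018-07-15', 25.00),
        ('alice', DATE '2018-08-20', 40.00),
        ('bob',   DATE '2018-01-10', 50.00),
        ('bob',   CURRENT_DATE - 5,  99.00)
)
SELECT customer, order_date, order_total
FROM (
    -- First row per customer in sort order, i.e. the latest order.
    SELECT DISTINCT ON (customer) customer, order_date, order_total
    FROM orders
    ORDER BY customer, order_date DESC
) s
WHERE order_date < CURRENT_DATE - 30;
```

Only alice's order from 2018-08-20 is returned. bob is excluded because his latest order is five days old, even though he also has an older one.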

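-- the derived table finds each inactive customer's last order date;
-- the join back to orders fetches that order's total
select o1.customer, o1.order_date, o1.order_total
from orders o1
join (
    select customer, max(order_date) as last_order
    from orders
    group by customer
    having max(order_date) < NOW() - '30 days'::interval
) o2 on o2.customer = o1.customer and o2.last_order = o1.order_date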

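A subquery in the select list is not a good idea, because the database executes it once for every row returned.

If you are using Postgres, you can also try a CTE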

https://www.postgresql.org/docs/9.6/static/queries-with.html

WITH t as (
    -- latest order of each customer
    select distinct on (customer) customer, order_date, order_total
    from orders
    order by customer, order_date desc
)
select customer, order_date, order_total
from t
where order_date < NOW() - '30 days'::interval