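Suppose I have an orders table with customer_id, order_total, and order_date columns. I want to build a report that shows every customer who has not placed an order in the last 30 days, with a column showing the total of their most recent order.
This gets all the customers that should be included in the report: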
select customer, max(order_date), (select order_total from orders o2 where o2.customer = orders.customer order by order_date desc limit 1)
from orders
group by 1
having max(order_date) < NOW() - '30 days'::interval
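Is there a better way to do this that avoids the subquery, using a window function or some other more efficient way to get the total of the most recent order? The technique in "How to select id with max date group by category in PostgreSQL?" is related, but the additional HAVING restriction seems to prevent me from using something like DISTINCT ON.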
Answer 0 (score: 1)
A solution using the row_number window function (https://www.postgresql.org/docs/current/static/tutorial-window.html):
SELECT
    customer, order_date, order_total
FROM (
    SELECT
        *,
        first_value(order_date) OVER w as last_order,
        first_value(order_total) OVER w as last_total,
        row_number() OVER w as row_count
    FROM orders
    WINDOW w AS (PARTITION BY customer ORDER BY order_date DESC)
) s
WHERE row_count = 1 AND order_date < CURRENT_DATE - 30
A solution using DISTINCT ON (https://www.postgresql.org/docs/9.5/static/sql-select.html#SQL-DISTINCT):
SELECT
    customer, order_date, order_total
FROM (
    SELECT DISTINCT ON (customer)
        *,
        first_value(order_date) OVER w as last_order,
        first_value(order_total) OVER w as last_total
    FROM orders
    WINDOW w AS (PARTITION BY customer ORDER BY order_date DESC)
    ORDER BY customer, order_date DESC
) s
WHERE order_date < CURRENT_DATE - 30
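Explanation:
In both solutions I use the first_value window function. The window is partitioned by customer, and the rows within each customer's partition are ordered by date descending, which puts the most recent row first. (last_value does not always work as expected, because the default frame ends at the current row.) This makes it possible to get the last order_date and the last order_total for each customer.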
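The caveat about last_value comes from the default window frame: when the window has an ORDER BY, the frame runs only up to the current row and its peers, so last_value returns a value from the current row rather than from the end of the partition. A minimal sketch, assuming the same orders table, of how an explicit frame would make last_value usable with an ascending sort (the column aliases are illustrative):

```sql
-- Ascending order plus an explicit full-partition frame, so last_value
-- sees every row of the customer's partition, not only rows up to the current one.
SELECT DISTINCT
    customer,
    last_value(order_date)  OVER w AS last_order,
    last_value(order_total) OVER w AS last_total
FROM orders
WINDOW w AS (
    PARTITION BY customer
    ORDER BY order_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
);
```

Sorting descending and using first_value, as the two solutions do, sidesteps the frame issue entirely, which is probably why it is the simpler choice here.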
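The difference between the two solutions is how they filter. I show both versions because sometimes one of them is significantly faster than the other.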
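Which variant is faster depends on the data distribution and the available indexes, so it is worth measuring on the real table. A rough sketch of how that comparison might be started, assuming the same orders table; the index name is hypothetical, and whether the planner actually uses the index is not guaranteed:

```sql
-- Hypothetical index name; the column order mirrors the
-- PARTITION BY customer ORDER BY order_date DESC used by both solutions.
CREATE INDEX IF NOT EXISTS orders_customer_order_date_idx
    ON orders (customer, order_date DESC);

-- Shows the chosen plan together with actual timings and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT ON (customer) customer, order_date, order_total
FROM orders
ORDER BY customer, order_date DESC;
```

Running the same EXPLAIN on the row_number variant gives a like-for-like comparison.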
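The window-function style numbers the rows within each partition, so the first row of every partition can be filtered out afterwards. This is done by adding the row_number window function. The benefit of this solution shows when you want the first two or three rows per group: you only need to change the filter from WHERE row_count = 1 to WHERE row_count <= 2.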
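For the N most recent orders per customer the filter becomes a range on row_count; an equality test such as row_count = 2 alone would return only the second-newest order. A sketch under the same assumptions (orders table with customer, order_date, order_total), here with N = 2:

```sql
-- Two most recent orders per customer; row_count is 1 for the newest order.
SELECT customer, order_date, order_total, row_count
FROM (
    SELECT
        customer, order_date, order_total,
        row_number() OVER (PARTITION BY customer ORDER BY order_date DESC) AS row_count
    FROM orders
) s
WHERE row_count <= 2
ORDER BY customer, row_count;
```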
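However, if you need only one row per group, you just have to make sure that the row you want sorts first within its group. The DISTINCT ON clause can then discard all the rows after it: DISTINCT ON (customer) returns the first row, in sort order, of each customer group.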
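Because DISTINCT ON already keeps the whole newest row, the first_value columns are not strictly required for this report. A trimmed-down sketch under the same assumptions; note that the 30-day filter has to stay in the outer query, since a WHERE inside the subquery would be applied before DISTINCT ON and could surface an old order for a customer who has ordered recently:

```sql
-- Inner query: newest order per customer.
-- Outer query: keep only customers whose newest order is older than 30 days.
SELECT customer, order_date AS last_order, order_total AS last_total
FROM (
    SELECT DISTINCT ON (customer)
        customer, order_date, order_total
    FROM orders
    ORDER BY customer, order_date DESC
) s
WHERE order_date < CURRENT_DATE - 30;
```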
Answer 1 (score: 0)
Try joining the table to itself:
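-- o2 finds each inactive customer's latest order date; o1 supplies that order's total
select o1.customer, o1.order_date, o1.order_total
from orders o1
join (
    select customer, max(order_date) as last_order
    from orders
    group by customer
    having max(order_date) < NOW() - '30 days'::interval
) o2 on o2.customer = o1.customer and o2.last_order = o1.order_date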
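A subquery in the SELECT list is not a good idea, because the database will execute it once for every row.
If you are using Postgres, you can also try a CTE (https://www.postgresql.org/docs/9.6/static/queries-with.html):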
WITH t as (
    -- latest order date per customer, restricted to inactive customers
    select customer, max(order_date) as last_order
    from orders
    group by customer
    having max(order_date) < NOW() - '30 days'::interval
)
select o1.customer, o1.order_date, o1.order_total
from orders o1
join t on t.customer = o1.customer and t.last_order = o1.order_date