I'm trying to get a list of distinct order numbers from a database. The table is laid out roughly like this:
orderNumber  customer  createDate
-----------  --------  ----------
001          123       2019-01-01
002          123       2019-01-23
003          456       2019-03-12
003R1        456       2019-03-22
004          456       2019-04-25
005          789       2019-05-21
005A1        789       2019-06-30
005R1        789       2019-07-12
An R1 suffix marks a refund of the order, and an A1 suffix marks an adjustment to that order.
The result set I'm looking for is a list of distinct order numbers that contains only the latest version of each order number. Example:
Results
orderNumber  customer  createDate
-----------  --------  ----------
001          123       2019-01-01
002          123       2019-01-23
003R1        456       2019-03-22
004          456       2019-04-25
005R1        789       2019-07-12
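I've tried a few different approaches, but they either return every refund and adjustment, or they don't show the other distinct order numbers: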
select customer, orderNumber, createDate
from (select customer
,orderNumber
,createDate
,ROW_NUMBER() over(partition by customer order by createDate desc) as RowNum
from orders) as T
where RowNum = 1
The query above returns:
orderNumber  customer  createDate
-----------  --------  ----------
002          123       2019-01-23
003R1        456       2019-03-22
005R1        789       2019-07-12
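To see why partitioning by customer loses rows, here is a minimal reproduction that uses Python's built-in sqlite3 module as a stand-in for SQL Server. This is a sketch that assumes SQLite 3.25 or later, which is needed for ROW_NUMBER(). The ORDER BY customer clause is added only to make the output order deterministic. With the sample rows exactly as listed, the query keeps 004 rather than 003R1 for customer 456, because 004 has that customer's latest createDate (2019-04-25). Either way, the RowNum = 1 filter leaves only one row per customer.

```python
import sqlite3

# Sample rows from the question: (orderNumber, customer, createDate).
ROWS = [
    ("001", 123, "2019-01-01"),
    ("002", 123, "2019-01-23"),
    ("003", 456, "2019-03-12"),
    ("003R1", 456, "2019-03-22"),
    ("004", 456, "2019-04-25"),
    ("005", 789, "2019-05-21"),
    ("005A1", 789, "2019-06-30"),
    ("005R1", 789, "2019-07-12"),
]


def latest_per_customer(conn):
    """Run the question's query. The window is partitioned by customer,
    so RowNum = 1 keeps exactly one row per customer."""
    sql = """
        SELECT orderNumber, customer, createDate
        FROM (SELECT orderNumber, customer, createDate,
                     ROW_NUMBER() OVER (PARTITION BY customer
                                        ORDER BY createDate DESC) AS RowNum
              FROM orders) AS T
        WHERE RowNum = 1
        ORDER BY customer  -- added only for a stable output order
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderNumber TEXT, customer INTEGER, createDate TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ROWS)

for row in latest_per_customer(conn):
    print(row)
```

The partition key decides what "latest" means. Partitioning by customer yields one row per customer. To get one row per order, the partition needs to be on the order number with its suffix removed.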
Is it possible to get every order number, but only the latest version of the "duplicate" order numbers?
Answer 0 (score: 1)
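You almost have it. You need to partition by a normalized orderNumber with the flag stripped off. Here I'm using left and substring, but you should use a more robust pattern match to extract the leading digits.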
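As an illustration of the "more robust pattern match" idea, here is a Python sketch of one possible splitting rule. It assumes that the base order number is the leading run of digits and that everything after it is the flag. The helper name split_order_number is hypothetical, and "1005R1" is a made-up order number that shows where a fixed left(orderNumber, 3) breaks down.

```python
from __future__ import annotations

import re

# One or more leading digits (the base order number), then whatever follows
# (the flag, e.g. "R1" or "A1"). Assumes every order number starts with a digit.
ORDER_PATTERN = re.compile(r"(\d+)(.*)")


def split_order_number(order_number: str) -> tuple[str, str]:
    """Hypothetical helper: split '005R1' into ('005', 'R1')."""
    match = ORDER_PATTERN.fullmatch(order_number)
    if match is None:
        raise ValueError(f"no leading digits in {order_number!r}")
    return match.group(1), match.group(2)


for number in ["001", "003R1", "005A1", "1005R1"]:
    # For the made-up "1005R1", left(orderNumber, 3) would give "100".
    print(number, "->", split_order_number(number), "vs left(..., 3):", number[:3])
```

The same rule can be written in the database (for example with PATINDEX in SQL Server) so that the partition key no longer depends on a fixed width.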
select
left(orderNumber, 3) as cleanOrderNumber,
substring(orderNumber, 4, 5) as flag,
customer,
createDate,
row_number() over (partition by left(orderNumber, 3) order by createDate desc) as RowNum
from orders;
cleanOrderNumber  flag  customer  createDate           RowNum
001                     123       01/01/2019 00:00:00  1
002                     123       23/01/2019 00:00:00  1
003               R1    456       22/03/2019 00:00:00  1
003                     456       12/03/2019 00:00:00  2
004                     456       25/04/2019 00:00:00  1
005               R1    789       12/07/2019 00:00:00  1
005               A1    789       30/06/2019 00:00:00  2
005                     789       21/05/2019 00:00:00  3
We can then use this as a CTE and take the first row for each cleanOrderNumber.
with cleanOrders as (
select
left(orderNumber, 3) as cleanOrderNumber,
substring(orderNumber, 4, 5) as flag,
customer,
createDate,
row_number() over (partition by left(orderNumber, 3) order by createDate desc) as RowNum
from orders
)
select *
from cleanOrders
where rowNum = 1;
cleanOrderNumber  flag  customer  createDate           RowNum
001                     123       01/01/2019 00:00:00  1
002                     123       23/01/2019 00:00:00  1
003               R1    456       22/03/2019 00:00:00  1
004                     456       25/04/2019 00:00:00  1
005               R1    789       12/07/2019 00:00:00  1
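For a runnable check of the CTE approach, here is a sketch that runs it against the question's sample rows in SQLite through Python's sqlite3 module, assuming SQLite 3.25+ for window functions. SQLite has no LEFT(), so substr(orderNumber, 1, 3) stands in for left(orderNumber, 3). Like the answer, this sketch assumes 3-digit base order numbers.

```python
import sqlite3

# Sample rows from the question: (orderNumber, customer, createDate).
ROWS = [
    ("001", 123, "2019-01-01"),
    ("002", 123, "2019-01-23"),
    ("003", 456, "2019-03-12"),
    ("003R1", 456, "2019-03-22"),
    ("004", 456, "2019-04-25"),
    ("005", 789, "2019-05-21"),
    ("005A1", 789, "2019-06-30"),
    ("005R1", 789, "2019-07-12"),
]

# substr(orderNumber, 4) returns the rest of the string as the flag,
# or '' when the order number has no suffix.
QUERY = """
WITH cleanOrders AS (
    SELECT substr(orderNumber, 1, 3) AS cleanOrderNumber,
           substr(orderNumber, 4)    AS flag,
           customer,
           createDate,
           ROW_NUMBER() OVER (PARTITION BY substr(orderNumber, 1, 3)
                              ORDER BY createDate DESC) AS RowNum
    FROM orders
)
SELECT cleanOrderNumber, flag, customer, createDate
FROM cleanOrders
WHERE RowNum = 1
ORDER BY cleanOrderNumber
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderNumber TEXT, customer INTEGER, createDate TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ROWS)

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The ISO-formatted createDate strings sort correctly as text, which is why ORDER BY createDate DESC works here without a date type.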
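Note that this would be easier and faster if the table already stored orderNumber and flag as separate columns (they could then be indexed). You could add the new columns, populate them, and keep orderNumber as a legacy column.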
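The suggestion to store the base number and flag separately can be sketched as follows, again using SQLite through Python's sqlite3 module. The column names baseOrderNumber and flag and the index name ix_orders_base_date are hypothetical. The backfill assumes the same 3-character base as the answer. Whether a given engine actually uses the index for the window query depends on its optimizer, so treat the index as illustrative.

```python
import sqlite3

# Sample rows from the question: (orderNumber, customer, createDate).
ROWS = [
    ("001", 123, "2019-01-01"),
    ("002", 123, "2019-01-23"),
    ("003", 456, "2019-03-12"),
    ("003R1", 456, "2019-03-22"),
    ("004", 456, "2019-04-25"),
    ("005", 789, "2019-05-21"),
    ("005A1", 789, "2019-06-30"),
    ("005R1", 789, "2019-07-12"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderNumber TEXT, customer INTEGER, createDate TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ROWS)

# Hypothetical new columns; orderNumber is kept unchanged as the legacy column.
conn.execute("ALTER TABLE orders ADD COLUMN baseOrderNumber TEXT")
conn.execute("ALTER TABLE orders ADD COLUMN flag TEXT")

# One-off backfill, assuming the 3-character base used in the answer.
conn.execute(
    "UPDATE orders SET baseOrderNumber = substr(orderNumber, 1, 3), "
    "flag = substr(orderNumber, 4)"
)

# Index on the partition and sort keys used by the query below.
conn.execute("CREATE INDEX ix_orders_base_date ON orders (baseOrderNumber, createDate)")

LATEST = """
SELECT orderNumber, customer, createDate
FROM (SELECT orderNumber, customer, createDate,
             ROW_NUMBER() OVER (PARTITION BY baseOrderNumber
                                ORDER BY createDate DESC) AS RowNum
      FROM orders) AS t
WHERE RowNum = 1
ORDER BY orderNumber
"""

rows = conn.execute(LATEST).fetchall()
for row in rows:
    print(row)
```

Once the columns exist, the query partitions on a plain column instead of an expression, and the result matches the desired output from the question.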
Answer 1 (score: 0)
Here's how I did it. DDL and sample data:
CREATE TABLE orders
([orderNumber] varchar(5), [customer] int, [createDate] datetime)
;
INSERT INTO orders
([orderNumber], [customer], [createDate])
VALUES
('001', 123, '2019-01-01 00:00:00'),
('002', 123, '2019-01-23 00:00:00'),
('003', 456, '2019-03-12 00:00:00'),
('003R1', 456, '2019-03-22 00:00:00'),
('004', 456, '2019-04-25 00:00:00'),
('005', 789, '2019-05-21 00:00:00'),
('005A1', 789, '2019-06-30 00:00:00'),
('005R1', 789, '2019-07-12 00:00:00')
;
Here's the actual query:
WITH ordersCTE AS (
SELECT *, PATINDEX('%[a-z]%', orderNumber) AS firstLetterPosition
FROM orders AS O)
, ComputedOrderNumber AS (
SELECT *, CAST(CASE firstLetterPosition
WHEN 0 THEN orderNumber
ELSE LEFT(orderNumber, firstLetterPosition -1)
END AS INT) as actualOrderNumber
FROM ordersCTE)
, Ordered AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY actualOrderNumber ORDER BY createDate DESC) AS RN
FROM ComputedOrderNumber
)
SELECT *
FROM Ordered
WHERE RN = 1
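This first extracts the root of the order number by taking everything in orderNumber before the first letter, and then uses that value to partition the data.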
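SQLite has no PATINDEX, so the sketch below registers a small Python function with sqlite3's Connection.create_function to play the same role: keep everything before the first letter and CAST it to an integer. The function name actual_order_number is hypothetical. The [A-Za-z] class assumes a case-insensitive collation, under which '%[a-z]%' in SQL Server also matches upper-case letters such as R. SQLite 3.25+ is assumed for ROW_NUMBER().

```python
import re
import sqlite3

# Sample rows from the question: (orderNumber, customer, createDate).
ROWS = [
    ("001", 123, "2019-01-01"),
    ("002", 123, "2019-01-23"),
    ("003", 456, "2019-03-12"),
    ("003R1", 456, "2019-03-22"),
    ("004", 456, "2019-04-25"),
    ("005", 789, "2019-05-21"),
    ("005A1", 789, "2019-06-30"),
    ("005R1", 789, "2019-07-12"),
]


def actual_order_number(order_number):
    """Python stand-in for the PATINDEX / LEFT / CAST expression:
    keep everything before the first letter and convert it to an int."""
    match = re.search(r"[A-Za-z]", order_number)
    prefix = order_number if match is None else order_number[:match.start()]
    return int(prefix)


conn = sqlite3.connect(":memory:")
# Make the helper callable from SQL under the same name.
conn.create_function("actual_order_number", 1, actual_order_number)
conn.execute("CREATE TABLE orders (orderNumber TEXT, customer INTEGER, createDate TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ROWS)

QUERY = """
WITH Ordered AS (
    SELECT orderNumber, customer, createDate,
           actual_order_number(orderNumber) AS actualOrderNumber,
           ROW_NUMBER() OVER (PARTITION BY actual_order_number(orderNumber)
                              ORDER BY createDate DESC) AS RN
    FROM orders
)
SELECT orderNumber, customer, createDate, actualOrderNumber
FROM Ordered
WHERE RN = 1
ORDER BY actualOrderNumber
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Casting to an integer means 001 and 1 would land in the same partition. That is harmless for this data but worth knowing if order numbers with and without leading zeros can coexist.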
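That said, I'd strongly recommend adding a proper column that stores the root of the orderNumber; computing it on every query will be very expensive.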