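Finding the max date among similarly named items

Time: 2019-10-29 18:17:25

Tags: sql sql-server

I am trying to get a distinct list of order numbers from a database. The table is laid out roughly as follows: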

orderNumber    customer      createDate
---------------------------------------
001            123            2019-01-01
002            123            2019-01-23
003            456            2019-03-12
003R1          456            2019-03-22
004            456            2019-04-25
005            789            2019-05-21
005A1          789            2019-06-30
005R1          789            2019-07-12

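R1 is a refund of the order, and A1 is an adjustment to that order.

The result set I am looking for is a distinct list of order numbers that contains only the latest version of each order number. Example: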

Results
orderNumber    customer       CreateDate
----------------------------------------
001            123            2019-01-01
002            123            2019-01-23
003R1          456            2019-03-22
004            456            2019-04-25
005R1          789            2019-07-12

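I have tried a few different approaches, but they either return every refund or adjustment, or they leave out the other distinct order numbers: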

select customer, orderNumber, createDate
from (select customer
        ,orderNumber
        ,createDate
        ,ROW_NUMBER() over(partition by customer order by createDate desc) as RowNum
      from orders) as T
where RowNum = 1

The query above returns the following:

orderNumber    customer      createDate
---------------------------------------
002            123            2019-01-23
003R1          456            2019-03-22
005R1          789            2019-07-12

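Is it possible to get every order number, but only the latest version of the "duplicate" order numbers?

2 Answers:

Answer 0: (score: 1)

You are nearly there. You need to partition by the normalized orderNumber with the flag stripped off, rather than by customer. Here I am using left and substring, but you should use more robust pattern matching to pick out the leading number.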

select
  left(orderNumber, 3) as cleanOrderNumber,
  substring(orderNumber, 4, 5) as flag,
  customer,
  createDate,
  row_number() over (partition by left(orderNumber, 3) order by createDate desc) as RowNum
from orders;

cleanOrderNumber    flag    customer    createDate              RowNum
001                         123         01/01/2019 00:00:00     1
002                         123         23/01/2019 00:00:00     1
003                 R1      456         22/03/2019 00:00:00     1
003                         456         12/03/2019 00:00:00     2
004                         456         25/04/2019 00:00:00     1
005                 R1      789         12/07/2019 00:00:00     1
005                 A1      789         30/06/2019 00:00:00     2
005                         789         21/05/2019 00:00:00     3

We can then use that as a CTE and take the first row for each orderNumber.

with cleanOrders as (
select
  left(orderNumber, 3) as cleanOrderNumber,
  substring(orderNumber, 4, 5) as flag,
  customer,
  createDate,
  row_number() over (partition by left(orderNumber, 3) order by createDate desc) as RowNum
from orders
)
select *
from cleanOrders
where rowNum = 1;

cleanOrderNumber    flag    customer    createDate              RowNum
001                         123         01/01/2019 00:00:00     1
002                         123         23/01/2019 00:00:00     1
003                 R1      456         22/03/2019 00:00:00     1
004                         456         25/04/2019 00:00:00     1
005                 R1      789         12/07/2019 00:00:00     1
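
The fixed-width left(orderNumber, 3) above only works while every root is exactly three characters long. Below is a minimal sketch of a more robust split. It assumes the root is always the leading run of digits and that any suffix starts at the first non-digit character. PATINDEX and CROSS APPLY are standard T-SQL; the names ranked, p.pos and r.rootOrderNumber are made up for this illustration.

```sql
with ranked as (
  select
    o.orderNumber,
    o.customer,
    o.createDate,
    r.rootOrderNumber,
    -- newest version of each root gets RowNum = 1
    row_number() over (
      partition by r.rootOrderNumber
      order by o.createDate desc
    ) as RowNum
  from orders as o
  -- position of the first character that is not 0-9 (0 when there is none)
  cross apply (select patindex('%[^0-9]%', o.orderNumber) as pos) as p
  -- root = everything before that character, or the whole value
  cross apply (
    select case when p.pos = 0 then o.orderNumber
                else left(o.orderNumber, p.pos - 1)
           end as rootOrderNumber
  ) as r
)
select orderNumber, customer, createDate
from ranked
where RowNum = 1
order by orderNumber;
```

Because this selects the original orderNumber rather than the cleaned one, the output has the same shape as the result set the question asks for. If two versions of an order share the same createDate, row_number picks one of them arbitrarily, so a tie-breaker in the order by may be needed.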

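Note that this would be easier and faster if orderNumber and flag were already stored as separate columns in the table (they could then be indexed). You can add the new columns, populate them, and keep orderNumber as a legacy column.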
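
A rough sketch of that migration follows, under stated assumptions: the column names rootOrderNumber and flag and the index name are hypothetical, the varchar(5) sizes simply mirror the sample data, and the root is taken to be the leading run of digits. GO is a batch separator understood by SSMS and sqlcmd, not a T-SQL statement; it is needed here because the new columns cannot be referenced in the same batch that adds them.

```sql
-- 1. Add the new columns (hypothetical names), leaving orderNumber untouched
alter table orders
  add rootOrderNumber varchar(5) null,
      flag            varchar(5) null;
go

-- 2. Populate them by splitting at the first non-digit character
update o
set rootOrderNumber = case when p.pos = 0 then o.orderNumber
                           else left(o.orderNumber, p.pos - 1) end,
    flag            = case when p.pos = 0 then ''
                           else substring(o.orderNumber, p.pos, len(o.orderNumber)) end
from orders as o
cross apply (select patindex('%[^0-9]%', o.orderNumber) as pos) as p;
go

-- 3. Index the columns used for partitioning and ordering
create index IX_orders_root_createDate
  on orders (rootOrderNumber, createDate desc);
go

-- 4. The "latest version" query no longer needs any string parsing
with ranked as (
  select orderNumber, customer, createDate,
         row_number() over (partition by rootOrderNumber
                            order by createDate desc) as RowNum
  from orders
)
select orderNumber, customer, createDate
from ranked
where RowNum = 1;
```

New rows would also have to fill the two columns on insert, either in application code or with a trigger; that part is left out of the sketch.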

Answer 1: (score: 0)

I would do it as follows. First, the DDL and sample data:

CREATE TABLE orders
    ([orderNumber] varchar(5), [customer] int, [createDate] datetime)
;

INSERT INTO orders
    ([orderNumber], [customer], [createDate])
VALUES
    ('001', 123, '2019-01-01 00:00:00'),
    ('002', 123, '2019-01-23 00:00:00'),
    ('003', 456, '2019-03-12 00:00:00'),
    ('003R1', 456, '2019-03-22 00:00:00'),
    ('004', 456, '2019-04-25 00:00:00'),
    ('005', 789, '2019-05-21 00:00:00'),
    ('005A1', 789, '2019-06-30 00:00:00'),
    ('005R1', 789, '2019-07-12 00:00:00')
;

And here is the actual query:

WITH ordersCTE AS (
  -- Position of the first letter in orderNumber (0 if there is none)
  SELECT *, PATINDEX('%[a-z]%', orderNumber) AS firstLetterPosition
  FROM orders AS O)
, ComputedOrderNumber AS (
  -- Keep only the part before the first letter and cast it to INT
  SELECT *, CAST(CASE firstLetterPosition
      WHEN 0 THEN orderNumber
      ELSE LEFT(orderNumber, firstLetterPosition -1)
    END AS INT) as actualOrderNumber
  FROM ordersCTE)
, Ordered AS (
  -- Rank the versions of each order, newest first
  SELECT *, ROW_NUMBER() OVER (PARTITION BY actualOrderNumber ORDER BY createDate DESC) AS RN
  FROM ComputedOrderNumber
  )
SELECT *
FROM Ordered
WHERE RN = 1

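This first extracts the order's identifier by ignoring everything from the first letter in orderNumber onward, and then uses that identifier to partition the data.

However, I strongly recommend having a proper column that stores the root of orderNumber; computing it on every query will be very expensive.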
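
One way to get such a column without maintaining it by hand is a persisted computed column. This is only a sketch under assumptions: the column name orderRoot and the index name are hypothetical, and the root is kept as a string (the leading digits) rather than cast to INT as in the query above, so '003' and '3' would count as different roots. Indexing a computed column also requires certain session SET options (such as QUOTED_IDENTIFIER and ANSI_NULLS) to be ON. GO is the SSMS/sqlcmd batch separator.

```sql
-- Root is derived from orderNumber and stored on disk (PERSISTED),
-- so it is computed on write instead of on every query
alter table orders
  add orderRoot as (
    case when patindex('%[^0-9]%', orderNumber) = 0 then orderNumber
         else left(orderNumber, patindex('%[^0-9]%', orderNumber) - 1)
    end
  ) persisted;
go

-- Index supporting "partition by orderRoot order by createDate desc"
create index IX_orders_orderRoot_createDate
  on orders (orderRoot, createDate desc);
go

-- Latest version of each order, using the stored root
with Ordered as (
  select orderNumber, customer, createDate,
         row_number() over (partition by orderRoot
                            order by createDate desc) as RN
  from orders
)
select orderNumber, customer, createDate
from Ordered
where RN = 1;
```

Compared with adding and populating plain columns, the computed column stays in sync automatically when orderNumber is inserted or updated, at the cost of fixing the parsing rule in the table definition.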