Find duplicate orders (by proximity in time)

Asked: 2012-03-02 15:19:07

Tags: sql sql-server

I have an orders table that I know contains duplicates:

    customer   order_number   order_date
   ----------  ------------   -------------------
          1             1     2012-03-01 01:58:00
          1             2     2012-03-01 02:01:00
          1             3     2012-03-01 02:03:00
          2             4     2012-03-01 02:15:00
          3             5     2012-03-01 02:18:00
          3             6     2012-03-01 04:30:00
          4             7     2012-03-01 04:35:00
          5             8     2012-03-01 04:38:00
          6             9     2012-03-01 04:58:00
          6            10     2012-03-01 04:59:00

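I want to find all duplicates, where a duplicate is an order placed by the same customer within 60 minutes of another order. The result set can either consist of the "duplicate" rows or be a list of all customers with their number of duplicates.

Here is what I tried: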

SELECT
   customer,
   count(*)
FROM
   orders
GROUP BY
   customer,
   DATEPART(HOUR, order_date)
HAVING (count(*) > 1)

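This does not work when the duplicates are within 60 minutes of each other but fall in different clock hours, e.g. 1:58 and 2:02.

I also tried this: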

SELECT
  o1.customer,
  o1.order_number,
  o2.order_number,
  DATEDIFF(MINUTE,o1.order_date, o2.order_date) AS [diff]
FROM
  orders o1 LEFT OUTER JOIN
  orders o2 ON o1.customer = o2.customer AND o1.order_number <> o2.order_number
WHERE
  ABS(DATEDIFF(MINUTE,o1.order_date, o2.order_date)) < 60

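This does give me all the duplicates, but it also gives me multiple rows for each duplicate order, i.e. (o1,o2) and (o2,o1). That would not be so bad if there were no orders with multiple duplicates. In those cases I get (o1,o2), (o1,o3), (o2,o1), (o2,o3), (o3,o1), (o3,o2), and so on: every permutation.

Does anyone have any insight? I'm not necessarily looking for the best-performing answer, just one that works.

3 answers:

Answer 0: (score: 3)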

SELECT
  *,
  CASE WHEN EXISTS (SELECT *
                      FROM orders AS lookup
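                     WHERE lookup.customer    = orders.customer
                       AND lookup.order_date <  orders.order_date
                       -- compare against the outer row's date, not the lookup row's own date
                       AND lookup.order_date >= DATEADD(hour, -1, orders.order_date)
                   )
       THEN 'Duplicate Order'   -- an earlier order exists within the previous hour
       ELSE 'Principal Order'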
  END as Order_Status
FROM
  orders

Using EXISTS with a correlated subquery, you can check whether the same customer placed any earlier order within the previous hour.
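If a per-customer count is wanted instead of a status column, the same EXISTS check can be moved into a WHERE clause and grouped. The following is only a sketch under assumptions: `@orders` is a hypothetical table variable standing in for the `orders` table, loaded with the sample rows from the question, and `prev` and `duplicate_count` are illustrative names.

```sql
-- Hypothetical stand-in for the orders table, filled with the question's sample data.
DECLARE @orders TABLE (customer INT, order_number INT, order_date DATETIME);

INSERT INTO @orders VALUES
    (1,  1, '2012-03-01 01:58:00'),
    (1,  2, '2012-03-01 02:01:00'),
    (1,  3, '2012-03-01 02:03:00'),
    (2,  4, '2012-03-01 02:15:00'),
    (3,  5, '2012-03-01 02:18:00'),
    (3,  6, '2012-03-01 04:30:00'),
    (4,  7, '2012-03-01 04:35:00'),
    (5,  8, '2012-03-01 04:38:00'),
    (6,  9, '2012-03-01 04:58:00'),
    (6, 10, '2012-03-01 04:59:00');

-- Keep only orders that follow an earlier order from the same customer
-- by at most one hour, then count them per customer.
SELECT
    o.customer,
    COUNT(*) AS duplicate_count
FROM
    @orders AS o
WHERE EXISTS (SELECT *
                FROM @orders AS prev
               WHERE prev.customer   = o.customer
                 AND prev.order_date <  o.order_date
                 AND prev.order_date >= DATEADD(hour, -1, o.order_date))
GROUP BY
    o.customer;
```

With the sample data this should list customer 1 with 2 duplicates and customer 6 with 1. Two orders with exactly the same timestamp would not be flagged, because the comparison on `order_date` is strict.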

Answer 1: (score: 1)

Maybe something like this:

Test data:

DECLARE @tbl TABLE(customer INT,order_number INT,order_date DATETIME)

INSERT INTO @tbl
VALUES
    (1,1,'2012-03-01 01:58:00'),
    (1,2,'2012-03-01 02:01:00'),
    (1,3,'2012-03-01 02:03:00'),
    (2,4,'2012-03-01 02:15:00'),
    (3,5,'2012-03-01 02:18:00'),
    (3,6,'2012-03-01 04:30:00'),
    (4,7,'2012-03-01 04:35:00'),
    (5,8,'2012-03-01 04:38:00'),
    (6,9,'2012-03-01 04:58:00'),
    (6,10,'2012-03-01 04:59:00')

Query:

;WITH CTE
AS
(
    SELECT
        MIN(datediff(minute,'1990-1-1',order_date)) OVER(PARTITION BY customer) AS minDate,
        datediff(minute,'1990-1-1',order_date) AS DateTicks,
        tbl.customer
    FROM
        @tbl AS tbl
)
SELECT
    CTE.customer,
    SUM(CASE WHEN (CTE.DateTicks-CTE.minDate)<60 THEN 1 ELSE 0 END)
FROM
    CTE
GROUP BY
    CTE.customer

Answer 2: (score: 1)

The following query identifies all possible permutations of orders placed within 60 minutes of each other:

DECLARE @orders TABLE (CustomerId INT, OrderId INT, OrderDate DATETIME)

INSERT INTO @orders
VALUES
    (1, 1, '2012-03-01 01:58:00'),
    (1, 2, '2012-03-01 02:01:00'),
    (1, 3, '2012-03-01 02:03:00'),
    (2, 4, '2012-03-01 02:15:00'),
    (3, 5, '2012-03-01 02:18:00'),
    (3, 6, '2012-03-01 04:30:00'),
    (4, 7, '2012-03-01 04:35:00'),
    (5, 8, '2012-03-01 04:38:00'),
    (6, 9, '2012-03-01 04:58:00'),
    (6, 10, '2012-03-01 04:59:00');

with ProximityOrderCascade(CustomerId, OrderId, ProximateOrderId, MinutesDifference, OrderDate, ProximateOrderDate)
as 
(
    select o.customerid, o.orderid, null, null, o.orderdate, o.orderdate
    from @orders o
    union all   
    select o.customerid, o.orderid, p.orderid, datediff(minute, p.OrderDate, o.OrderDate), o.OrderDate, p.OrderDate
    from ProximityOrderCascade p
    inner join @orders o 
        on p.customerid = o.customerid 
        and abs(datediff(minute, p.OrderDate, o.OrderDate)) between 0 and 60 
        and o.orderid <> p.orderid
    where proximateorderid is null
)
select * from ProximityOrderCascade
where 
    not ProximateOrderId is null

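From there, you can shape the results into whatever query you need. The results of this query identify only customers 1 and 6 as having "duplicate" orders.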

CustomerId  OrderId     ProximateOrderId MinutesDifference OrderDate               ProximateOrderDate
----------- ----------- ---------------- ----------------- ----------------------- -----------------------
6           9           10               -1                2012-03-01 04:58:00.000 2012-03-01 04:59:00.000
6           10          9                1                 2012-03-01 04:59:00.000 2012-03-01 04:58:00.000
1           1           3                -5                2012-03-01 01:58:00.000 2012-03-01 02:03:00.000
1           2           3                -2                2012-03-01 02:01:00.000 2012-03-01 02:03:00.000
1           1           2                -3                2012-03-01 01:58:00.000 2012-03-01 02:01:00.000
1           3           2                2                 2012-03-01 02:03:00.000 2012-03-01 02:01:00.000
1           2           1                3                 2012-03-01 02:01:00.000 2012-03-01 01:58:00.000
1           3           1                5                 2012-03-01 02:03:00.000 2012-03-01 01:58:00.000

(8 row(s) affected)
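One possible way to shape this further, so that each pair is reported once rather than in both directions, is to require the first order ID to be smaller than the second. This is a sketch rather than part of the answer's query: it uses a plain self-join instead of the recursive CTE, and the aliases `FirstOrderId`, `SecondOrderId` and `MinutesApart` are illustrative.

```sql
-- Same table variable and sample data as in the answer above.
DECLARE @orders TABLE (CustomerId INT, OrderId INT, OrderDate DATETIME);

INSERT INTO @orders VALUES
    (1,  1, '2012-03-01 01:58:00'),
    (1,  2, '2012-03-01 02:01:00'),
    (1,  3, '2012-03-01 02:03:00'),
    (2,  4, '2012-03-01 02:15:00'),
    (3,  5, '2012-03-01 02:18:00'),
    (3,  6, '2012-03-01 04:30:00'),
    (4,  7, '2012-03-01 04:35:00'),
    (5,  8, '2012-03-01 04:38:00'),
    (6,  9, '2012-03-01 04:58:00'),
    (6, 10, '2012-03-01 04:59:00');

-- o1.OrderId < o2.OrderId keeps (1,2) but drops the mirrored (2,1),
-- so every proximate pair appears exactly once.
SELECT
    o1.CustomerId,
    o1.OrderId AS FirstOrderId,
    o2.OrderId AS SecondOrderId,
    ABS(DATEDIFF(MINUTE, o1.OrderDate, o2.OrderDate)) AS MinutesApart
FROM
    @orders AS o1
    INNER JOIN @orders AS o2
        ON  o1.CustomerId = o2.CustomerId
        AND o1.OrderId    < o2.OrderId
WHERE
    ABS(DATEDIFF(MINUTE, o1.OrderDate, o2.OrderDate)) < 60;
```

For the sample data this should return four rows: the pairs (1,2), (1,3) and (2,3) for customer 1, and (9,10) for customer 6. Note that `DATEDIFF(MINUTE, ...)` counts minute boundaries crossed rather than exact elapsed time, so the cutoff is approximate to within a minute.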