Grouping duplicates by date

Time: 2014-02-04 11:19:19

Tags: asp.net sql sql-server sql-server-2008 tsql

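I have the following query, which detects duplicates based on the RegNumber column value. If the entry dates (DateSeen) of the different rows are less than 10 minutes apart, the query keeps the row with the highest Confidence value.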

SELECT *, 
       CASE 
         WHEN conf_max = confidence THEN 'Conf_Max' 
         ELSE 'Duplicate' 
       END AS Is_Conf_Max 
FROM   (SELECT *, 
               Max(confidence) 
                 OVER ( 
                   partition BY regnumber) AS Conf_Max 
        FROM   (SELECT id, 
                       cameraid, 
                       dateseen, 
                       nationality, 
                       regnumber, 
                       confidence, 
                       Min(dateseen) 
                         OVER ( 
                           partition BY regnumber) AS DateSeen_Min, 
                       Max(dateseen) 
                         OVER ( 
                           partition BY regnumber) AS DateSeen_Max 
                FROM   plate_read 
                WHERE  ( cameraid IN ( 5, 6 ) )) A 
        WHERE  Abs(Datediff(minute, dateseen_max, dateseen_min)) <= 10) B 
WHERE  conf_max <> confidence 
ORDER  BY regnumber 

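The problem is this: the query gives me all duplicates whose DateSeen values differ by less than 10 minutes. However, if there is another group of duplicates with the same RegNumber more than 10 minutes later, those duplicates are not detected. For example: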

ID    CamId     DateSeen                 Nationality   Reg      Conf
--    -----     -------                 ----------     ---      ---
80      5    20/12/2013 12:10:57           E         5897HHS     94
81      5    20/12/2013 12:15:03           E         5897HHS     93
82      5    20/12/2013 12:16:17          GBZ        G6746D      98
83      5    20/12/2013 12:35:57           E         5897HHS     88
84      5    20/12/2013 12:36:03           E         5897HHS     86

Based on the data above, only IDs 80, 82 and 83 are valid, because 81 is a duplicate of 80 and 84 is a duplicate of 83. I hope someone can help with this.

1 answer:

Answer 0: (score: 0)

This may not be a complete answer, but why does your query need to be so complicated? Why not simplify it (extra criteria omitted for clarity):

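-- Keep each row whose conf is the highest among the rows
-- with the same reg seen within roughly 10 minutes of it.
select *
from plate_read pr1
where conf = (
  select max(conf)
  from plate_read pr2
  where pr1.reg = pr2.reg
  and abs(datediff(minute, pr1.dateseen, pr2.dateseen)) < 11
  )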

Saved in a SQLfiddle here
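
To check this approach against the sample rows from the question, here is a minimal sketch. It assumes the real column names are regnumber and confidence (as in the question's query) rather than the abbreviated reg and conf. The table variable @plate_read is a hypothetical stand-in for the plate_read table, the camera filter is carried over from the question, and the 600-second threshold is only one possible reading of "within 10 minutes".

```sql
-- Hypothetical stand-in for plate_read, filled with the question's sample rows.
DECLARE @plate_read TABLE (
    id          INT,
    cameraid    INT,
    dateseen    DATETIME,
    nationality VARCHAR(3),
    regnumber   VARCHAR(10),
    confidence  INT
);

INSERT INTO @plate_read (id, cameraid, dateseen, nationality, regnumber, confidence)
VALUES (80, 5, '2013-12-20T12:10:57', 'E',   '5897HHS', 94),
       (81, 5, '2013-12-20T12:15:03', 'E',   '5897HHS', 93),
       (82, 5, '2013-12-20T12:16:17', 'GBZ', 'G6746D',  98),
       (83, 5, '2013-12-20T12:35:57', 'E',   '5897HHS', 88),
       (84, 5, '2013-12-20T12:36:03', 'E',   '5897HHS', 86);

-- Keep a row only if its confidence equals the highest confidence among
-- rows with the same regnumber seen within 600 seconds of it.
-- DATEDIFF(minute, ...) counts minute boundaries crossed, so comparing
-- seconds is an assumed, stricter interpretation of the 10-minute rule.
SELECT pr1.id, pr1.regnumber, pr1.dateseen, pr1.confidence
FROM   @plate_read pr1
WHERE  pr1.cameraid IN (5, 6)
  AND  pr1.confidence = (SELECT MAX(pr2.confidence)
                         FROM   @plate_read pr2
                         WHERE  pr2.cameraid IN (5, 6)
                           AND  pr2.regnumber = pr1.regnumber
                           AND  ABS(DATEDIFF(second, pr1.dateseen, pr2.dateseen)) <= 600)
ORDER  BY pr1.id;
```

On the five sample rows this returns IDs 80, 82 and 83, which matches the result the question asks for. Two caveats: rows that tie on confidence inside the same window are all kept, and the window is centered on each row rather than on fixed groups, so a long chain of reads spaced a few minutes apart may keep more than one row.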