T-SQL: keep valid duplicates and delete invalid duplicates

Time: 2013-09-24 09:36:34

Tags: sql sql-server tsql sql-server-2012 duplicate-removal

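I've been fighting with this for a while and am getting nowhere; the data has to stay at row level.

I want to keep the earliest data to arrive; duplicates within that load are valid. Load1 represents the batch ID. Not every value has duplicates.

I want to return: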

Code1   Code2   Code3   Load1   LoadTime
a1      a1      a1      1       2013-09-10
a1      a1      a1      1       2013-09-10
a1      a1      a1      1       2013-09-10
a2      a1      a1      2       2013-09-12
a1      a2      a1      3       2013-09-13
a1      a2      a1      3       2013-09-13

Any suggestions?

 CREATE TABLE #Test (
 Code1  varchar(10),
 Code2  varchar(10),
 Code3  varchar(10),
 Load1  varchar(10),
 LoadTime DATE
 )


  INSERT INTO #Test
  VALUES ('a1','a1','a1','1','2013-09-10') --Keep

  INSERT INTO #Test
  VALUES ('a1','a1','a1','1','2013-09-10') --Keep

  INSERT INTO #Test
  VALUES ('a1','a1','a1','1','2013-09-10') --Keep

  INSERT INTO #Test
  VALUES ('a1','a1','a1','2','2013-09-11') --Delete

  INSERT INTO #Test
  VALUES ('a2','a1','a1','2','2013-09-12') --Keep

  INSERT INTO #Test
  VALUES ('a2','a1','a1','3','2013-09-13') --Delete

  INSERT INTO #Test
  VALUES ('a1','a2','a1','3','2013-09-13') --Keep

  INSERT INTO #Test
  VALUES ('a1','a2','a1','3','2013-09-13') --Keep

  INSERT INTO #Test
  VALUES ('a1','a2','a1','4','2013-09-13')-- Delete

  INSERT INTO #Test
  VALUES ('a1','a2','a1','4','2013-09-13')-- Delete

2 answers:

Answer 0: (score: 0)

You may want to look at row_number() and dense_rank().
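To make the difference concrete, here is a small sketch against the question's #Test table. ROW_NUMBER() gives every row in a partition its own number, while DENSE_RANK() gives rows that tie on the ORDER BY columns the same number, which is what allows a whole batch of valid duplicates to be kept together. The aliases RowNum and BatchRank are made up for illustration.

```sql
-- Compare the two ranking functions within each (Code1, Code2, Code3) group.
-- Assumes the #Test table from the question exists in the current session.
SELECT Code1, Code2, Code3, Load1, LoadTime,
       ROW_NUMBER() OVER (PARTITION BY Code1, Code2, Code3
                          ORDER BY LoadTime, Load1) AS RowNum,    -- illustrative alias
       DENSE_RANK() OVER (PARTITION BY Code1, Code2, Code3
                          ORDER BY LoadTime, Load1) AS BatchRank  -- illustrative alias
FROM #Test
ORDER BY Code1, Code2, Code3, LoadTime, Load1;
```

With the sample data, the three ('a1','a1','a1') rows from batch 1 all get BatchRank 1 but RowNum 1, 2 and 3, while the batch 2 row gets BatchRank 2 and RowNum 4.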

It's hard to tell from the sample data what the keep/delete logic is, but something like:

;with cte as (
      select *,
      -- load1 breaks ties between batches that share a loadtime
      dense_rank() over (partition by code1,code2,code3 order by loadtime, load1) rn
      from #test)
    delete t
    from #Test t
        inner join cte
            on t.Code1 = cte.Code1
            and t.Code2 = cte.Code2
            and t.Code3 = cte.Code3
            and t.Load1 = cte.Load1
            and t.LoadTime = cte.LoadTime
        where rn>1

(If your data had a unique ID, the join would be easier.)
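As a sketch of that remark: assuming you are free to give the rows a surrogate key, the join collapses to a single column. The table name #TestWithId and the column name RowID are hypothetical, and Load1 is included in the ORDER BY as a tie-breaker for batches that share a LoadTime.

```sql
-- Hypothetical copy of the question's table with a surrogate key (RowID is an assumed name).
-- Assumes the #Test table from the question exists in the current session.
CREATE TABLE #TestWithId (
    RowID    int IDENTITY(1,1) PRIMARY KEY,
    Code1    varchar(10),
    Code2    varchar(10),
    Code3    varchar(10),
    Load1    varchar(10),
    LoadTime date
);

INSERT INTO #TestWithId (Code1, Code2, Code3, Load1, LoadTime)
SELECT Code1, Code2, Code3, Load1, LoadTime
FROM #Test;

-- Rank the batches within each code combination, then delete by key.
WITH cte AS (
    SELECT RowID,
           DENSE_RANK() OVER (PARTITION BY Code1, Code2, Code3
                              ORDER BY LoadTime, Load1) AS rn
    FROM #TestWithId
)
DELETE t
FROM #TestWithId AS t
INNER JOIN cte ON cte.RowID = t.RowID
WHERE cte.rn > 1;
```

Joining on the key also avoids comparing every column, which matters if any of the code columns can be NULL, since NULL = NULL does not match in a join.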

Answer 1: (score: 0)

You can use a SQL Server common table expression (CTE):

-- Table1 stands in for your own table (#Test in the question)
with cte as (
    select
        dense_rank() over(partition by Code1, Code2, Code3 order by LoadTime, Load1 asc) as rn
    from Table1
)
delete from cte where rn > 1

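sql fiddle demo

This query is actually very simple in SQL Server, because SQL Server treats a simple common table expression as an updatable view: you don't have to join the CTE back to the original table, you can just delete from cte.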
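A sketch of how that might be applied to the question's #Test table rather than Table1. Two things here are my own assumptions, not part of the answer. First, Load1 is wrapped in TRY_CAST (available from SQL Server 2012) because it is a varchar column and would otherwise sort as text, putting batch '10' before batch '2'. Second, a SELECT is run beforehand to preview what the DELETE would remove.

```sql
-- Preview: rows with rn > 1 belong to a later batch of an already-seen code combination.
-- Assumes the #Test table from the question exists in the current session.
WITH cte AS (
    SELECT Code1, Code2, Code3, Load1, LoadTime,
           DENSE_RANK() OVER (PARTITION BY Code1, Code2, Code3
                              ORDER BY LoadTime, TRY_CAST(Load1 AS int)) AS rn
    FROM #Test
)
SELECT * FROM cte WHERE rn > 1;

-- Same ranking, this time deleting through the CTE; the rows are removed from #Test itself.
WITH cte AS (
    SELECT DENSE_RANK() OVER (PARTITION BY Code1, Code2, Code3
                              ORDER BY LoadTime, TRY_CAST(Load1 AS int)) AS rn
    FROM #Test
)
DELETE FROM cte WHERE rn > 1;

-- Inspect what is left; it should correspond to the rows marked --Keep in the question.
SELECT * FROM #Test ORDER BY LoadTime, Code1, Code2;
```

Note that TRY_CAST returns NULL for any Load1 value that is not numeric, and NULLs sort first in ascending order, so this variant only makes sense if the batch IDs really are integers stored as text.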