Question

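First, let me say that I (as a newbie) did search through several Q&As about duplicates in tables, but unfortunately I could not adapt the code given in the answers.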

My table is generated as a kind of report in SQL Server 2008.

I would like to know how to delete the duplicate records, with an explanation.

"MyTable":

Column1   (PK-auto incremental table's record ID) 
Column2   (some TXT) 
Column3   (Some TXT)
Column4   (SmallDateTime)
Column5   is empty

Column5 will hold the value of SUM(count of deleted duplicates including this surviving row).

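The key to a possible solution is that when [column2 and column3] have multiple records with the same content (and are therefore duplicates), those records do not always share the same date (column4).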

From this:

col1  col2   col3  col4         col5
----  -----  ----  -----------  ----
1     [abc]  [4]   [10/1/2012]  null
2     [abc]  [1]   [12/1/2012]  null
3     [ghi]  [6]   [4/1/2012]   null
4     [def]  [5]   [8/1/2012]   null
5     [abc]  [4]   [10/1/2012]  null
6     [def]  [5]   [12/1/2012]  null
7     [ghi]  [6]   [15/1/2012]  null
8     [abc]  [4]   [17/1/2012]  null
9     [ghi]  [6]   [6/1/2012]   null
10    [abc]  [1]   [13/1/2012]  null

Into this:

col1  col2   col3  col4         col5
----  -----  ----  -----------  ----
8     [abc]  [4]   [17/1/2012]  3
10    [abc]  [1]   [13/1/2012]  2
6     [def]  [5]   [12/1/2012]  2
7     [ghi]  [6]   [15/1/2012]  3

Meaning: leave the latest one (1) as the representative of each set of duplicate records.

++++ Re-edit

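Aaron Bertrand, shawnt00, e2nburner ... and the rest of you: I can't say how grateful I am for your replies, although I have not yet understood that large amount of code. I am going to check that code now, but not before thanking you!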

When I first started programming and needed an SQL query, after using

Select * From MyTable

...my first SQL statement...

I said "I know SQL!" ...and now... looking at all that deep knowledge of yours... thanks a lot. I know this post on Stack Overflow will be very useful for other beginners too.

Answer 1

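This answer uses a common table expression (CTE) to apply ROW_NUMBER() and COUNT() to each "slice" of the data (meaning grouped by col2 + col3). COUNT() identifies how many rows each such group has, and ROW_NUMBER() applies a "rank" ordered by col4 DESC (1 = latest in each group, 2 = second latest, and so on). It also uses col1 (which looks like a unique column) to break any ties. A CTE can be followed by a query such as a SELECT, UPDATE, or DELETE. So you can run the first SELECT to verify that these are the rows you want to keep and that the counts are correct. If they are, you can go ahead with the UPDATE and the DELETE. You will notice that in all cases the ROW_NUMBER() output is used to identify either the rows you keep or the rows you discard.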

Identify the rows to keep:

;WITH n AS 
(
  SELECT col1, col2, col3, col4, 
    c = COUNT(*) OVER (PARTITION BY col2, col3),
    rn = ROW_NUMBER() OVER 
    (
      PARTITION BY col2, col3 ORDER BY col4 DESC, col1 DESC
    )
  FROM dbo.table_name
)
SELECT col1, col2, col3, col4, c
  FROM n WHERE rn = 1;
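
To see what the two window functions return before any filtering, one option is to load the question's sample rows into a scratch table and list every row with its c and rn values. This is only a sketch: the temp table name #MyTable and the yyyymmdd date literals are assumptions for illustration (the question's dates are read as day/month/year), and col2/col3 are stored without the brackets.

```sql
-- Scratch copy of the question's sample data (hypothetical temp table).
CREATE TABLE #MyTable
(
  col1 INT PRIMARY KEY,
  col2 VARCHAR(10),
  col3 VARCHAR(10),
  col4 SMALLDATETIME,
  col5 INT NULL
);

INSERT #MyTable (col1, col2, col3, col4) VALUES
  (1,  'abc', '4', '20120110'),
  (2,  'abc', '1', '20120112'),
  (3,  'ghi', '6', '20120104'),
  (4,  'def', '5', '20120108'),
  (5,  'abc', '4', '20120110'),
  (6,  'def', '5', '20120112'),
  (7,  'ghi', '6', '20120115'),
  (8,  'abc', '4', '20120117'),
  (9,  'ghi', '6', '20120106'),
  (10, 'abc', '1', '20120113');

-- Every row with its group size (c) and its rank within the group (rn).
SELECT col1, col2, col3, col4,
  c  = COUNT(*) OVER (PARTITION BY col2, col3),
  rn = ROW_NUMBER() OVER
       (PARTITION BY col2, col3 ORDER BY col4 DESC, col1 DESC)
FROM #MyTable
ORDER BY col2, col3, rn;

DROP TABLE #MyTable;
```

Counting the input rows, the rn = 1 rows here are col1 = 10 (abc, 1), 8 (abc, 4), 6 (def, 5) and 7 (ghi, 6), with c values of 2, 3, 2 and 3 respectively.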

Once you have confirmed that these are the rows you want to keep, you can update them like this:

;WITH n AS 
(
  SELECT col1, col2, col3, col4, col5, 
    c = COUNT(*) OVER (PARTITION BY col2, col3),
    rn = ROW_NUMBER() OVER 
    (
      PARTITION BY col2, col3 ORDER BY col4 DESC, col1 DESC
    )
  FROM dbo.table_name
)
UPDATE n SET col5 = c
  WHERE rn = 1;

Then delete the remainder this way:

;WITH n AS 
(
  SELECT col1, col2, col3, col4, 
    rn = ROW_NUMBER() OVER 
    (
      PARTITION BY col2, col3 ORDER BY col4 DESC, col1 DESC
    )
  FROM dbo.table_name
)
DELETE n WHERE rn > 1;

Or even simpler (assuming col5 was entirely NULL before the update):

DELETE dbo.table_name WHERE col5 IS NULL;
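
Since the UPDATE and the DELETE only make sense together, it may be worth running them inside one transaction so that a failure in either step leaves the table unchanged. The following is a sketch under that assumption, using the same dbo.table_name and targeting SQL Server 2008 (hence RAISERROR rather than THROW); the variable name @msg is illustrative.

```sql
BEGIN TRY
  BEGIN TRANSACTION;

  -- Step 1: store the group size on the row being kept.
  WITH n AS
  (
    SELECT col5,
      c = COUNT(*) OVER (PARTITION BY col2, col3),
      rn = ROW_NUMBER() OVER
      (
        PARTITION BY col2, col3 ORDER BY col4 DESC, col1 DESC
      )
    FROM dbo.table_name
  )
  UPDATE n SET col5 = c
    WHERE rn = 1;

  -- Step 2: remove every other row in each group.
  WITH n AS
  (
    SELECT rn = ROW_NUMBER() OVER
      (
        PARTITION BY col2, col3 ORDER BY col4 DESC, col1 DESC
      )
    FROM dbo.table_name
  )
  DELETE n WHERE rn > 1;

  COMMIT TRANSACTION;
END TRY
BEGIN CATCH
  -- Undo both steps if either one failed, then surface the error.
  IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
  DECLARE @msg NVARCHAR(2048) = ERROR_MESSAGE();
  RAISERROR('%s', 16, 1, @msg);
END CATCH;
```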

Answer 2

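Here is a simple approach. You may find merge is better. These versions keep the row with the highest col1 value and set its date column to the max date. Aaron's keeps the row that already has the max date. It's a difference I doubt is important, but it should be noted.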

update MyTable
set col4 = (
    select max(col4)
    from MyTable as m2
    where m2.col2 = MyTable.col2 and m2.col3 = MyTable.col3
),  col5 = (
    select count(*)
    from MyTable as m2
    where m2.col2 = MyTable.col2 and m2.col3 = MyTable.col3
)
where not exists (
    select *
    from MyTable as m2
    where
        m2.col2 = MyTable.col2 and m2.col3 = MyTable.col3
        and m2.col1 > MyTable.col1
);

delete from MyTable
where exists (
    select *
    from MyTable as m2
    where
        m2.col2 = MyTable.col2 and m2.col3 = MyTable.col3
        and m2.col1 > MyTable.col1
);

Edit 2: here is a shot at the merge query

merge MyTable as target
using (
    select max(col1), col2, col3, max(col4), count(*)
    from MyTable
    group by col2, col3
) as source(id, col2, col3, maxdate, rowcnt) -- rowcount is a reserved word
on (
        target.col1 = source.id
    and target.col2 = source.col2
    and target.col3 = source.col3
)
when matched then
    update set col4 = source.maxdate, col5 = source.rowcnt
-- deleting unmatched target rows requires "by source"
when not matched by source then delete;
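
Before running a merge that deletes rows, it can help to preview its effect. One possible sketch, assuming the MyTable from the question: wrap the statement in a transaction, add an OUTPUT clause to list what happens to each row, and roll back. The aliases keepid, rowcnt, merge_action, old_col4, new_col4 and new_col5 are illustrative names (ROWCOUNT is a reserved word in T-SQL, so it is avoided as an alias here).

```sql
BEGIN TRANSACTION;

MERGE MyTable AS target
USING (
    SELECT MAX(col1), col2, col3, MAX(col4), COUNT(*)
    FROM MyTable
    GROUP BY col2, col3
) AS source (keepid, col2, col3, maxdate, rowcnt)
ON target.col1 = source.keepid
WHEN MATCHED THEN
    UPDATE SET col4 = source.maxdate, col5 = source.rowcnt
WHEN NOT MATCHED BY SOURCE THEN
    DELETE
-- $action reports UPDATE or DELETE for each affected row;
-- the inserted.* columns are NULL for deleted rows.
OUTPUT $action AS merge_action,
       deleted.col1, deleted.col2, deleted.col3,
       deleted.col4  AS old_col4,
       inserted.col4 AS new_col4,
       inserted.col5 AS new_col5;

-- Nothing is kept; rerun without the transaction once the output looks right.
ROLLBACK TRANSACTION;
```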

Edit 3: keep the row that has the original max date, breaking ties on col1

-- option #1
update MyTable
set col5 = (
    select count(*)
    from MyTable as m2
    where m2.col2 = MyTable.col2 and m2.col3 = MyTable.col3
)
where not exists (
    select *
    from MyTable as m2
    where
        m2.col2 = MyTable.col2 and m2.col3 = MyTable.col3
        -- parentheses keep the group match applied to both branches
        and (m2.col4 > MyTable.col4
             or (m2.col4 = MyTable.col4 and m2.col1 > MyTable.col1))
);

delete from MyTable
where exists (
    select *
    from MyTable as m2
    where
        m2.col2 = MyTable.col2 and m2.col3 = MyTable.col3
        and (m2.col4 > MyTable.col4
             or (m2.col4 = MyTable.col4 and m2.col1 > MyTable.col1))
);

-- option #2
merge MyTable as target
using (
    -- per (col2, col3) group, pick the row with the latest col4
    -- (ties broken by highest col1) together with the group size
    select col1, rowcnt
    from (
        select col1,
            count(*) over (partition by col2, col3) as rowcnt,
            row_number() over (
                partition by col2, col3 order by col4 desc, col1 desc
            ) as rn
        from MyTable
    ) as ranked
    where rn = 1
) as source(keepid, rowcnt)
on target.col1 = source.keepid
when matched then
    update set col5 = source.rowcnt
when not matched by source then delete;
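
One thing to watch in filters like the ones in option #1: T-SQL evaluates AND before OR, so a condition of the form a AND b OR c is read as (a AND b) OR c. When the group match on col2 and col3 has to apply to both branches of the tie-break, the OR part needs its own parentheses. A tiny illustration with constant predicates (the column aliases are illustrative):

```sql
SELECT
  -- Parsed as (1 = 0 AND 1 = 0) OR 1 = 1, which is true.
  CASE WHEN 1 = 0 AND 1 = 0 OR 1 = 1
       THEN 'true' ELSE 'false' END AS without_parens,
  -- The OR branch is grouped first, so the leading 1 = 0 makes it false.
  CASE WHEN 1 = 0 AND (1 = 0 OR 1 = 1)
       THEN 'true' ELSE 'false' END AS with_parens;
```

The first column comes back as 'true' and the second as 'false', which is the difference between a filter that leaks across groups and one that stays inside a single (col2, col3) group.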

Answer 3

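WITH a AS (
    -- number the rows in each group; 1 marks the row to keep
    -- (column2, column3 and [yourID] are placeholders for your own columns)
    SELECT  *,
            ROW_NUMBER() OVER (PARTITION BY column2 ORDER BY column3 DESC) AS RowNum
    FROM    mytable
)
-- deleted rows will be all those not numbered 1:
DELETE FROM mytable
WHERE [yourID] IN (
    SELECT [yourID]
    FROM   a
    WHERE  a.RowNum <> 1
);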
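
Before deleting, the same kind of CTE can be pointed at a SELECT to list the rows that would go. This sketch swaps in the question's own column names (col1 as the ID, grouping on col2 and col3, newest col4 first) as an assumption about how the placeholders map onto MyTable.

```sql
WITH a AS (
    SELECT col1, col2, col3, col4,
           ROW_NUMBER() OVER (
               PARTITION BY col2, col3
               ORDER BY col4 DESC, col1 DESC
           ) AS RowNum
    FROM MyTable
)
-- Rows that the DELETE would remove (everything except the newest per group).
SELECT col1, col2, col3, col4
FROM a
WHERE RowNum <> 1
ORDER BY col2, col3, col4 DESC;
```

Against the question's ten sample rows this lists six rows (col1 = 1, 2, 3, 4, 5 and 9), leaving 6, 7, 8 and 10 as the survivors.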

Simplest way/explanation to delete duplicates
