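First, I want to say that I (as a newb) did search several Q&As about duplicates in tables, but unfortunately I couldn't adapt the code given in the answers.
My table is sort of a report generated in SQL Server 2008.
I would like to know how to delete the duplicate records, with an explanation.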
"MyTable":
Column1 (PK-auto incremental table's record ID)
Column2 (some TXT)
Column3 (Some TXT)
Column4 (SmallDateTime)
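Column5 is empty
Column5 will hold the SUM (count of deleted duplicates, including the surviving row)
The key to a solution is probably this: when [column2 and column3] have multiple records with the same content (and so are duplicates), those records don't always share the same date (column4).
From this: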
col1  col2   col3  col4         col5
----  -----  ----  -----------  ----
1     [abc]  [4]   [10/1/2012]  null
2     [abc]  [1]   [12/1/2012]  null
3     [ghi]  [6]   [4/1/2012]   null
4     [def]  [5]   [8/1/2012]   null
5     [abc]  [4]   [10/1/2012]  null
6     [def]  [5]   [12/1/2012]  null
7     [ghi]  [6]   [15/1/2012]  null
8     [abc]  [4]   [17/1/2012]  null
9     [ghi]  [6]   [6/1/2012]   null
10    [abc]  [1]   [13/1/2012]  null
Into this:
col1  col2   col3  col4         col5
----  -----  ----  -----------  ----
8     [abc]  [4]   [17/1/2012]  3
10    [abc]  [1]   [13/1/2012]  2
6     [def]  [5]   [12/1/2012]  2
7     [ghi]  [6]   [15/1/2012]  3
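Meaning: keep the latest one (1) as the representative of each set of duplicate records.
++ ++ Re-edit
Aaron Bertrand, shawnt00, e2nburner ... and the rest of you: I can't say how much I appreciate your replies, although I haven't yet understood that much code. I'm going to check that code now, but not before thanking you guys! When I first started programming and needed an SQL query, after using
Select * From MyTable
...my first SQL statement...
I said I know SQL! ...now... looking at the profound knowledge you guys have... thanks a lot. I know this post on StackOverflow will be more than useful to other beginners too.
Answer 0 (score: 2)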
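This answer uses a common table expression (CTE) to apply row_number() and count() to each "slice" of the data (meaning grouped by col2 + col3). count() identifies how many rows each such group has, and row_number() applies a "rank" ordered by col4 desc (1 = the latest in each group, 2 = the second latest, and so on). It also uses col1 (which looks like a unique column) to break any ties.
A CTE can be followed by a single statement such as a select, update, or delete. So you can run the first select to verify that these are the rows you want to keep and that the counts are correct. If they are, you can go ahead with the update and the delete. You'll notice that in all cases the row_number() output is what identifies the rows you keep or the rows you discard.
Identify the rows to keep: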
;WITH n AS
(
SELECT col1, col2, col3, col4,
c = COUNT(*) OVER (PARTITION BY col2, col3),
rn = ROW_NUMBER() OVER
(
PARTITION BY col2, col3 ORDER BY col4 DESC, col1 DESC
)
FROM dbo.table_name
)
SELECT col1, col2, col3, col4, c
FROM n WHERE rn = 1;
Once you've confirmed these are the rows you want to keep, you can update them like this:
;WITH n AS
(
SELECT col1, col2, col3, col4, col5,
c = COUNT(*) OVER (PARTITION BY col2, col3),
rn = ROW_NUMBER() OVER
(
PARTITION BY col2, col3 ORDER BY col4 DESC, col1 DESC
)
FROM dbo.table_name
)
UPDATE n SET col5 = c
WHERE rn = 1;
Then delete the remainder this way:
;WITH n AS
(
SELECT col1, col2, col3, col4,
rn = ROW_NUMBER() OVER
(
PARTITION BY col2, col3 ORDER BY col4 DESC, col1 DESC
)
FROM dbo.table_name
)
DELETE n WHERE rn > 1;
Or even simpler (assuming col5 was entirely NULL before the update):
DELETE dbo.table_name WHERE col5 IS NULL;
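As a rough way to sanity-check the update/delete sequence above against the sample rows from the question, here is a minimal sketch that replays it with Python's built-in sqlite3 module. SQLite is only a stand-in for SQL Server here (an assumption made so the example is self-contained): the T-SQL `alias = expr` syntax and the `UPDATE n` / `DELETE n` statements against the CTE are rewritten in a form SQLite accepts, the dates are stored as ISO strings, and window functions need SQLite 3.25 or later. The names `ROWS`, `RANKED`, and `dedupe` are hypothetical.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO strings (yyyy-mm-dd)
# so that plain string ordering matches chronological ordering in SQLite.
ROWS = [
    (1, "abc", "4", "2012-01-10"),
    (2, "abc", "1", "2012-01-12"),
    (3, "ghi", "6", "2012-01-04"),
    (4, "def", "5", "2012-01-08"),
    (5, "abc", "4", "2012-01-10"),
    (6, "def", "5", "2012-01-12"),
    (7, "ghi", "6", "2012-01-15"),
    (8, "abc", "4", "2012-01-17"),
    (9, "ghi", "6", "2012-01-06"),
    (10, "abc", "1", "2012-01-13"),
]

# Same ranking as the CTE in the answer: group size plus a per-group rank,
# newest date first, ties broken by the highest col1.
RANKED = """
    SELECT col1,
           COUNT(*) OVER (PARTITION BY col2, col3) AS c,
           ROW_NUMBER() OVER (
               PARTITION BY col2, col3 ORDER BY col4 DESC, col1 DESC
           ) AS rn
    FROM table_name
"""


def dedupe(rows):
    """Load rows, store each group's size on its newest row, delete the rest."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_name (col1 INTEGER PRIMARY KEY, col2 TEXT,"
        " col3 TEXT, col4 TEXT, col5 INTEGER)"
    )
    conn.executemany(
        "INSERT INTO table_name (col1, col2, col3, col4) VALUES (?, ?, ?, ?)",
        rows,
    )
    # Step 1: write the group size into col5 of the rn = 1 row of each group.
    conn.execute(
        f"WITH n AS ({RANKED}) "
        "UPDATE table_name "
        "SET col5 = (SELECT c FROM n WHERE n.col1 = table_name.col1) "
        "WHERE col1 IN (SELECT col1 FROM n WHERE rn = 1)"
    )
    # Step 2: delete every row that is not the newest of its group.
    conn.execute(
        f"WITH n AS ({RANKED}) "
        "DELETE FROM table_name "
        "WHERE col1 IN (SELECT col1 FROM n WHERE rn > 1)"
    )
    result = conn.execute(
        "SELECT col1, col2, col3, col4, col5 FROM table_name ORDER BY col1"
    ).fetchall()
    conn.close()
    return result
```

With the ten sample rows, `dedupe(ROWS)` leaves rows 6, 7, 8, and 10 with col5 values 2, 3, 3, and 2. Rows 1, 5, and 8 all share [abc] [4], so that group counts 3, while [abc] [1] only has rows 2 and 10.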
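Answer 1 (score: 1)
Here's one simple way. You may find a merge is better, though. These versions keep the row with the highest col1 value and overwrite its date column with the group's maxdate. Aaron's keeps the row that actually has the maxdate. I doubt the difference is important, but it should be noted.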
-- Step 1: on the row with the highest col1 in each (col2, col3) group,
-- store the group's latest date and its row count.
update MyTable
set col4 = (
        select max(col4)
        from MyTable as m2
        where m2.col2 = MyTable.col2 and m2.col3 = MyTable.col3
    ), col5 = (
        select count(*)
        from MyTable as m2
        where m2.col2 = MyTable.col2 and m2.col3 = MyTable.col3
    )
where not exists (
    select *
    from MyTable as m2
    where
        m2.col2 = MyTable.col2 and m2.col3 = MyTable.col3
        and m2.col1 > MyTable.col1
);
-- Step 2: delete every row that has a higher col1 in the same group.
delete from MyTable
where exists (
    select *
    from MyTable as m2
    where
        m2.col2 = MyTable.col2 and m2.col3 = MyTable.col3
        and m2.col1 > MyTable.col1
);
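To make the difference between the two strategies concrete, here is a small sketch in plain Python (no database) over the [ghi] [6] group from the question, where the row with the highest col1 is not the row with the latest date. The function names are hypothetical and exist only for this illustration.

```python
from datetime import date

# The [ghi] [6] group from the question as (col1, col4) pairs.
GHI_GROUP = [
    (3, date(2012, 1, 4)),
    (7, date(2012, 1, 15)),
    (9, date(2012, 1, 6)),
]


def keep_highest_id(rows):
    """Keep the highest col1 and overwrite its date with the group's max date."""
    survivor_id = max(col1 for col1, _ in rows)
    latest = max(col4 for _, col4 in rows)
    return (survivor_id, latest, len(rows))


def keep_latest_row(rows):
    """Keep the row that really has the latest date; ties go to the highest col1."""
    col1, col4 = max(rows, key=lambda r: (r[1], r[0]))
    return (col1, col4, len(rows))
```

Both strategies report 15/1/2012 and a count of 3 for this group, but the first one keeps row 9 (with its date changed) while the second keeps row 7 unchanged. Whether that matters depends on whether anything else refers to the surviving col1 value.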
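Edit 2: Here's a shot at the merge query.
merge MyTable as target
using (
    select max(col1), col2, col3, max(col4), count(*)
    from MyTable
    group by col2, col3
) as source (maxid, col2, col3, maxdate, cnt) -- rowcount is a reserved word, so the count is named cnt
on (
    target.col1 = source.maxid
    and target.col2 = source.col2
    and target.col3 = source.col3
)
when matched then
    update set col4 = source.maxdate, col5 = source.cnt
-- target rows with no matching survivor are the duplicates;
-- deleting them requires the "by source" form
when not matched by source then delete;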
Edit 3: Keep the row that originally has the maxdate, breaking ties on col1.
-- option #1
-- The parentheses matter: AND binds tighter than OR, so without them the
-- tie-break test would not be limited to the same (col2, col3) group.
update MyTable
set col5 = (
        select count(*)
        from MyTable as m2
        where m2.col2 = MyTable.col2 and m2.col3 = MyTable.col3
    )
where not exists (
    select *
    from MyTable as m2
    where
        m2.col2 = MyTable.col2 and m2.col3 = MyTable.col3
        and (m2.col4 > MyTable.col4
             or (m2.col4 = MyTable.col4 and m2.col1 > MyTable.col1))
);
delete from MyTable
where exists (
    select *
    from MyTable as m2
    where
        m2.col2 = MyTable.col2 and m2.col3 = MyTable.col3
        and (m2.col4 > MyTable.col4
             or (m2.col4 = MyTable.col4 and m2.col1 > MyTable.col1))
);
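-- option #2
-- The source picks the one survivor per group (latest col4, ties on col1).
-- Matching on max(col1) and max(col4) together would find no row, and so
-- delete the whole group, whenever those two maxima sit on different rows.
merge MyTable as target
using (
    select col1, cnt
    from (
        select col1,
               cnt = count(*) over (partition by col2, col3),
               rn = row_number() over (
                   partition by col2, col3 order by col4 desc, col1 desc
               )
        from MyTable
    ) as ranked
    where rn = 1
) as source (keepid, cnt)
on target.col1 = source.keepid
when matched then
    update set col5 = source.cnt
when not matched by source then delete;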
Answer 2 (score: 0)
-- colum2, colum3 and [yourID] stand in for your own column names
;WITH a AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY colum2 ORDER BY colum3 DESC) AS RowNum
    FROM mytable
)
-- every row except the first of each partition is deleted
DELETE FROM mytable
WHERE [yourID] IN
    (SELECT [yourID]
     FROM a
     WHERE a.RowNum <> 1);