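I thought I had this figured out, but it turns out I'm only deleting the first record. The query below returns the duplicated rows; every group has a count of 2. I just want to delete the first row of each duplicated record.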
select scorestudentid, scoreadvisor, scorecorrect, count(*)
from scores
where scoretestid = 3284
group by scorestudentid, scoreadvisor, scorecorrect
having count(scorestudentid) > 1
This returns:
scorestudentid scoreadvisor scorecorrect (No column name)
13033719 28059 3.0 2
13033777 28086 3.0 2
13033826 28147 3.0 2
13033960 28023 3.0 2
So I put this together, thinking it would work:
set rowcount 1
delete
from scores
where scoretestid = 3284
and scorestudentid in (
select scorestudentid
from scores
where scoretestid = 3284
group by scorestudentid
having count(scorestudentid) > 1)
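It seems like it should be a simple concept, but I'm not getting it.
Based on Thomas's script, I updated the query to fit my table, but it still doesn't work.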
Delete Scores
Where Exists (
Select 1
From Scores As S2
Where S2.ScoreStudentId = Scores.ScoreStudentId
And S2.ScoreAdvisor = Scores.ScoreAdvisor
And S2.ScoreCorrect = Scores.ScoreCorrect
Group By S2.ScoreStudentId, S2.ScoreAdvisor, S2.ScoreCorrect
Having Count(*) > 1
And Min(S2.NewScoreID) = Scores.NewScoreID
)
And Scores.ScoreTestId = 3284
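Answer 0 (score: 5)
The trick is to use the primary key column (you do have one, correct?) and simply find the first PK value that matches the criteria you want. If for some crazy reason you don't have a primary key column, add an Identity column, make it the primary key, and then do the delete.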
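A minimal sketch of that fallback, assuming Scores has neither a primary key nor an identity column yet. The column name NewScoreID is borrowed from the question; the constraint name PK_Scores is made up for illustration.

```sql
-- Assumption: Scores currently has no primary key and no identity column.
-- SQL Server fills in values for existing rows, but the order in which
-- they are numbered is not guaranteed.
ALTER TABLE Scores
    ADD NewScoreID int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Scores PRIMARY KEY;
```

Because the numbering of existing rows is arbitrary, "first" here only means "lowest NewScoreID", not necessarily the row that was inserted earliest.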
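Edit: revised to make it more generic. If you remove the final filter on the test ID (the last line), it will delete duplicates based on ScoreStudentId, ScoreAdvisor and ScoreCorrect across all tests.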
Delete Scores
Where Exists (
Select 1
From Scores As S2
Where S2.ScoreStudentId = Scores.ScoreStudentId
And S2.ScoreAdvisor = Scores.ScoreAdvisor
And S2.ScoreCorrect = Scores.ScoreCorrect
Group By S2.ScoreStudentId, S2.ScoreAdvisor, S2.ScoreCorrect
Having Count(*) > 1
And Min(S2.PrimaryKeyColumn) = Scores.PrimaryKeyColumn
)
And Scores.ScoreTestId = 3284
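For comparison, here is a different technique from the one in this answer: a sketch that ranks rows with ROW_NUMBER and deletes through a common table expression. It assumes SQL Server 2005 or later and that NewScoreID (the name used in the question) is the primary key.

```sql
-- Sketch only: requires SQL Server 2005+ (CTEs and window functions).
-- Assumption: NewScoreID is the primary key, as in the question.
;WITH Ranked AS (
    SELECT NewScoreID,
           ROW_NUMBER() OVER (PARTITION BY ScoreStudentId, ScoreAdvisor, ScoreCorrect
                              ORDER BY NewScoreID) AS rn,
           COUNT(*) OVER (PARTITION BY ScoreStudentId, ScoreAdvisor, ScoreCorrect) AS cnt
    FROM Scores
    WHERE ScoreTestId = 3284
)
DELETE FROM Ranked
WHERE cnt > 1   -- only groups that actually contain duplicates
  AND rn = 1;   -- the row with the lowest key in each such group
```

This deletes exactly one row per duplicated group, which matches the question when every group has two rows. To keep a single row per group no matter how many copies exist, the filter could be changed to `rn > 1`, which keeps the lowest key and removes the rest.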
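Answer 1 (score: 0)
I believe Thomas's solution won't work when the primary key is a uniqueidentifier. Also, if a record is repeated more than twice in the table (i.e. 3, 4 or 5 times), only one of the copies will be deleted.
This is what we use:
declare @col1 uniqueidentifier
declare @col2 varchar(256)
declare @col3 datetime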
DECLARE C CURSOR
FOR
select col1, col2, col3
from MyTable
where IsDeleted = 0
group by col1, col2, col3
having count(*) > 1
OPEN C
FETCH NEXT FROM C
INTO @col1, @col2, @col3
WHILE @@FETCH_STATUS = 0
BEGIN
declare @primaryKey uniqueidentifier
set @primaryKey = (select top 1 primaryKey from MyTable
where col1 = @col1 and col2= @col2 and col3 = @col3)
update MyTable
set IsDeleted = 1, DeleteDt = GETDATE()
where col1 = @col1
and col2 = @col2
and col3 = @col3
and PrimaryKey<> @primaryKey
FETCH NEXT FROM C
INTO @col1, @col2, @col3
END
CLOSE C
DEALLOCATE C
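What this cursor does: for each combination of col1, col2 and col3 that appears more than once among the rows with IsDeleted = 0, it picks one row's primary key to keep and soft-deletes every other row in that group by setting IsDeleted = 1 and DeleteDt to the current time.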
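The same soft delete can probably be done without a cursor. The following is a set-based sketch, not the code we run: it reuses the placeholder names from above (MyTable, col1 to col3, PrimaryKey) and, unlike the cursor, it deterministically keeps the row with the smallest key and only touches rows that are still live.

```sql
-- Sketch only: names are the placeholders used in the cursor above.
-- Keeps the row with the smallest PrimaryKey in each group of live rows
-- and soft-deletes the others.
UPDATE t
SET    IsDeleted = 1,
       DeleteDt  = GETDATE()
FROM   MyTable AS t
WHERE  t.IsDeleted = 0
  AND  EXISTS (
         SELECT 1
         FROM   MyTable AS k
         WHERE  k.IsDeleted = 0
           AND  k.col1 = t.col1
           AND  k.col2 = t.col2
           AND  k.col3 = t.col3
           AND  k.PrimaryKey < t.PrimaryKey   -- a smaller key exists, so t is a later copy
       );
```

As with the cursor, rows whose col1, col2 or col3 is NULL are never matched, because the comparisons use plain equality.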
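Answer 2 (score: 0)
I'm going to discuss an interesting topic in the SQL world. If you Google it, you'll find many ways to delete duplicate data from a table. I won't be writing anything very new, but I will discuss the performance problems you run into when deleting duplicates the traditional way.
Deleting duplicate rows in SQL 2000: I created a table, DuplicateData, and inserted several rows that are duplicated on EmpId.
create table DuplicateData(EmpId int, Name varchar(100)) --> table creation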
insert into DuplicateData values(4,'Akshay')
insert into DuplicateData values(4,'Akshay')
insert into DuplicateData values(5,'ankit')
insert into DuplicateData values(3,'Vikas')
insert into DuplicateData values(3,'Vikas')
insert into DuplicateData values(3,'Vikas')
insert into DuplicateData values(3,'Vikas')
insert into DuplicateData values(2,'Raj')
insert into DuplicateData values(2,'Raj')
insert into DuplicateData values(1,'Neeraj')
insert into DuplicateData values(1,'Neeraj')
insert into DuplicateData values(1,'Neeraj')
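The traditional way to delete duplicate rows from a table in SQL 2000: if we run the following batch in Query Analyzer, it removes all duplicate values from the DuplicateData table. This query is "OK" if you run it in a test environment or against dummy data. But if you have millions of records, it will be about the worst query you can write in terms of performance. It may take hours or even days, depending on the amount of data in the target table.
The reason: the query below uses a correlated subquery, which is evaluated for every EmpId in the table to check whether that EmpId's count is > 1, and the records are then deleted one at a time. That is why it performs so poorly.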
set rowcount 1
delete from DuplicateData where (select count(EmpId) from DuplicateData a where a.EmpId=DuplicateData.EmpId)>1
while @@rowcount>0
delete from DuplicateData where (select count(EmpId) from DuplicateData a where a.EmpId=DuplicateData.EmpId)>1
set rowcount 0
We can write a stored procedure to overcome this performance problem. Here is an example.
declare @tmp table(empid int,cnt int, rowid int identity)--> declare table variable
declare @maxcounter as integer--> Declaration of variables
declare @mincounter as integer
declare @rowcnt as integer
declare @empid as int-->End of Declaration
insert into @tmp(empid,cnt)-->Inserting duplicate empid along with no of duplicate entries
select empid,count(empid) from duplicatedata
group by empid having count(empid)>1
select @mincounter=min(rowid),@maxcounter=max(rowid) from @tmp -->assigning minimum and maximum rowid to variables.
while @mincounter <=@maxcounter
begin
select @rowcnt=cnt,@empid=empid from @tmp where rowid=@mincounter
set @rowcnt =@rowcnt-1
set rowcount @rowcnt
delete from duplicatedata where empid=@empid
set rowcount 0
set @mincounter=@mincounter +1
end
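Let's walk through the while loop above. The @tmp table holds every duplicated empid together with its number of duplicate entries. We loop over each record in @tmp, which is why the minimum and maximum rowid were assigned to the variables @mincounter and @maxcounter.
In the body of the while loop, we assign the number of duplicate records to the variable @rowcnt and the empid to the variable @empid.
In the next statement we set @rowcnt = @rowcnt - 1. We do this because the variable holds the total number of records for that empid, but we want to keep one of them. The statement after that sets rowcount to this value, one less than the number of records for that empid, so the delete removes every copy except one.
The next statement resets rowcount to 0, and the last statement increments @mincounter to fetch the next record from @tmp.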
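On newer versions the same loop can be written without SET ROWCOUNT, which Microsoft's documentation says will stop affecting DELETE, INSERT and UPDATE in a future release. The following is a sketch under that assumption: DELETE TOP needs SQL Server 2005 or later, so it is not an option on SQL 2000 itself. The variable names are my own.

```sql
-- Sketch only: DELETE TOP (@n) requires SQL Server 2005+.
declare @tmp table (rowid int identity, empid int, cnt int)
declare @i int, @max int, @empid int, @extra int

-- One row per duplicated EmpId, with how many copies it has.
insert into @tmp (empid, cnt)
select EmpId, count(*)
from DuplicateData
group by EmpId
having count(*) > 1

select @i = min(rowid), @max = max(rowid) from @tmp

while @i <= @max
begin
    -- cnt - 1 is the number of surplus copies to remove.
    select @empid = empid, @extra = cnt - 1 from @tmp where rowid = @i

    delete top (@extra) from DuplicateData where EmpId = @empid

    set @i = @i + 1
end

-- Check: this should return no rows once the duplicates are gone.
select EmpId, count(*) as cnt
from DuplicateData
group by EmpId
having count(*) > 1
```

For example, EmpId 3 appears four times in the sample data, so @extra is 3 and one row is left after the delete.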