Identify and delete duplicates

Time: 2015-10-05 05:39:24

Tags: sql-server sql-server-2008-r2 sql-delete

Because of incorrect application code, I have to clean up a database that has ended up with duplicates.

To get the necessary data, I am joining the tables that contain the quiz users, questions, and answers. This gives me:

UserId | QuestionId | AnswerId | ChoiceId | LastUpdated             | MaxAnswers
--------------------------------------------------------------------------------
17     | 17         | 374526   | 65       | 2014-01-21 16:08:00.057 | 3
17     | 17         | 3497     | 61       | NULL                    | 3
17     | 17         | 3498     | 69       | NULL                    | 3
17     | 17         | 3499     | 70       | NULL                    | 3
17     | 17         | 3500     | 72       | NULL                    | 3
17     | 17         | 4071     | 62       | NULL                    | 3
17     | 17         | 4072     | 63       | NULL                    | 3
17     | 17         | 258050   | 64       | NULL                    | 3
17     | 43         | 4059     | 210      | NULL                    | 1
17     | 43         | 4060     | 210      | NULL                    | 1
17     | 110        | 533242   | 12       | NULL                    | 2
17     | 110        | 536466   | 12       | NULL                    | 2
17     | 110        | 577857   | 12       | 2015-09-24 09:13:15.127 | 2

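For each User and Question, I have to keep the top X answers, where X is MaxAnswers, ordered by LastUpdated DESC, AnswerId DESC, and delete the rest. The exception is when a ChoiceId appears more than once, in which case only one row for that ChoiceId should be kept. MaxAnswers is always the same for a given QuestionId.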

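I currently have the select above (note: the data sample above was sorted by AnswerId ASC; this has since been corrected), but I don't know how to proceed from there. I assume I need to use partition?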

Edit: the expected output for this sample is:

UserId | QuestionId | AnswerId | ChoiceId | LastUpdated             | MaxAnswers
--------------------------------------------------------------------------------
17     | 17         | 374526   | 65       | 2014-01-21 16:08:00.057 | 3
17     | 17         | 258050   | 64       | NULL                    | 3
17     | 17         | 4072     | 63       | NULL                    | 3
17     | 43         | 4060     | 210      | NULL                    | 1
17     | 110        | 577857   | 12       | 2015-09-24 09:13:15.127 | 2

1 answer:

Answer 0: (score: 3)

Please try the following code:

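-- Number each user's answers per question, newest first.
-- In SQL Server, NULL LastUpdated values sort last under DESC.
;with cte as (
    select
        *,
        rn = row_number() over (partition by UserId, QuestionId order by LastUpdated desc, AnswerId desc)
    from UserAnswers
)
-- Delete every row ranked beyond MaxAnswers for its (UserId, QuestionId) group.
-- Note: this does not collapse rows that repeat the same ChoiceId.
delete u
from UserAnswers u
inner join cte
    on  u.UserId = cte.UserId and
        u.QuestionId = cte.QuestionId and
        u.AnswerId = cte.AnswerId
where cte.rn > cte.MaxAnswers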
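
The question also asks that a repeated ChoiceId be kept only once, which a single ranking per (UserId, QuestionId) does not cover. Below is a minimal sketch of one way to handle both rules, not a tested production script. It assumes the joined data lives in the same UserAnswers table used in this answer and that AnswerId is unique within each (UserId, QuestionId) group. The CTE names ranked_choice and ranked_answer and the columns choice_rn and answer_rn are made up for illustration.

```sql
-- Step 1: within each (UserId, QuestionId, ChoiceId), mark the newest row.
;with ranked_choice as (
    select
        *,
        choice_rn = row_number() over (
            partition by UserId, QuestionId, ChoiceId
            order by LastUpdated desc, AnswerId desc)
    from UserAnswers
),
-- Step 2: rank only the surviving rows (one per ChoiceId) per question.
ranked_answer as (
    select
        *,
        answer_rn = row_number() over (
            partition by UserId, QuestionId
            order by LastUpdated desc, AnswerId desc)
    from ranked_choice
    where choice_rn = 1
)
-- Step 3: delete every row that is not among the top MaxAnswers survivors.
delete u
from UserAnswers u
where not exists (
    select 1
    from ranked_answer k
    where k.UserId = u.UserId
      and k.QuestionId = u.QuestionId
      and k.AnswerId = u.AnswerId
      and k.answer_rn <= k.MaxAnswers
);
```

Run against the sample in the question, this should leave AnswerId 374526, 258050, and 4072 for question 17, 4060 for question 43, and 577857 for question 110. It is probably safest to run it inside a transaction first, or to swap the delete for a select with the same where clause to preview the affected rows.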

You can also refer to the following SQL tutorial: "SQL Row_Number() function is used to delete duplicate rows".

Here is the test:

create table UserAnswers (
UserId int, QuestionId int,  AnswerId int,  ChoiceId int,  LastUpdated datetime, MaxAnswers int
)
insert into UserAnswers select 17     , 17         , 374526   , 65       , '2014-01-21 16:08:00.057' ,   3
insert into UserAnswers select 17     , 17         , 3497     , 61       , NULL        , 3
insert into UserAnswers select 17     , 17         , 3498     , 69       , NULL        , 3
insert into UserAnswers select 17     , 17         , 3499     , 70       , NULL        , 3
insert into UserAnswers select 17     , 17         , 3500     , 72       , NULL        , 3
insert into UserAnswers select 17     , 17         , 4071     , 62       , NULL        , 3
insert into UserAnswers select 17     , 17         , 4072     , 63       , NULL        , 3
insert into UserAnswers select 17     , 17         , 258050   , 64       , NULL        , 3
insert into UserAnswers select 17     , 43         , 4059     , 210      , NULL        , 1
insert into UserAnswers select 17     , 43         , 4060     , 210      , NULL        , 1
insert into UserAnswers select 17     , 110        , 533242   , 12       , '2015-09-24 09:13:15.127' ,   2