Delete duplicate rows with the earliest dates

Time: 2017-06-23 21:31:12

Tags: sql-server tsql

I have a table named PF_temp with the following structure:

  • firstname
  • middlename
  • lastname
  • DOB
  • address
  • city
  • state
  • phone
  • validitydate

It contains many rows that are identical except for validitydate. For example:

steve,s,smith, 19710909,112 crazy st,miami,fl,3055551212,201609
steve,s,smith, 19710909,112 crazy st,miami,fl,3055551212,201002
steve,s,smith, 19710909,112 crazy st,miami,fl,3055551212,201706
steve,s,smith, 19710909,112 crazy st,miami,fl,3055551212,199812

I want to run a script that deletes all duplicates that match on everything except the last column (validitydate), leaving only the row below in the table, which has the latest validitydate, 201706:

steve,s,smith, 19710909,112 crazy st,miami,fl,3055551212,201706

This is what I have so far, although it raises an error instead of working:

DELETE FROM PF_temp
LEFT OUTER JOIN (
   SELECT Min(ValidityDate) as RowId
   , firstname
   , middlename
   , lastname
   , DOB
   , address
   , city
   , state
   , phone
   FROM PF_temp
   GROUP BY firstname
   , middlename
   , lastname
   , DOB
   , address
   , city
   , state
   , phone
   , validitydate
) as KeepRows ON TableName.RowId = KeepRows.RowId
WHERE KeepRows.RowId IS NULL

Also, I would like to run it in stages based on the first letter of the last name, so a filter on lastname needs to be added somewhere.

1 answer:

Answer 0 (score: 1)

Try this:

delete a
from PF_Temp a
inner join PF_Temp b 
on  b.firstname = a.firstname 
and b.middlename = a.middlename
and b.lastname = a.lastname
and b.DOB = a.DOB
and b.address = a.address
and b.city = a.city
and b.state = a.state
and b.phone = a.phone
and b.validitydate > a.validitydate

Example at SQL Fiddle.
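To try the statement above locally, a minimal setup might look like the sketch below. The column types are assumptions (the question does not give them); validitydate is stored as a yyyymm string here so that string comparison follows date order. The delete runs inside a transaction that is rolled back, so the sample data can be reused.

```sql
-- Hypothetical setup: column types are assumed, not taken from the question.
create table PF_Temp (
    firstname    varchar(50),
    middlename   varchar(50),
    lastname     varchar(50),
    DOB          char(8),
    address      varchar(100),
    city         varchar(50),
    state        char(2),
    phone        varchar(20),
    validitydate char(6)  -- yyyymm, so string order matches date order
);

-- The four sample rows from the question.
insert into PF_Temp
    (firstname, middlename, lastname, DOB, address, city, state, phone, validitydate)
values
    ('steve', 's', 'smith', '19710909', '112 crazy st', 'miami', 'fl', '3055551212', '201609'),
    ('steve', 's', 'smith', '19710909', '112 crazy st', 'miami', 'fl', '3055551212', '201002'),
    ('steve', 's', 'smith', '19710909', '112 crazy st', 'miami', 'fl', '3055551212', '201706'),
    ('steve', 's', 'smith', '19710909', '112 crazy st', 'miami', 'fl', '3055551212', '199812');

begin transaction;

-- Same self-join delete as above.
delete a
from PF_Temp a
inner join PF_Temp b
on  b.firstname = a.firstname
and b.middlename = a.middlename
and b.lastname = a.lastname
and b.DOB = a.DOB
and b.address = a.address
and b.city = a.city
and b.state = a.state
and b.phone = a.phone
and b.validitydate > a.validitydate;

-- Inspect what is left: only the 201706 row should remain.
select * from PF_Temp;

rollback transaction;
```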

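The above works as follows:

  • Joining on all matching fields (everything except validitydate) captures every record in a that has a duplicate. At this stage every record is captured, because each record in a also matches itself in b.
  • Requiring that validitydate in b be greater than validitydate in a solves two problems at once. A record can no longer match itself (its validitydate equals its own), and the record with the latest validitydate in each group never matches, because no row in b has a later validitydate.
  • We then delete every record returned for a; that is, every record that has a duplicate with a later validitydate.

If you only want to delete duplicates for a particular last name, you can do as you suggested; that is, add the line where a.LastName like 'A%' at the end of the statement.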
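As a sketch of the staged run, the block below first previews the rows that one stage would remove and then loops over the letters A to Z, deleting one last-name initial per pass. The variables @code and @prefix are illustrative names, and the loop assumes last names start with a basic Latin letter; rows with other initials would need a separate pass.

```sql
-- Preview: rows the 'A' stage would delete.
-- Whether 'A%' also matches lowercase names depends on the column collation.
select a.*
from PF_Temp a
where a.lastname like 'A%'
  and exists (
      select 1
      from PF_Temp b
      where b.firstname = a.firstname
        and b.middlename = a.middlename
        and b.lastname = a.lastname
        and b.DOB = a.DOB
        and b.address = a.address
        and b.city = a.city
        and b.state = a.state
        and b.phone = a.phone
        and b.validitydate > a.validitydate
  );

-- Staged delete: one pass per initial, driven by an integer character code
-- so the loop bound does not depend on collation ordering.
declare @code int = ascii('A');
declare @prefix varchar(2);

while @code <= ascii('Z')
begin
    set @prefix = char(@code) + '%';

    delete a
    from PF_Temp a
    inner join PF_Temp b
    on  b.firstname = a.firstname
    and b.middlename = a.middlename
    and b.lastname = a.lastname
    and b.DOB = a.DOB
    and b.address = a.address
    and b.city = a.city
    and b.state = a.state
    and b.phone = a.phone
    and b.validitydate > a.validitydate
    where a.lastname like @prefix;

    set @code = @code + 1;
end
```

Running each pass as its own statement keeps the individual deletes smaller, which is presumably the point of staging the work.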

Update

You mentioned that some columns may contain nulls. Below is a revised version of the above that accounts for null != null:

delete a
from PF_Temp a
inner join PF_Temp b
on  ((b.firstname = a.firstname) or (b.firstname is null and a.firstname is null))
and ((b.middlename = a.middlename) or (b.middlename is null and a.middlename is null))
and ((b.lastname = a.lastname) or (b.lastname is null and a.lastname is null))
and ((b.DOB = a.DOB) or (b.DOB is null and a.DOB is null))
and ((b.address = a.address) or (b.address is null and a.address is null))
and ((b.city = a.city) or (b.city is null and a.city is null))
and ((b.state = a.state) or (b.state is null and a.state is null))
and ((b.phone = a.phone) or (b.phone is null and a.phone is null))
and b.validitydate > a.validitydate

An alternative to the above is on coalesce(b.firstname,'') = coalesce(a.firstname,'') (repeating the pattern for every other matching field); however, that means nulls and blanks are treated as the same value, and it will not perform as well.

Alternative approach

Another approach, which is more forgiving of nulls, is to use a subquery that returns all rows and numbers the rows within each set of matching values, starting at 1 for the most recent validitydate. We then delete every returned row whose number is greater than 1; that is, any duplicate with an earlier validitydate.

delete TheDeletables
from
(
    select *
    , row_number() over (
        partition by firstname
        , middlename
        , lastname
        , DOB
        , address
        , city
        , state
        , phone
        order by validitydate desc
    ) rowid
    from PF_Temp
) TheDeletables
where rowid > 1;

Demo at SQL Fiddle.
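One way to check the result is to look for groups that still contain more than one row. This is a sketch against the same PF_Temp table; the alias copies is only illustrative. GROUP BY places nulls in the same group, which matches how the row_number() partition treats them.

```sql
-- Any row returned here is a group that still has duplicates.
select firstname
, middlename
, lastname
, DOB
, address
, city
, state
, phone
, count(*) as copies
from PF_Temp
group by firstname
, middlename
, lastname
, DOB
, address
, city
, state
, phone
having count(*) > 1;
```

After the row_number() delete this query should return no rows. After the join-based delete it can still return groups in which several rows share the same latest validitydate, because the condition b.validitydate > a.validitydate is strict and never removes ties.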