Delete duplicate rows from a SQL database, matching on only some columns

Time: 2013-11-07 08:30:59

Tags: sql sql-server-2008 duplicates

I want to delete duplicate rows from my database that match on some columns, but not on all of them. My code so far:

SELECT sfb_id, prs_id_201304, prs_id_201204, vorname, nachname, sex, gebdat, strasse, hausnummer, ort, plz, beg_dat, end_dat, quelle
INTO #duplicates
FROM [Recordlinkage].[dbo].[2012]
GROUP BY vorname, nachname, hausnummer, ort, plz
HAVING COUNT(*) > 1

-- delete all rows that are duplicated
DELETE FROM [Recordlinkage].[dbo].[2012]
FROM [Recordlinkage].[dbo].[2012] o INNER JOIN #duplicates d
ON d.vorname= o.vorname and d.nachname=o.nachname and d.hausnummer=o.hausnummer and d.ort=o.ort and d.plz=o.plz

INSERT INTO [Recordlinkage].[dbo].[2012] (vorname, nachname, hausnummer, ort, plz)
SELECT sfb_id, prs_id_201304, prs_id_201204, vorname, nachname, sex, gebdat, strasse, hausnummer, ort, plz, beg_dat, end_dat, quelle
FROM #duplicates

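I only want to delete the duplicates that are identical in vorname, nachname, hausnummer, ort and plz. I tested it by selecting only this subset of columns. That works fine, but then I lose the information in all the other columns that were not selected.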
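
To make the goal concrete, here is a minimal sketch of a query that lists every row belonging to a duplicate group while keeping all of its columns. It assumes the table and column names from the code above; `dup_count` is just an illustrative alias, and a windowed `COUNT(*)` is used in place of `GROUP BY` so that the non-key columns stay available.

```sql
-- Sketch: show all columns of every row whose key columns occur more than once.
-- dup_count is a hypothetical alias introduced for this example.
SELECT x.*
FROM (
    SELECT t.*,
           COUNT(*) OVER (PARTITION BY vorname, nachname, hausnummer, ort, plz) AS dup_count
    FROM [Recordlinkage].[dbo].[2012] AS t
) AS x
WHERE x.dup_count > 1
ORDER BY x.vorname, x.nachname, x.hausnummer, x.ort, x.plz, x.sfb_id;
```

Running a read-only query like this first shows which rows are candidates for deletion before anything is changed.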

2 Answers:

Answer 0: (score: 1)

You can try something like this:

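-- columnname and tablename are placeholders for your own column(s) and table
WITH CTE AS
(
    -- number the rows within each group of equal columnname values
    SELECT *, RN = ROW_NUMBER() OVER (PARTITION BY columnname ORDER BY columnname DESC)
    FROM tablename
)
-- keep the first row of each group and delete the rest
DELETE FROM CTE WHERE RN > 1
GO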
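
Applied to the table from the question, the same pattern might look like the sketch below. It partitions by the five columns that define a duplicate and assumes that `sfb_id` is an acceptable tiebreaker, so the row with the lowest `sfb_id` in each group is the one that survives; pick a different `ORDER BY` if another row should be kept.

```sql
-- Sketch: delete all but one row per (vorname, nachname, hausnummer, ort, plz) group.
-- Assumption: the row with the lowest sfb_id in each group is the one to keep.
WITH CTE AS
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY vorname, nachname, hausnummer, ort, plz
                              ORDER BY sfb_id) AS RN
    FROM [Recordlinkage].[dbo].[2012]
)
DELETE FROM CTE
WHERE RN > 1;
```

Because the delete goes through the CTE to the underlying table, the surviving rows keep all of their other columns, so no temp table or re-insert is needed. Note that `PARTITION BY` puts rows with NULL in a key column into the same group, which differs from comparing the columns with `=` in a join.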

Answer 1: (score: 0)

Something like this:

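-- number the rows in each duplicate group, then delete all but the first through alias a
DELETE a
FROM (SELECT vorname, nachname, hausnummer, ort, plz,
             ROW_NUMBER() OVER (PARTITION BY vorname, nachname, hausnummer, ort, plz
                                ORDER BY sfb_id) AS rnk  -- rnk = 1 is the row that is kept
      FROM [Recordlinkage].[dbo].[2012]) a
WHERE a.rnk > 1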
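
After the delete, a quick check along these lines can confirm that no duplicate groups remain. This is only a sketch using the table and key columns from the question; `cnt` is an illustrative alias.

```sql
-- Sketch: list any key combinations that still occur more than once.
-- An empty result means every (vorname, nachname, hausnummer, ort, plz) group is unique.
SELECT vorname, nachname, hausnummer, ort, plz, COUNT(*) AS cnt
FROM [Recordlinkage].[dbo].[2012]
GROUP BY vorname, nachname, hausnummer, ort, plz
HAVING COUNT(*) > 1;
```

It may be worth running the delete and this check inside a transaction, so the change can be rolled back if the result is not what you expected.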