SQL Server - Finding duplicates in a table

Asked: 2011-09-28 09:01:43

Tags: sql sql-server

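We have a large table with more than 1 million rows. Can anyone help with how to find duplicate data in the table and possibly move it to an ARCHIVE table?

Table name: CustomerData
Number of fields: 10

The latest record should be kept (it is the one identified by END_DATE being NULL).

Regards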

2 answers:

Answer 0: (score: 3)

Do you only need to move the rows where END_DATE is not NULL?

In a single transaction:

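BEGIN TRANSACTION;

-- Copy the closed-out rows to the archive table
INSERT INTO archive (column1, column2, ..., column10)
SELECT column1, column2, ..., column10
FROM CustomerData
WHERE END_DATE IS NOT NULL;

-- Remove the same rows from the source table
DELETE FROM CustomerData
WHERE END_DATE IS NOT NULL;

COMMIT TRANSACTION;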
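
As a possible alternative, SQL Server's OUTPUT clause lets a single DELETE write the removed rows into another table, so the archived set and the deleted set are the same by construction. The sketch below assumes a trimmed-down, hypothetical column list (cust_id, Address_ID, End_Date, used only for illustration; the real table has ten columns) and an archive table with matching columns. OUTPUT INTO places restrictions on the target table (for example, it cannot have enabled triggers or take part in a foreign key constraint), so treat this as a starting point rather than a drop-in replacement.

```sql
-- Hypothetical column list: replace with the table's real ten columns.
-- A single DELETE statement is atomic, so the rows are archived and
-- removed together or not at all.
DELETE FROM CustomerData
OUTPUT deleted.cust_id, deleted.Address_ID, deleted.End_Date
INTO archive (cust_id, Address_ID, End_Date)
WHERE End_Date IS NOT NULL;
```

On a table with over a million rows it may be worth running this in batches (for example with DELETE TOP (n) in a loop) to keep the transaction log and lock footprint manageable.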

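Answer 1: (score: 0)

Assume the CustomerData table has this structure: CustomerData(cust_id, Cust_name, Address_ID, start_time, End_Date, ..., 7 other columns);

Assume that two customers sharing the same Address_ID count as duplicates.

To insert into the archive table: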

INSERT INTO archive (column1, column2, ..., column10)
SELECT cust_id, start_Date, ..., End_Date
FROM CustomerData
WHERE End_Date IS NOT NULL
  AND Address_ID IN (
        -- Address_ID values that appear on more than one row
        SELECT Address_ID
        FROM CustomerData
        GROUP BY Address_ID
        HAVING COUNT(Address_ID) > 1
      );

To delete from the CustomerData table:

DELETE FROM CustomerData
WHERE End_Date IS NOT NULL
  AND Address_ID IN (
        -- Address_ID values that appear on more than one row
        SELECT Address_ID
        FROM CustomerData
        GROUP BY Address_ID
        HAVING COUNT(Address_ID) > 1
      );

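The inner subquery returns the Address_ID values that occur more than once, that is, the duplicates on that column. It is similar to grouping on the DeptID column of the employees table that ships with Oracle databases.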
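
To preview which rows would be treated as duplicates before archiving anything, a window function is one option. This is only a sketch under the same assumptions as above (Address_ID defines a duplicate group, and the row to keep is the one with End_Date NULL); the column names come from the assumed structure, not from the asker's actual table.

```sql
-- Rank the rows within each Address_ID group so that the current row
-- (End_Date IS NULL) comes first, then the most recently ended rows.
WITH ranked AS (
    SELECT cust_id,
           Address_ID,
           End_Date,
           ROW_NUMBER() OVER (
               PARTITION BY Address_ID
               ORDER BY CASE WHEN End_Date IS NULL THEN 0 ELSE 1 END,
                        End_Date DESC
           ) AS rn
    FROM CustomerData
)
-- Every row except the first in its group is a candidate for the archive.
SELECT cust_id, Address_ID, End_Date
FROM ranked
WHERE rn > 1
ORDER BY Address_ID, rn;
```

Unlike the End_Date IS NOT NULL filter, this also lists extra rows when a group contains more than one row with End_Date NULL, so the two approaches can return different sets; which one is right depends on how the data was loaded.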