Get duplicate records, identify the lowest ID, and populate the ID_TO_KEEP field with that lowest ID

Time: 2014-08-13 07:36:31

Tags: sql sql-server excel excel-vba vba

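I want to write a script in SQL 2008 or Excel VBA that finds all duplicate records and, for each group of duplicates, takes the lowest ID and uses it to populate the ID_TO_KEEP field.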

Original data:

ID      COMPANY_NAME       ADDRESS           ZIP CODE    ID TO KEEP

111     HONDA MOTORS       55 Oklahoma City    4301
143     HONDA LTD.         55 Oklahoma City    4301
1321    HONDA CARS         55 Oklahoma City    4301
231     MITSUBISHI         32 Miami            5532
342     MITSUBASHA         28 Miami            9421
1324    MERCEDES BENZ      21 Toronto          4210
3212    MERCEDES CARS      21 Toronto          4210
432     MERCEDES ELECTRIC  24 Orlando          7732

What I want to happen:

    ID      COMPANY_NAME       ADDRESS           ZIP CODE    ID TO KEEP

    111     HONDA MOTORS       55 Oklahoma City    4301         111
    143     HONDA LTD.         55 Oklahoma City    4301         111
    1321    HONDA CARS         55 Oklahoma City    4301         111
    231     MITSUBISHI         32 Miami            5532
    342     MITSUBASHA         28 Miami            9421
    1324    MERCEDES BENZ      21 Toronto          4210         1324
    3212    MERCEDES CARS      21 Toronto          4210         1324
    432     MERCEDES ELECTRIC  24 Orlando          7732

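The ID TO KEEP column is populated for the three Honda companies because they are considered the same: they share the same address and ZIP code. Among those three, 111 is the lowest ID, so it is the value used to populate the ID TO KEEP column for all three companies.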

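In the Mercedes case, MERCEDES ELECTRIC has a similar name, but it is still not considered the same as the two Mercedes companies above it, because it has a different address and ZIP code.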

I hope someone can help me.

3 answers:

Answer 0: (score: 1)

If you are using SQL Server, you can do this with an updatable common table expression:

WITH CTE AS
(   SELECT  ID, 
            IDTOKEEP,
            MinID = MIN(ID) OVER(PARTITION BY ZIPCODE, ADDRESS),
            [Count] = COUNT(ID) OVER(PARTITION BY ZIPCODE, ADDRESS)
    FROM    T
)
UPDATE  CTE
SET     IDTOKEEP = MinID
WHERE   [Count] > 1;

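The first step uses analytic (window) functions to determine, for each row, the minimum ID for its address/ZIP code combination and whether more than one row shares that combination. Then update the rows identified as duplicates:

Example on SQL Fiddle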
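
To try the statement above without touching real data, the setup below is a minimal sketch: it builds a scratch table with the question's sample rows, runs the same updatable CTE, and reads the result back. The table name T and the column names ZIPCODE, ADDRESS, and IDTOKEEP are taken from the answer's query; the column types and the COMPANY_NAME column definition are assumptions made for illustration.

```sql
-- Scratch table; names follow the answer's query, types are assumptions.
CREATE TABLE T
(
    ID           INT         NOT NULL PRIMARY KEY,
    COMPANY_NAME VARCHAR(50) NOT NULL,
    ADDRESS      VARCHAR(50) NOT NULL,
    ZIPCODE      VARCHAR(10) NOT NULL,
    IDTOKEEP     INT         NULL
);

-- Sample rows copied from the question.
INSERT INTO T (ID, COMPANY_NAME, ADDRESS, ZIPCODE)
VALUES (111,  'HONDA MOTORS',      '55 Oklahoma City', '4301'),
       (143,  'HONDA LTD.',        '55 Oklahoma City', '4301'),
       (1321, 'HONDA CARS',        '55 Oklahoma City', '4301'),
       (231,  'MITSUBISHI',        '32 Miami',         '5532'),
       (342,  'MITSUBASHA',        '28 Miami',         '9421'),
       (1324, 'MERCEDES BENZ',     '21 Toronto',       '4210'),
       (3212, 'MERCEDES CARS',     '21 Toronto',       '4210'),
       (432,  'MERCEDES ELECTRIC', '24 Orlando',       '7732');

-- Same updatable CTE as in the answer: compute the group minimum and the
-- group size per row, then update only rows whose group has more than one row.
WITH CTE AS
(   SELECT  ID,
            IDTOKEEP,
            MinID   = MIN(ID)   OVER(PARTITION BY ZIPCODE, ADDRESS),
            [Count] = COUNT(ID) OVER(PARTITION BY ZIPCODE, ADDRESS)
    FROM    T
)
UPDATE  CTE
SET     IDTOKEEP = MinID
WHERE   [Count] > 1;

-- Inspect the result.
SELECT  ID, COMPANY_NAME, ADDRESS, ZIPCODE, IDTOKEEP
FROM    T
ORDER BY ZIPCODE, ADDRESS, ID;
```

With this sample data, the three rows at 55 Oklahoma City / 4301 should end up with IDTOKEEP = 111 and the two rows at 21 Toronto / 4210 with 1324, while the remaining rows stay NULL because their [Count] is 1.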

Answer 1: (score: 0)

You can do it in a few steps:

-- Step 1: find every adress that occurs on more than one row
SELECT adress, COUNT(*) AS cnt
INTO #temp1
FROM table1
GROUP BY adress
HAVING COUNT(*) > 1;

-- Step 2: find the lowest id for each duplicated adress
SELECT MIN(t.id) AS minID, t.adress
INTO #temp2
FROM table1 AS t
INNER JOIN #temp1 AS t2 ON t.adress = t2.adress
GROUP BY t.adress;

-- Step 3: write that lowest id back to every row of the group
UPDATE t
SET minID = t2.minID
--select *
FROM table1 AS t
INNER JOIN #temp2 AS t2 ON t.adress = t2.adress;

Check it out: http://rextester.com/VKWVI82297
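
If the intermediate temp tables are not needed for anything else, the three steps can probably be folded into one UPDATE that joins to a grouped derived table. This is a sketch rather than the answer's own code: #table1 is a stand-in for table1, zip_code is a hypothetical column name added so that the grouping matches the question's rule (same address and same ZIP code), and the column types are assumptions.

```sql
-- Scratch copy standing in for the answer's table1 (types are assumptions).
CREATE TABLE #table1
(
    id       INT         NOT NULL,
    adress   VARCHAR(50) NOT NULL,
    zip_code VARCHAR(10) NOT NULL,  -- hypothetical column, not in the answer
    minID    INT         NULL
);

-- Sample rows taken from the question.
INSERT INTO #table1 (id, adress, zip_code)
VALUES (111,  '55 Oklahoma City', '4301'),
       (143,  '55 Oklahoma City', '4301'),
       (1321, '55 Oklahoma City', '4301'),
       (231,  '32 Miami',         '5532'),
       (342,  '28 Miami',         '9421'),
       (1324, '21 Toronto',       '4210'),
       (3212, '21 Toronto',       '4210'),
       (432,  '24 Orlando',       '7732');

-- One statement: the derived table d keeps only groups with more than one
-- row and carries the lowest id of each group; the join writes it back.
UPDATE t
SET    minID = d.minID
FROM   #table1 AS t
INNER JOIN
(
    SELECT adress, zip_code, MIN(id) AS minID
    FROM   #table1
    GROUP BY adress, zip_code
    HAVING COUNT(*) > 1
) AS d
    ON  d.adress   = t.adress
    AND d.zip_code = t.zip_code;

-- Inspect the result, then clean up.
SELECT id, adress, zip_code, minID
FROM   #table1
ORDER BY adress, id;

DROP TABLE #table1;
```

Rows whose address and ZIP code combination is unique are not matched by the join, so their minID is left as NULL.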

Answer 2: (score: 0)

You can get the result in SQL Server using a window function:

-- Create a table variable and populate it
DECLARE @tmp TABLE ([ID] DECIMAL(28), [COMPANY_NAME] VARCHAR(50), [ADDRESS] VARCHAR(50), [ZIP CODE] VARCHAR(50));

INSERT INTO @tmp (ID, [COMPANY_NAME], [ADDRESS], [ZIP CODE])
SELECT 111, 'HONDA MOTORS', '55 Oklahoma City', '4301' UNION ALL
SELECT 143, 'HONDA LTD.', '55 Oklahoma City', '4301' UNION ALL
SELECT 1321, 'HONDA CARS', '55 Oklahoma City', '4301' UNION ALL
SELECT 231, 'MITSUBISHI', '32 Miami', '5532' UNION ALL
SELECT 342, 'MITSUBASHA', '28 Miami', '9421' UNION ALL
SELECT 1324, 'MERCEDES BENZ', '21 Toronto', '4210' UNION ALL
SELECT 3212, 'MERCEDES CARS', '21 Toronto', '4210' UNION ALL
SELECT 432, 'MERCEDES ELECTRIC', '24 Orlando', '7732';

-- Now get the first (lowest) ID, partitioned by ZIP code and address
SELECT *, MIN(ID) OVER (PARTITION BY [ADDRESS], [ZIP CODE]) AS IDTOKEEP
FROM @tmp;

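If you also need to take the company into account, and splitting at the first space character does the job, you can add CASE sign(charindex(' ', COMPANY_NAME)) WHEN 1 THEN left(COMPANY_NAME, charindex(' ', COMPANY_NAME) - 1) ELSE COMPANY_NAME END to the PARTITION BY clause.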
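
Putting that CASE expression into the query might look like the sketch below. It reuses the @tmp table variable and sample rows from the answer (loaded here through a VALUES list, which SQL Server 2008 supports) and assumes that the first word of COMPANY_NAME is a good enough company key, which may not hold for real data.

```sql
-- Table variable as declared in the answer.
DECLARE @tmp TABLE
(
    [ID]           DECIMAL(28),
    [COMPANY_NAME] VARCHAR(50),
    [ADDRESS]      VARCHAR(50),
    [ZIP CODE]     VARCHAR(50)
);

-- Sample rows from the question.
INSERT INTO @tmp ([ID], [COMPANY_NAME], [ADDRESS], [ZIP CODE])
VALUES (111,  'HONDA MOTORS',      '55 Oklahoma City', '4301'),
       (143,  'HONDA LTD.',        '55 Oklahoma City', '4301'),
       (1321, 'HONDA CARS',        '55 Oklahoma City', '4301'),
       (231,  'MITSUBISHI',        '32 Miami',         '5532'),
       (342,  'MITSUBASHA',        '28 Miami',         '9421'),
       (1324, 'MERCEDES BENZ',     '21 Toronto',       '4210'),
       (3212, 'MERCEDES CARS',     '21 Toronto',       '4210'),
       (432,  'MERCEDES ELECTRIC', '24 Orlando',       '7732');

-- Partition by address, ZIP code, and the first word of the company name.
-- The CASE returns the text before the first space, or the whole name
-- when COMPANY_NAME contains no space.
SELECT  *,
        MIN([ID]) OVER (
            PARTITION BY [ADDRESS],
                         [ZIP CODE],
                         CASE SIGN(CHARINDEX(' ', [COMPANY_NAME]))
                             WHEN 1 THEN LEFT([COMPANY_NAME], CHARINDEX(' ', [COMPANY_NAME]) - 1)
                             ELSE [COMPANY_NAME]
                         END
        ) AS IDTOKEEP
FROM    @tmp;
```

Because this query has no duplicate-count filter, a row that has no duplicate gets its own ID as IDTOKEEP instead of a blank value.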