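I want to write a script in SQL Server 2008 or Excel VBA that finds all duplicate records and, for each group of duplicates, takes the lowest ID and uses it to populate the ID_TO_KEEP field.
Original data: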
ID    COMPANY_NAME       ADDRESS           ZIP CODE  ID TO KEEP
111   HONDA MOTORS       55 Oklahoma City  4301
143   HONDA LTD.         55 Oklahoma City  4301
1321  HONDA CARS         55 Oklahoma City  4301
231   MITSUBISHI         32 Miami          5532
342   MITSUBASHA         28 Miami          9421
1324  MERCEDES BENZ      21 Toronto        4210
3212  MERCEDES CARS      21 Toronto        4210
432   MERCEDES ELECTRIC  24 Orlando        7732
What I want to happen:
ID    COMPANY_NAME       ADDRESS           ZIP CODE  ID TO KEEP
111   HONDA MOTORS       55 Oklahoma City  4301      111
143   HONDA LTD.         55 Oklahoma City  4301      111
1321  HONDA CARS         55 Oklahoma City  4301      111
231   MITSUBISHI         32 Miami          5532
342   MITSUBASHA         28 Miami          9421
1324  MERCEDES BENZ      21 Toronto        4210      1324
3212  MERCEDES CARS      21 Toronto        4210      1324
432   MERCEDES ELECTRIC  24 Orlando        7732
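The ID TO KEEP column is populated for the three Honda companies because they are considered the same: they share the same address and zip code. Among those three, 111 is the lowest ID, so it is the value used to fill ID TO KEEP for all of them.
In the Mercedes case, Mercedes Electric is not considered the same as the two Mercedes companies above it, even though it shares the name, because it has a different address and zip code.
I hope someone can help me.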
Answer 0: (score: 1)
If you are using SQL Server, you can do this with an updatable common table expression:
WITH CTE AS
( SELECT ID,
IDTOKEEP,
MinID = MIN(ID) OVER(PARTITION BY ZIPCODE, ADDRESS),
[Count] = COUNT(ID) OVER(PARTITION BY ZIPCODE, ADDRESS)
FROM T
)
UPDATE CTE
SET IDTOKEEP = MinID
WHERE [Count] > 1;
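The first step uses analytic (window) functions to determine, for each row, the minimum ID within its address/zip code combination and whether more than one row shares that combination. The UPDATE then touches only the rows identified as duplicates.
Example on SQL Fiddle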
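To check the expected result outside the database, the same grouping rule can be sketched in plain Python. This is only an illustration of the logic, not the T-SQL above; the function name `assign_id_to_keep` and the tuple layout of `rows` are assumptions made for the example.

```python
from collections import defaultdict

# Sample rows from the question: (id, company_name, address, zip_code)
rows = [
    (111, "HONDA MOTORS", "55 Oklahoma City", "4301"),
    (143, "HONDA LTD.", "55 Oklahoma City", "4301"),
    (1321, "HONDA CARS", "55 Oklahoma City", "4301"),
    (231, "MITSUBISHI", "32 Miami", "5532"),
    (342, "MITSUBASHA", "28 Miami", "9421"),
    (1324, "MERCEDES BENZ", "21 Toronto", "4210"),
    (3212, "MERCEDES CARS", "21 Toronto", "4210"),
    (432, "MERCEDES ELECTRIC", "24 Orlando", "7732"),
]


def assign_id_to_keep(rows):
    """Map each id to the lowest id of its (zip_code, address) group.

    Rows whose group has only one member map to None, which mirrors
    the WHERE [Count] > 1 filter in the UPDATE.
    """
    groups = defaultdict(list)
    for row_id, _name, address, zip_code in rows:
        groups[(zip_code, address)].append(row_id)

    result = {}
    for ids in groups.values():
        keep = min(ids) if len(ids) > 1 else None
        for row_id in ids:
            result[row_id] = keep
    return result


id_to_keep = assign_id_to_keep(rows)
```

The company name is deliberately ignored here, just as it is in the PARTITION BY of the query.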
Answer 1: (score: 0)
You can do it in a few steps:
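--find the addresses that occur on more than one row
--note: this matches on adress only; add the zip code column to the GROUP BY and joins if it must match too
SELECT adress, COUNT(*) AS cnt
INTO #temp1
FROM table1
GROUP BY adress
HAVING COUNT(*) > 1;
--get the lowest id for each duplicated address
SELECT MIN(t.id) AS minID, t.adress
INTO #temp2
FROM table1 AS t
INNER JOIN #temp1 AS t2 ON t.adress = t2.adress
GROUP BY t.adress;
--write that id into the ID_TO_KEEP column of every duplicated row
UPDATE t
SET ID_TO_KEEP = t2.minID
--select *
FROM table1 AS t
INNER JOIN #temp2 AS t2 ON t.adress = t2.adress;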
Answer 2: (score: 0)
You can use a window function to get the result in SQL Server:
--Declare a table variable and populate it
DECLARE @tmp TABLE ([ID] DECIMAL(28),[COMPANY_NAME] varchar(50),[ADDRESS] VARCHAR(50),[ZIP CODE] VARCHAR(50))
INSERT INTO @tmp (ID, [COMPANY_NAME],[ADDRESS],[ZIP CODE])
SELECT 111, 'HONDA MOTORS', '55 Oklahoma City', '4301'
UNION ALL SELECT 143, 'HONDA LTD.', '55 Oklahoma City', '4301'
UNION ALL SELECT 1321, 'HONDA CARS', '55 Oklahoma City', '4301'
UNION ALL SELECT 231, 'MITSUBISHI', '32 Miami', '5532'
UNION ALL SELECT 342, 'MITSUBASHA', '28 Miami', '9421'
UNION ALL SELECT 1324, 'MERCEDES BENZ', '21 Toronto', '4210'
UNION ALL SELECT 3212, 'MERCEDES CARS', '21 Toronto', '4210'
UNION ALL SELECT 432, 'MERCEDES ELECTRIC', '24 Orlando', '7732'
--now get the first id, partitioned by ZIP Code and Address.
select *, min(ID) OVER(PARTITION BY [ADDRESS], [ZIP CODE]) as IDTOKEEP from @tmp
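If you also need to take the company into account, and splitting the name at the first space character does the job, you can add
CASE sign(charindex(' ', COMPANY_NAME)) WHEN 1 THEN left(COMPANY_NAME, charindex(' ', COMPANY_NAME) - 1) ELSE COMPANY_NAME END
to the PARTITION BY clause.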
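As a rough illustration of what adding that expression changes, the sketch below reproduces the partitioning in plain Python rather than T-SQL. The helper names `first_word` and `min_id_per_partition` are made up for the example, and the TOYOTA row is hypothetical: it is not in the question's data and is there only to show two different companies at the same address being kept apart.

```python
from collections import defaultdict


def first_word(company_name):
    # Text before the first space, or the whole name when there is no space.
    # Mirrors the CASE / charindex / left expression.
    return company_name.partition(" ")[0]


# (id, company_name, address, zip_code)
rows = [
    (111, "HONDA MOTORS", "55 Oklahoma City", "4301"),
    (143, "HONDA LTD.", "55 Oklahoma City", "4301"),
    (1321, "HONDA CARS", "55 Oklahoma City", "4301"),
    (999, "TOYOTA USA", "55 Oklahoma City", "4301"),  # hypothetical row
]


def min_id_per_partition(rows):
    """Lowest id per (address, zip_code, first word of company name).

    Like the SELECT in this answer, every row gets a value, including
    rows that are alone in their partition.
    """
    partitions = defaultdict(list)
    for row_id, name, address, zip_code in rows:
        partitions[(address, zip_code, first_word(name))].append(row_id)
    return {row_id: min(ids) for ids in partitions.values() for row_id in ids}


id_to_keep = min_id_per_partition(rows)
```

Without the first-word key, all four rows would fall into one partition and share the lowest ID of the address.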