How can I replace incorrect field values with the most common value per identifier column in SQL?

Time: 2016-08-30 20:37:28

Tags: sql

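I have address and zip code columns whose values are inconsistent for the same premise ID. For each premise ID, I want to replace the incorrect addresses with the most common address.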

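For example, the original table might look like the one below, and I want the Street and Zip columns to be consistent within each Premise.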

Date  |  Premise  |  House_No  |  Street        |  Zip
-----------------------------------------------------------
Jan   |  43219    |   123      |  E Haywood Dr  |  31214
Feb   |  43219    |   123      |  Haywood Dr E  |  31214-3291
Mar   |  43219    |   123      |  E Haywood Dr  |  31214
Apr   |  43219    |   123      |  Haywood Dr E  |  31214-3291
May   |  43219    |   123      |  E Haywood Dr  |  31214
Jan   |  43111    |   456      |  W Simpson Wy  |  31202
Feb   |  43111    |   456      |  W Simpson Wy  |  31202
Mar   |  43111    |   456      |  W Simpson Wy  |  31202
Apr   |  43111    |   456      |  Simpson Wy W  |  31202-1022
May   |  43111    |   456      |  W Simpson Wy  |  31202

1 Answer:

Answer 0: (score: 0)

Try it with an updatable CTE:

DECLARE @tbl TABLE (Mnth VARCHAR(100),Premise INT, House_No INT,Street VARCHAR(100),Zip VARCHAR(100));
INSERT INTO @tbl VALUES
 ('Jan',43219,123,'E Haywood Dr','31214')
,('Feb',43219,123,'Haywood Dr E','31214-3291')
,('Mar',43219,123,'E Haywood Dr','31214')
,('Apr',43219,123,'Haywood Dr E','31214-3291')
,('May',43219,123,'E Haywood Dr','31214')
,('Jan',43111,456,'W Simpson Wy','31202')
,('Feb',43111,456,'W Simpson Wy','31202')
,('Mar',43111,456,'W Simpson Wy','31202')
,('Apr',43111,456,'Simpson Wy W','31202-1022')
,('May',43111,456,'W Simpson Wy','31202');

--The first CTE just does a grouped count:

WITH Counted AS
(
    SELECT COUNT(Premise) AS [Counter]
          ,Premise
          ,House_No
          ,Street
          ,Zip
    FROM @tbl
    GROUP BY Premise,House_No,Street,Zip
)

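--The second CTE finds the row with the highest count per Premise

--Note: if several variants share the same count, which one gets picked is arbitrary...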

,MostCommon AS
(
    SELECT *
         ,ROW_NUMBER() OVER(PARTITION BY Premise ORDER BY [Counter] DESC) AS MaxCounter        
    FROM Counted
)

--This CTE is updatable: it combines the actual table data with the new values

,UpdateableCTE AS
(
    SELECT tbl.*
          ,mc.House_No AS NewHouse_No
          ,mc.Street AS NewStreet
          ,mc.Zip AS NewZip 
    FROM @tbl AS tbl
    INNER JOIN MostCommon AS mc ON mc.MaxCounter=1 AND mc.Premise=tbl.Premise
)

--Finally, set the new values

UPDATE UpdateableCTE SET House_No=NewHouse_No
                        ,Street=NewStreet
                        ,Zip=NewZip;
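
--Optional sanity check (a sketch, not part of the original answer): run it in the
--same batch, right after the UPDATE. It lists every Premise that still has more
--than one distinct House_No/Street/Zip combination; an empty result means each
--Premise now carries a single, consistent address.
SELECT d.Premise
      ,COUNT(*) AS Variants
FROM
(
    SELECT DISTINCT Premise, House_No, Street, Zip
    FROM @tbl
) AS d
GROUP BY d.Premise
HAVING COUNT(*) > 1;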

--Show the result

SELECT * FROM @tbl;
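
--Sketch of a deterministic tie-break (an assumption, not part of the original answer):
--when two variants have the same count, ROW_NUMBER() with ORDER BY [Counter] DESC alone
--may number either one first. Adding Street and Zip as further sort keys makes the pick
--repeatable; sorting alphabetically is only one possible rule, chosen for illustration.
--@demo is a hypothetical two-row sample in which both spellings occur exactly once.
DECLARE @demo TABLE (Premise INT, Street VARCHAR(100), Zip VARCHAR(100));
INSERT INTO @demo VALUES
 (1,'E Haywood Dr','31214')
,(1,'Haywood Dr E','31214-3291');

WITH DemoCounted AS
(
    SELECT COUNT(*) AS [Counter]
          ,Premise
          ,Street
          ,Zip
    FROM @demo
    GROUP BY Premise,Street,Zip
)
SELECT Premise
      ,Street
      ,Zip
      ,[Counter]
      ,ROW_NUMBER() OVER(PARTITION BY Premise
                         ORDER BY [Counter] DESC, Street, Zip) AS MaxCounter
FROM DemoCounted;

--The same extra sort keys could be added to the ORDER BY of the MostCommon CTE above.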