Is there a way to exclude duplicate SQL results?

Time: 2019-05-15 17:02:57

Tags: sql duplicates

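I have a query that runs daily and shows updates between members' old and new addresses. The query works fine except when USPS address matching is run in our core system and only changes certain abbreviations.

For example:

Old Address - 1234 East Main Street    New Address - 1234 E Main St

I don't need to see these results.

I have tried deleting based on a unique field in the core, but the USPS matching process creates all new fields, so the query cannot delete based on that information.

The main SP is: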

    INSERT INTO @results
    SELECT DISTINCT
        i.INDIVIDUAL_ID,
        i.FIRST_NAME,
        i.MIDDLE_NAME,
        i.LAST_NAME,
        i.D1NAME,
        CurrentAddress.ADDRESS1,
        PreviousAddress.ADDRESS1,
        CurrentAddress.ADDRESS2,
        PreviousAddress.ADDRESS2,
        CurrentAddress.ADDRESS3,
        PreviousAddress.ADDRESS3,
        CurrentAddress.CITY,
        PreviousAddress.CITY,
        CurrentAddress.STATE,
        PreviousAddress.STATE,
        CurrentAddress.ZIP_STR,
        PreviousAddress.ZIP_STR,
        CurrentAddress.ZIP4_STR,
        PreviousAddress.ZIP4_STR,
        CurrentAddress.COUNTRY,
        PreviousAddress.COUNTRY
    FROM INDIVIDUAL i
    INNER JOIN MEMBERSHIPPARTICIPANT mpt
        ON i.INDIVIDUAL_ID = mpt.INDIVIDUAL_ID
        AND i.DL_LOAD_DATE = mpt.DL_LOAD_DATE
    INNER JOIN AGR_MEMBERTOTAL_TODAY m
        ON mpt.MEMBER_NBR = m.MEMBER_NBR
        AND mpt.DL_LOAD_DATE = m.DL_LOAD_DATE
    INNER JOIN BRANCH b
        ON i.BRANCH_NBR = b.BRANCH_NBR
    CROSS APPLY dbo.GetCurrentAddress(i.INDIVIDUAL_ID, @latestDate) AS CurrentAddress
    CROSS APPLY dbo.GetCurrentAddress(i.INDIVIDUAL_ID, @previousDate) AS PreviousAddress
    WHERE i.DL_LOAD_DATE = @latestDate
        AND (m.OPN_LN_ALL_CNT > 0 OR m.OPN_SV_ALL_CNT > 0)
    ORDER BY i.FIRST_NAME ASC


    DELETE @results
    WHERE Address1_Today = Address2_Yesterday
        AND Address2_Today = Address1_Yesterday

    SELECT *
    FROM @results
    WHERE (Address1_Today != Address1_Yesterday
        OR Address2_Today != Address2_Yesterday
        OR Address3_Today != Address3_Yesterday
        OR City_Today != City_Yesterday
        OR State_Today != State_Yesterday
        OR ZipCode_Today != ZipCode_Yesterday
        --OR FullZip_Today != FullZip_Yesterday
        OR Country_Today != Country_Yesterday)

I want to remove the near-duplicate rows.

For example:

Old Address - 1234 East Main Street
New Address - 1234 E Main St

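1 Answer:

Answer 0: (score: 0)

There is no built-in way to test for this in SQL; it has to be defined through the logic of your procedure. The first thing I would do is split the old and new addresses into substrings on spaces and count the substrings in each. Rows where the two counts are equal can be split and compared part by part. Think of each address field as three parts: [street_nbr, street_nm, street_suffix]. street_nm can carry an abbreviated prefix (such as E for East), which pushes the substring count above 3, so grouping by substring count is important. A secondary lookup table that maps the words you have identified to their abbreviations can then be used to "de-dupe" those suffixes and prefixes.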


    CREATE TABLE lookup_abbreviations(
        unabbreviated_name varchar(50),
        abbreviated_name varchar(50));

    INSERT INTO lookup_abbreviations(unabbreviated_name, abbreviated_name)
        VALUES ('East', 'E');
    INSERT INTO lookup_abbreviations(unabbreviated_name, abbreviated_name)
        VALUES ('Street', 'St');

    -- Use CROSS APPLY and string functions (LEN, LEFT, RIGHT, CHARINDEX, SUBSTRING) to split
    -- the address into parts. This is where you'll have to figure out the best logic for grouping.
    -- Results and Old_Address stand in for your own results table and address column.

    SELECT DISTINCT
        -- Everything before the first space is treated as the street number.
        Old_Street_Nbr       = LEFT(Old_Address, CHARINDEX(' ', Old_Address + ' ') - 1),
        -- Placeholders: replace each NULL with a CASE expression that tests the substring count.
        Old_Street_Nm_Prefix = CAST(NULL AS varchar(50)),
        Old_Street_Nm        = CAST(NULL AS varchar(50)),
        Old_Street_Suffix    = CAST(NULL AS varchar(50))
    INTO #AbbreviationSort
    FROM Results;

    -- Expand an abbreviated prefix to its full word when the lookup table has a match;
    -- otherwise keep the prefix as it is.
    SELECT
        a.Old_Street_Nbr,
        Old_Street_Nm_Prefix = COALESCE(la.unabbreviated_name, a.Old_Street_Nm_Prefix)
    INTO #SortedAddresses
    FROM #AbbreviationSort AS a
    LEFT JOIN lookup_abbreviations AS la
        ON la.abbreviated_name = a.Old_Street_Nm_Prefix;

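    -- Assumes New_Street_Nbr and New_Street_Nm_Prefix were derived from the new address
    -- in the same way as the Old_* columns above.
    SELECT DISTINCT * FROM
    (
        SELECT Old_Street_Nbr, Old_Street_Nm_Prefix FROM #SortedAddresses
        UNION ALL
        SELECT New_Street_Nbr, New_Street_Nm_Prefix FROM #SortedAddresses
    ) AS DupSearch;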
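
If splitting each address into fixed parts turns out to be too brittle, one alternative is to normalize whole address lines word by word and compare the results. The following is only a sketch under several assumptions: it needs SQL Server 2017 or later (for STRING_SPLIT and STRING_AGG), it reuses the lookup_abbreviations table from above and assumes that table lives in the dbo schema, and dbo.NormalizeAddress is a hypothetical name I made up. The columns Address1_Today and Address1_Yesterday are taken from the question, and the @demo table variable is only sample data.

    -- Hypothetical helper: turns one address line into a comparison key.
    -- Each word is upper-cased and replaced by its abbreviation when the lookup has one.
    CREATE OR ALTER FUNCTION dbo.NormalizeAddress (@address varchar(200))
    RETURNS TABLE
    AS
    RETURN
        SELECT STRING_AGG(t.token, ' ') WITHIN GROUP (ORDER BY t.token) AS address_key
        FROM (
            SELECT UPPER(COALESCE(la.abbreviated_name, s.value)) AS token
            FROM STRING_SPLIT(REPLACE(@address, '.', ''), ' ') AS s  -- drop periods ("St." -> "St")
            LEFT JOIN dbo.lookup_abbreviations AS la
                ON UPPER(la.unabbreviated_name) = UPPER(s.value)
            WHERE s.value <> ''                                      -- ignore repeated spaces
        ) AS t;
    GO

    -- Sample data only; in the procedure you would apply the function to @results instead.
    DECLARE @demo TABLE (Address1_Today varchar(200), Address1_Yesterday varchar(200));

    INSERT INTO @demo (Address1_Today, Address1_Yesterday)
    VALUES ('1234 E Main St', '1234 East Main Street'),  -- abbreviation-only change
           ('99 Oak Ave',     '1234 East Main Street');  -- real address change

    -- Keep only the rows whose normalized keys differ.
    SELECT d.*
    FROM @demo AS d
    CROSS APPLY dbo.NormalizeAddress(d.Address1_Today)     AS td
    CROSS APPLY dbo.NormalizeAddress(d.Address1_Yesterday) AS yd
    WHERE ISNULL(td.address_key, '') <> ISNULL(yd.address_key, '');

With the two lookup rows inserted above (East/E and Street/St), the abbreviation-only row should be filtered out and the real change kept. Two caveats: the words are sorted before they are joined back together, because STRING_SPLIT does not guarantee output order on older versions, so two addresses that contain the same words in a different order would also compare as equal; and each unabbreviated_name must appear only once in the lookup table, otherwise the join duplicates words.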