我有一个每天运行的查询,其中显示新旧成员地址的更新情况。除了在我们的核心系统中完成USPS地址匹配并仅更改某些缩写的时间之外,该查询工作正常。
例如:
旧地址-东大街1234号 新地址-1234 E Main St
我不需要查看这些结果。
我曾尝试根据核心中的唯一字段删除,但是USPS匹配过程会创建所有新字段,因此查询无法基于该信息删除。
主要的SP是:
INSERT INTO @results
SELECT
distinct i.INDIVIDUAL_ID,
i.FIRST_NAME,
i.MIDDLE_NAME,
i.LAST_NAME,
i.D1NAME,
CurrentAddress.ADDRESS1,
PreviousAddress.ADDRESS1,
CurrentAddress.ADDRESS2,
PreviousAddress.ADDRESS2,
CurrentAddress.ADDRESS3,
PreviousAddress.ADDRESS3,
CurrentAddress.CITY,
PreviousAddress.CITY,
CurrentAddress.STATE,
PreviousAddress.STATE,
CurrentAddress.ZIP_STR,
PreviousAddress.ZIP_STR,
CurrentAddress.ZIP4_STR,
PreviousAddress.ZIP4_STR,
CurrentAddress.COUNTRY,
PreviousAddress.COUNTRY
FROM INDIVIDUAL i
INNER JOIN MEMBERSHIPPARTICIPANT mpt
ON i.INDIVIDUAL_ID = mpt.INDIVIDUAL_ID
AND i.DL_LOAD_DATE = mpt.DL_LOAD_DATE
INNER JOIN AGR_MEMBERTOTAL_TODAY m
ON mpt.MEMBER_NBR = m.MEMBER_NBR
AND mpt.DL_LOAD_DATE = m.DL_LOAD_DATE
INNER JOIN BRANCH b
ON i.BRANCH_NBR = b.BRANCH_NBR
CROSS APPLY dbo.GetCurrentAddress(i.INDIVIDUAL_ID, @latestDate) AS CurrentAddress
CROSS APPLY dbo.GetCurrentAddress(i.INDIVIDUAL_ID, @previousDate) AS PreviousAddress
WHERE i.DL_LOAD_DATE = @latestDate
AND ( m.OPN_LN_ALL_CNT > 0 OR m.OPN_SV_ALL_CNT > 0 )
order by i.FIRST_NAME asc
DELETE @results
WHERE Address1_Today = Address2_Yesterday
AND Address2_Today = Address1_Yesterday
SELECT *
FROM @results
WHERE (Address1_Today != Address1_Yesterday
OR Address2_Today != Address2_Yesterday
OR Address3_Today != Address3_Yesterday
OR City_Today != City_Yesterday
OR State_Today != State_Yesterday
OR ZipCode_Today != ZipCode_Yesterday
--OR FullZip_Today != FullZip_Yesterday
OR Country_Today != Country_Yesterday)
我想删除几乎重复的行
例如:
Old Address - 1234 East Main Street
New Address - 1234 E Main St
答案 0 :(得分:0)
没有通过SQL进行测试的内置方法,必须通过过程的逻辑进行定义。我要做的第一件事是根据这些子字符串的计数将旧地址和新地址中的子字符串分组。在行级别上计数彼此相等的计数,可以按空间分割并拆分地址。将每个地址字段视为三个部分[street_nbr,street_nm,street_suffix]。 street_nm可以带有缩写前缀,这就是为什么将子字符串计数分组很重要,从而使计数增加到3以上的原因。然后,与您标识的单词/缩写匹配的辅助查找表可以用于“取消重复”那些后缀和前缀。
CREATE TABLE lookup_abbreviations(
unabbreviated_name varchar(50),
abbreviated_name varchar(50));
INSERT INTO lookup_abbreviations(unabbreviated_name, abbreviated_name)
VALUES ('East', 'E')
INSERT INTO lookup_abbreviations(unabbreviated_name, abbreviated_name)
VALUES ('Street', 'St');
-- Use Cross Applies and functions(LEN, LEFT, RIGHT, CHARINDEX, SUBSTRING) to split the address
-- into equal parts. This is where you'll have to figure out the best logic for grouping.
SELECT DISTINCT
Old_Street_Nbr = SUBSTRING(Old_Address, CHARINDEX(' ', Old_Address))
Old_Street_Nm_Prefix = CASE WHEN /*Here is where the count of substrings is tested*/ END
Old_Street_Nm = CASE WHEN /*Here is where the count of substrings is tested*/ END
Old_Street_Suffix = []
INTO #AbbreviatonSort
FROM Results;
SELECT
Old_Street_Nbr ,
Old_Street_Nm_Prefix = CASE
WHEN Old_Street_Nm_Prefix IN (SELECT abbreviated_name from
lookup_abbreviations)
THEN (SELECT unabbreviated_name from
lookup_abbreviations WHERE abbreviated_name =
Old_Street_Nm_Prefix)
ELSE Old_Street_Nm_Prefix
END
INTO #SortedAddresses
FROM #AbbreviationSort
;
SELECT DISTINCT * FROM
(
SELECT Old_Street_Nbr, Old_Prefix FROM #SortedAddresses
UNION ALL
SELECT New_Street_Nbr, New_Prefix FROM #SortedAddresses
) AS DupSearch