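I've been asked to group misspelled addresses. I have to work with the existing tools and environment: Google APIs and third-party data-science tools are not an option. I've done my homework, but the articles I found were posted years ago, so I want to check whether anything newer is available. In my scenario, IDs 1-6 should be grouped as a single address; the remaining rows are included as negative tests.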
SELECT * INTO #t FROM ( -- test data; inspect with SELECT * FROM #t, clean up with DROP TABLE #t
SELECT 1 Id, '1 CROLANA HEIGHTS' Adr UNION -- CROLANA vs CROLONA (A vs O typo)
SELECT 2 Id, '1 CROLONA HEIGHTS' Adr UNION
SELECT 3 Id, '1 CROLONA HEIGHT DRIVE' Adr UNION
SELECT 4 Id, '1 CROLONA HEIGHTS DR' Adr UNION
SELECT 5 Id, '1 CROLONA HGHTS DR' Adr UNION
SELECT 6 Id, '1 CROLONA HTS DR' Adr UNION
---------------------------------------- rest should not match
SELECT 7 Id, '1 CORWING DR' Adr UNION
SELECT 8 Id, '1 SUNNYHILL DRIVE' Adr UNION
SELECT 9 Id, '1 CROWN HILL DR' Adr UNION
SELECT 10 Id, '1 ADDISON DRv' Adr ) a
-- Below is my working fuzzy-matching script, which could be improved:
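SELECT id, adr,
       LEAD(adr, 1) OVER (ORDER BY adr) adr_lead,                -- next address in alphabetical order
       SOUNDEX(adr) Sdx,                                         -- four-character phonetic code
       DIFFERENCE(adr, LEAD(adr, 1) OVER (ORDER BY adr)) diff    -- 0 (weak) to 4 (strong) similarity to the next address
FROM #t
WHERE SOUNDEX(adr) = SOUNDEX('1 CROLANA HEIGHTS')
-- Alternative: count rows per phonetic code (without the WHERE clause):
-- SELECT SOUNDEX(adr), COUNT(*) c FROM #t GROUP BY SOUNDEX(adr)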
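One possible refinement of the script above, offered as a sketch rather than a tested solution: SOUNDEX returns a four-character code (the first letter plus three digits) built from the start of its input, so it reflects only the first few consonant sounds, and it was designed for alphabetic names rather than strings that begin with a house number. Splitting off the house number and comparing only the street name therefore seems more robust. The sketch assumes the house number is everything before the first space; the table variable `@addr`, the temp table `#split`, the sample rows and the reference string passed to DIFFERENCE are all illustrative.

```sql
-- Sketch only: split each address into house number and street name,
-- then compute phonetic codes on the street name alone.
-- Assumption: the house number is everything before the first space.
DECLARE @addr TABLE (Id INT, Adr VARCHAR(100));   -- illustrative sample rows
INSERT INTO @addr (Id, Adr) VALUES
    (1, '1 CROLANA HEIGHTS'), (2, '1 CROLONA HEIGHTS'),
    (6, '1 CROLONA HTS DR'),  (7, '1 CORWING DR'),
    (9, '1 CROWN HILL DR');

SELECT a.Id, a.Adr, s.HouseNo, s.Street,
       SOUNDEX(s.Street) AS StreetSdx,                   -- first letter + 3 digits
       DIFFERENCE(s.Street, 'CROLANA HEIGHTS') AS Diff   -- 0 (weak) .. 4 (strong) vs. a hypothetical reference
INTO #split
FROM @addr AS a
-- Appending a space guarantees CHARINDEX finds one, so LEFT never gets a negative length.
CROSS APPLY (SELECT CHARINDEX(' ', a.Adr + ' ') AS sp) AS p
CROSS APPLY (SELECT LEFT(a.Adr, p.sp - 1) AS HouseNo,
                    SUBSTRING(a.Adr, p.sp + 1, 100) AS Street) AS s;

-- Rows sharing HouseNo and StreetSdx are candidate duplicates.
SELECT * FROM #split ORDER BY HouseNo, StreetSdx, Street;
```

Grouping on the house number as well keeps different buildings on the same street apart. Expect some false positives (distinct streets whose first consonant sounds coincide), so the DIFFERENCE score is best treated as a second filter and the candidate groups reviewed.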
Answer:
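Suggestions are welcome. To clean up the data, I used targeted replacements of abbreviations, both at the end of the string and as standalone words.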
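DECLARE @st VARCHAR(100) = 'La_Beg_10 La_midleMacy La'; -- expand "La" to "Lane" only at the end of the string
SELECT @st AS original,
       CASE WHEN @st LIKE '% La'                              -- string ends with the standalone word "La"
            THEN LEFT(@st, LEN(@st) - LEN('La')) + 'Lane'     -- drop the trailing "La" and append "Lane"
            ELSE @st
       END AS normalized;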
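The same idea extends to standalone words anywhere in the address. A minimal sketch, assuming the abbreviations are separated by single spaces: padding the string with a space on each side lets a pattern such as `' DR '` match a whole word at the start, middle or end, but never inside `DRIVE`. The mapping below is a hypothetical, hand-picked subset covering only the test data, and `@addr` and `#norm` are illustrative names.

```sql
-- Sketch only: expand street-suffix abbreviations on whole words.
-- The five mappings are a hypothetical subset chosen for the sample data.
DECLARE @addr TABLE (Id INT, Adr VARCHAR(100));   -- illustrative sample rows
INSERT INTO @addr (Id, Adr) VALUES
    (3, '1 CROLONA HEIGHT DRIVE'), (4, '1 CROLONA HEIGHTS DR'),
    (5, '1 CROLONA HGHTS DR'),     (6, '1 CROLONA HTS DR'),
    (10, '1 ADDISON DRv');

SELECT Id, Adr,
       LTRIM(RTRIM(
           -- Leading/trailing spaces make every word, including the first and last,
           -- appear as ' WORD ', so only whole words are replaced.
           REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
               ' ' + UPPER(Adr) + ' ',
               ' HGHTS ',  ' HEIGHTS '),
               ' HTS ',    ' HEIGHTS '),
               ' HEIGHT ', ' HEIGHTS '),
               ' DRV ',    ' DRIVE '),
               ' DR ',     ' DRIVE ')
       )) AS AdrNorm
INTO #norm
FROM @addr;

SELECT * FROM #norm ORDER BY AdrNorm, Id;
```

With these rows, Ids 3-6 all become `1 CROLONA HEIGHTS DRIVE` and Id 10 becomes `1 ADDISON DRIVE`. Normalization alone does not merge Ids 1 and 2 (CROLANA vs CROLONA), so it complements the SOUNDEX/DIFFERENCE step rather than replacing it. For a fuller mapping, the USPS street-suffix abbreviation list could be loaded into a lookup table; nested REPLACE calls are used here only to keep the sketch short.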