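I received a huge data dump that I need to import into our system. I loaded it into a SQL table so that I can do the required data transformations, and I have run into a lot of silly problems. The latest one I can't find a solution for is this:
In CompanyName, the name is often (but not always) repeated twice: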
[CompanyName]
INTERDYN SA INTERDYN SA
EARTH TOUR EARTH TOUR
SOUNDLIGHTS JAJ CYTER
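There is no pattern I can see in which rows are affected. Is there a clever way to detect the duplicates and remove the doubled company names?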
Answer 0 (score: 2)
Just compare the first half of the string with the second half, and check that the character in the middle is a space.
CREATE TABLE Companies
(
    id int IDENTITY
  , CompanyName varchar(50)
);

-- Sample rows: only 'test test' is a doubled name
INSERT INTO Companies (CompanyName)
VALUES ('test')
     , ('test test')
     , ('testtest')
     , ('testz test');

-- Just query the corrected list
SELECT CASE WHEN SUBSTRING(CompanyName, LEN(CompanyName)/2+1, 1) = ' '
             AND SUBSTRING(CompanyName, 1, LEN(CompanyName)/2) = SUBSTRING(CompanyName, LEN(CompanyName)/2+2, LEN(CompanyName))
            THEN SUBSTRING(CompanyName, 1, LEN(CompanyName)/2)
            ELSE CompanyName
       END AS CompanyName
FROM Companies;

-- Update the incorrect values
UPDATE Companies
SET CompanyName = SUBSTRING(CompanyName, 1, LEN(CompanyName)/2)
WHERE SUBSTRING(CompanyName, LEN(CompanyName)/2+1, 1) = ' '
  AND SUBSTRING(CompanyName, 1, LEN(CompanyName)/2) = SUBSTRING(CompanyName, LEN(CompanyName)/2+2, LEN(CompanyName));

-- Check the result, then clean up
SELECT * FROM Companies;
DROP TABLE Companies;
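A possible variation on the same check, offered only as a sketch: it assumes SQL Server and the Companies table from the script above (so it has to run before the final DROP TABLE). The alias h(half) and the output column CleanedName are names made up for this illustration. It computes the half length once with CROSS APPLY and also requires an odd total length, since a name of the form "X X" always has 2 * LEN(X) + 1 characters.

```sql
-- Sketch only: assumes SQL Server and the Companies table from the answer.
-- h(half) and CleanedName are illustrative names, not part of the original script.
SELECT c.id
     , c.CompanyName
     , CASE
           WHEN LEN(c.CompanyName) % 2 = 1                     -- "X X" always has odd length
            AND SUBSTRING(c.CompanyName, h.half + 1, 1) = ' '  -- middle character is a space
            AND LEFT(c.CompanyName, h.half)
                = SUBSTRING(c.CompanyName, h.half + 2, h.half) -- first half equals second half
           THEN LEFT(c.CompanyName, h.half)                    -- keep a single copy of the name
           ELSE c.CompanyName                                  -- leave every other row untouched
       END AS CleanedName
FROM Companies AS c
CROSS APPLY (VALUES (LEN(c.CompanyName) / 2)) AS h(half);
```

With the four sample rows, only 'test test' should change (to 'test'). Note that LEN ignores trailing spaces and that the equality test follows the column's collation, so under a case-insensitive collation names that differ only in letter case would also be treated as duplicates.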