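Regex - How to match any 3 or more letters from one list against any 3 or more letters in another, ignoring special characters? Keep only a-z and replace special characters with spaces.
I am using $[a-z] in my SQL query, but it has limitations if the base list contains characters such as ()/.,&
For example,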
I would like all of the Bank of America entries to match.
This is the data my SQL query works with so far:
List A:
Bank of America
BofA
Bank of America Riverside
Bank of America Inc.
Bank and America
BankOfAmerica
International Business Machine
List B:
Bank of America (BofA)
IBM
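The cleanup described above (keep only a-z, turn everything else into a space) can be sketched outside SQL. This is a minimal Python illustration, not the query itself: the function name `normalize` is hypothetical, and lowercasing and collapsing runs of special characters into a single space are assumptions about the desired result.

```python
import re


def normalize(name):
    """Lowercase the name and replace each run of non a-z characters with one space."""
    # Assumption: matching should be case-insensitive, so lowercase first.
    # Assumption: consecutive special characters collapse into a single space.
    return re.sub(r"[^a-z]+", " ", name.lower()).strip()


if __name__ == "__main__":
    for raw in ["Bank of America (BofA)", "Bank of America Inc.", "IBM"]:
        print(repr(raw), "->", repr(normalize(raw)))
```

Applied to both lists before comparing, this removes the need to enumerate ( ) / . , & one by one.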
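Answer 0 (score: 2)
You can use recursive CTEs to extract all the 3-grams for comparison; an example follows. I wouldn't use the preprocessing CTEs in practice. They are only there for convenience; instead, I would create a UDF to strip out whatever you don't want.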
declare @t1 table (field1 varchar(50));
declare @t2 table (field2 varchar(50));
insert into @t1 values
('Bank of America'),
('BofA'),
('Bank of America Riverside'),
('Bank of America Inc.'),
('Bank and America'),
('BankOfAmerica'),
('International Business Machine')
;
insert into @t2 values
('Bank of America (BofA)'),
('IBM')
;
-- Convenience cleanup only: drop 'Inc.' and blank out parentheses and periods.
WITH preprocessCTE1 AS (
    SELECT
        field1,
        REPLACE(REPLACE(REPLACE(REPLACE(field1, 'Inc.', ''), '(', ' '), ')', ' '), '.', ' ') AS processedfield1
    FROM @t1
),
preprocessCTE2 AS (
    SELECT
        field2,
        REPLACE(REPLACE(REPLACE(REPLACE(field2, 'Inc.', ''), '(', ' '), ')', ' '), '.', ' ') AS processedfield2
    FROM @t2
),
recurse1 AS (
    -- Anchor: the first trigram of each name in @t1.
    SELECT
        field1,
        processedfield1,
        1 AS Position,
        SUBSTRING(processedfield1, 1, 3) AS Trigram
    FROM preprocessCTE1
    UNION ALL
    -- Recursive step: slide the 3-character window one position to the right.
    SELECT
        field1,
        processedfield1,
        Position + 1 AS Position,
        SUBSTRING(processedfield1, Position + 1, 3) AS Trigram
    FROM recurse1
    -- Stop once the window covers the last 3 characters (LEN ignores trailing spaces).
    WHERE (LEN(processedfield1) - 3) >= Position
),
recurse2 AS (
    -- Same trigram extraction for @t2.
    SELECT
        field2,
        processedfield2,
        1 AS Position,
        SUBSTRING(processedfield2, 1, 3) AS Trigram
    FROM preprocessCTE2
    UNION ALL
    SELECT
        field2,
        processedfield2,
        Position + 1 AS Position,
        SUBSTRING(processedfield2, Position + 1, 3) AS Trigram
    FROM recurse2
    WHERE (LEN(processedfield2) - 3) >= Position
)
-- Two names are paired when they share at least one trigram.
SELECT DISTINCT
    recurse1.field1,
    recurse2.field2
FROM
    recurse1 INNER JOIN
    recurse2 ON
        recurse1.Trigram = recurse2.Trigram
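To make the logic of the query easier to check outside SQL Server, here is a rough Python sketch of the same idea: clean each name, collect its 3-character substrings, and pair names that share at least one. The helper names `preprocess`, `trigrams` and `matching_pairs` are hypothetical, and lowercasing is an assumption that stands in for a case-insensitive collation.

```python
def preprocess(name):
    """Mirror the preprocessing CTEs: drop 'Inc.', blank out parentheses and periods."""
    for old, new in (("Inc.", ""), ("(", " "), (")", " "), (".", " ")):
        name = name.replace(old, new)
    return name


def trigrams(text):
    """Return the set of 3-character windows in the text."""
    # Assumption: lowercase to imitate a case-insensitive comparison.
    # Trailing spaces are trimmed because LEN() in the query ignores them.
    text = text.lower().rstrip()
    # Like the anchor row of the CTE, always emit at least the first window.
    return {text[i:i + 3] for i in range(max(len(text) - 2, 1))}


def matching_pairs(list_a, list_b):
    """Pair every name in list_a with every name in list_b sharing a trigram."""
    grams_b = {b: trigrams(preprocess(b)) for b in list_b}
    pairs = []
    for a in list_a:
        grams_a = trigrams(preprocess(a))
        for b, gb in grams_b.items():
            if grams_a & gb:  # non-empty intersection means a shared trigram
                pairs.append((a, b))
    return pairs


list_a = [
    "Bank of America", "BofA", "Bank of America Riverside",
    "Bank of America Inc.", "Bank and America", "BankOfAmerica",
    "International Business Machine",
]
list_b = ["Bank of America (BofA)", "IBM"]

if __name__ == "__main__":
    for a, b in matching_pairs(list_a, list_b):
        print(a, "<->", b)
```

With this sample data, the six Bank of America variants pair with "Bank of America (BofA)", while "International Business Machine" and "IBM" stay unmatched because an acronym shares no trigram with its expansion. Trigrams that contain spaces, such as " of", also count as matches here, so on larger lists this approach may produce false positives.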