Regex - How to match any 4-letter word in one list with any 4-letter word in another, ignoring special characters? Just a-z

Time: 2016-01-08 17:57:12

Tags: sql-server regex

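Regex - how can I match any run of 3 or more letters in one list with any run of 3 or more letters in another list, ignoring special characters? Only a-z should count, with special characters replaced by spaces.

I am using $[a-z] in my SQL query, but it has limitations when the base list contains characters such as ()/.,&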

For example:

List A:
Bank of America
BofA
Bank of America Riverside
Bank of America Inc.
Bank and America
BankOfAmerica
International Business Machine

List B:
Bank of America (BofA)
IBM

I would like all of the Bank of America entries to match.

1 Answer:

Answer 0: (score: 2)

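You can use a recursive CTE to extract every 3-gram (trigram) from each string and compare those. An example follows. I would not keep the preprocessing CTEs, which are only there for convenience; instead I would create a UDF that strips out whatever you don't want.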

declare @t1 table (field1 varchar(50));
declare @t2 table (field2 varchar(50));

insert into @t1 values 
    ('Bank of America'),
    ('BofA'),
    ('Bank of America Riverside'),
    ('Bank of America Inc.'),
    ('Bank and America'),
    ('BankOfAmerica'),
    ('International Business Machine')
;

insert into @t2 values 
    ('Bank of America (BofA)'),
    ('IBM')
;

-- Preprocessing: drop 'Inc.' and turn parentheses and periods into spaces
WITH preprocessCTE1 AS (
    SELECT 
        field1,
        REPLACE(REPLACE(REPLACE(REPLACE(field1, 'Inc.', ''), '(', ' '), ')', ' '), '.', ' ') AS processedfield1
    FROM @t1
),

preprocessCTE2 AS (
    SELECT 
        field2,
        REPLACE(REPLACE(REPLACE(REPLACE(field2, 'Inc.', ''), '(', ' '), ')', ' '), '.', ' ') AS processedfield2
    FROM @t2
),

-- Recursively emit every 3-character substring (trigram) of each processed string
recurse1 AS (
    SELECT 
        field1,
        processedfield1,
        1 AS Position,
        SUBSTRING(processedfield1, 1, 3) AS Trigram
    FROM preprocessCTE1

    UNION ALL

    SELECT
        field1,
        processedfield1,
        Position + 1 AS Position,
        SUBSTRING(processedfield1, Position + 1, 3) AS Trigram
    FROM recurse1
    WHERE (LEN(processedfield1) - 3) >= Position
),

recurse2 AS (
    SELECT 
        field2,
        processedfield2,
        1 AS Position,
        SUBSTRING(processedfield2, 1, 3) AS Trigram
    FROM preprocessCTE2

    UNION ALL

    SELECT
        field2,
        processedfield2,
        Position + 1 AS Position,
        SUBSTRING(processedfield2, Position + 1, 3) AS Trigram
    FROM recurse2
    WHERE (LEN(processedfield2) - 3) >= Position
)

-- Return every pair of rows that shares at least one trigram
SELECT DISTINCT
    recurse1.field1,
    recurse2.field2
FROM 
    recurse1 INNER JOIN
    recurse2 ON
    recurse1.Trigram = recurse2.Trigram
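
The answer suggests replacing the preprocessing CTEs with a UDF but does not show one. What follows is only a minimal sketch of what such a function might look like: the name dbo.KeepLettersOnly and its varchar(50) signature are assumptions made for illustration, not something from the original answer. It replaces every character that is not a letter or a space with a space, which is the behaviour the question asks for.

```sql
-- Hypothetical helper: the name dbo.KeepLettersOnly and the varchar(50)
-- signature are assumptions for illustration, not part of the original answer.
CREATE FUNCTION dbo.KeepLettersOnly (@input varchar(50))
RETURNS varchar(50)
AS
BEGIN
    -- Position of the first character that is neither a letter nor a space
    DECLARE @pos int = PATINDEX('%[^a-zA-Z ]%', @input);

    WHILE @pos > 0
    BEGIN
        -- Overwrite that one character with a space, then look again
        SET @input = STUFF(@input, @pos, 1, ' ');
        SET @pos = PATINDEX('%[^a-zA-Z ]%', @input);
    END;

    RETURN @input;
END;
GO

-- Usage: this call would stand in for the nested REPLACE calls
-- in a preprocessing CTE.
DECLARE @t2 TABLE (field2 varchar(50));
INSERT INTO @t2 VALUES ('Bank of America (BofA)'), ('IBM');

SELECT field2, dbo.KeepLettersOnly(field2) AS processedfield2
FROM @t2;
```

Unlike the nested REPLACE calls, this sketch does not drop the word 'Inc.'; it only turns the period into a space, so a noise-word filter would still have to be added separately if that matters. Which characters the [a-zA-Z] range matches depends on the column's collation, so it may be worth checking against accented input.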