Fuzzy string matching

Time: 2011-04-20 12:42:26

Tags: sql sql-server sql-server-2008

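I have a question about string matching in an MSSQL database. Basically, I have a table that contains ICD9 and CPT codes. The problem is that these codes are often formatted incorrectly (too many characters, a missing decimal point, and so on). I need to be able to look up the description of each code in a lookup table that contains the correct codes.

Because of the way these codes are structured, I can do some kind of "progressive" match to at least find the category of the code.

Let's say the correct code is: 306.98

For this example, let's pretend there are no other values between 306 and 307.

I would like to strip the decimal point and look for a match one character at a time until no match is found, then select the last string that matched.

So 306, 3069, 3098, 306981, 3069812, etc. would match the string 306.98.

I hope this makes sense to everyone. I'm not sure how to even begin doing this, so any suggestions would be a great help.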

4 answers:

Answer 0: (score: 1)

One possible solution is to reduce the code to its base element (306) and then use the LIKE operator:

WHERE Code LIKE '306%'
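
A minimal sketch of how that could look end to end. It assumes the base element is simply the first three characters of the code once the decimal point is removed, which fits a numeric code like 306.98 but may not hold for every ICD9 or CPT code. The table variable @Lookup, its columns, its placeholder rows, and the variables @RawCode and @Category are all hypothetical.

```sql
-- Hypothetical lookup data; names and descriptions are placeholders only.
DECLARE @Lookup TABLE (Code VARCHAR(10), Description VARCHAR(100));
INSERT INTO @Lookup (Code, Description)
VALUES ('306.98', 'Placeholder description A'),
       ('307.10', 'Placeholder description B');

-- A malformed input code (too many characters, no decimal point).
DECLARE @RawCode VARCHAR(20) = '3069812';

-- Assumption: the base element is the first three characters
-- after the decimal point has been removed.
DECLARE @Category VARCHAR(3) = LEFT(REPLACE(@RawCode, '.', ''), 3);

-- Every lookup row in that category; here only the '306.98' row qualifies.
SELECT Code, Description
FROM   @Lookup
WHERE  Code LIKE @Category + '%';
```

This only narrows the search to a category; it does not pick the single closest code within it.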

Answer 1: (score: 1)

Use the FLOOR function to strip the decimal part, then use the LIKE operator in the WHERE clause.

Something like:

SELECT <COLUMN-LIST>
  FROM <TABLE-NAME>
 WHERE <THE-COLUMN> LIKE CAST(FLOOR(306.09) AS VARCHAR) + '%'
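
FLOOR needs a numeric argument, so it only helps when the code can be converted to a number. If the codes are stored as text, a string-based substitute (not the FLOOR approach itself) is to cut the value at the first decimal point. The sketch below assumes a hypothetical table variable @Lookup with placeholder rows; @Code and @IntegerPart are illustrative names as well.

```sql
-- Hypothetical lookup data; names and descriptions are placeholders only.
DECLARE @Lookup TABLE (Code VARCHAR(10), Description VARCHAR(100));
INSERT INTO @Lookup (Code, Description)
VALUES ('306.98', 'Placeholder description A'),
       ('307.10', 'Placeholder description B');

DECLARE @Code VARCHAR(20) = '306.09';

-- Appending '.' guarantees CHARINDEX finds a decimal point even when
-- the input has none, so LEFT never receives a negative length.
DECLARE @IntegerPart VARCHAR(20) = LEFT(@Code, CHARINDEX('.', @Code + '.') - 1);

-- @IntegerPart is '306' here, so this behaves like LIKE '306%'.
SELECT Code, Description
FROM   @Lookup
WHERE  Code LIKE @IntegerPart + '%';
```

For an input without a decimal point, such as '3069812', the whole string is kept, so this alone does not repair codes with extra digits.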

Answer 2: (score: 0)

Here is an example for you. You just need to convert your value to nvarchar and assign it to @string.

DECLARE @string AS NVARCHAR (MAX) = '306.98';
DECLARE @Table TABLE (
    TextVal NVARCHAR (MAX));

INSERT INTO @Table ([TextVal])
SELECT '4444656'
UNION ALL
SELECT '30'
UNION ALL
SELECT '3069'
UNION ALL
SELECT '306989878787'
;

WITH   numbers
AS     (SELECT ROW_NUMBER() OVER ( ORDER BY (SELECT 1)) AS Number
        FROM   [sys].[objects] AS o1 CROSS JOIN [sys].[objects] AS o2),
       Chars
AS     (SELECT SUBSTRING(@string, [Number], 1) AS Let,
               [Number]
        FROM   [numbers]
        WHERE  [Number] <= LEN(@string)),
       Joined
AS     (SELECT [Let],
               CAST (1 AS BIGINT) AS Number
        FROM   chars
        WHERE  [Number] = 1
        UNION ALL
        SELECT [J].[Let] + CASE 
                           WHEN [Chars].[Let] = '.' THEN '' ELSE [Chars].[Let] 
                           END AS LEt,
               Chars.[Number]
        FROM   [Joined] AS J
               INNER JOIN
               [Chars]
               ON [Chars].[Number] = [J].[Number] + 1)
SELECT *
FROM   @Table AS T
WHERE  [T].[TextVal] IN (SELECT [Let]
                         FROM   [Joined])
          OR [T].[TextVal] LIKE '%'+(SELECT TOP 1 [Let] FROM
          [Joined] ORDER BY [Number] DESC )  +'%'            
                         ;

The result will be:

 TextVal
30
3069
306989878787

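Answer 3: (score: 0)

I was able to figure it out. Basically, I just needed to loop through each character of the string and look for matches until one is no longer found. Thanks for your help!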

/* ICD9 Lookup */

USE TSiData_Suite_LWHS_V11

DECLARE @String NVARCHAR (10)
DECLARE @Match NVARCHAR(10)
DECLARE @Substring NVARCHAR (10)
DECLARE @Description NVARCHAR(MAX) 
DECLARE @Length INT
DECLARE @Count INT

SET @String = '309.99999999'

/* Remove decimal place from string */
SET @String = REPLACE(@String,'.','')

/* Get length of string */
SET @Length = LEN(@String)

/* Initialize count */
SET @Count = 1

/* Get Substring */
SET @Substring = SUBSTRING(@String,1,@Count)

/* Start processing */
IF (@Length < 1 OR @String IS NULL)
    /* Validate @String */
    BEGIN

        SET @Description = 'No match found for string. String is not proper length.'

    END
ELSE IF ((SELECT COUNT(*) FROM LookupDiseases WHERE REPLACE(LookupCodeDesc,'.','') LIKE @Substring + '%') < 1)
    /* Check for at least one match. Note: this check reads LookupDiseases, while the loop below reads ICD9Lookup */
    BEGIN

        SET @Description = 'No match found for string.'

    END
ELSE
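    /* Look for matching code */
    BEGIN

        /* Seed the result with the first match for the one-character prefix,
           so @Match and @Description are set even if the loop never runs */
        SELECT TOP(1) @Match =  LookupCodeDesc, @Description = LookupName FROM ICD9Lookup WHERE REPLACE(LookupCodeDesc,'.','') LIKE @Substring + '%' ORDER BY LookupCodeDesc ASC

        /* Lengthen the prefix until exactly one code matches or the string is used up */
        WHILE ((SELECT COUNT(*) FROM ICD9Lookup WHERE REPLACE(LookupCodeDesc,'.','') LIKE @Substring + '%') <> 1 AND (@Count < @Length + 1))
        BEGIN

            /* Update substring value */
            SET @Substring = SUBSTRING(@String,1,@Count + 1)

            /* Increment @Count */
            SET @Count += 1

            /* Select the first matching code and get description.
               If the longer prefix matches nothing, the variables keep the last match */
            SELECT TOP(1) @Match =  LookupCodeDesc, @Description = LookupName FROM ICD9Lookup WHERE REPLACE(LookupCodeDesc,'.','') LIKE @Substring + '%' ORDER BY LookupCodeDesc ASC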

        END
    END

PRINT @Match
PRINT @Description
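
For comparison, the same longest-prefix idea can probably be written as one set-based query instead of a WHILE loop. This is only a sketch under stated assumptions: the table variable @Lookup, its columns, and its placeholder rows are made up for illustration and stand in for the real lookup table, and the raw code is assumed to be non-NULL and free of LIKE wildcard characters (%, _, [).

```sql
-- Hypothetical lookup data; names and descriptions are placeholders only.
DECLARE @Lookup TABLE (Code VARCHAR(10), Description VARCHAR(100));
INSERT INTO @Lookup (Code, Description)
VALUES ('306.98', 'Placeholder description A'),
       ('307.10', 'Placeholder description B');

-- Malformed input with the decimal point removed.
DECLARE @Raw VARCHAR(20) = REPLACE('3069812', '.', '');

-- Numbers 1..LEN(@Raw): one row per candidate prefix length.
WITH Numbers AS (
    SELECT TOP (LEN(@Raw)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N
    FROM   sys.all_objects
)
-- Keep the row matched by the longest prefix of @Raw;
-- ties are broken by taking the lowest code.
SELECT TOP (1) L.Code, L.Description, N.N AS MatchedLength
FROM   Numbers AS N
       INNER JOIN @Lookup AS L
       ON REPLACE(L.Code, '.', '') LIKE LEFT(@Raw, N.N) + '%'
ORDER  BY N.N DESC, L.Code ASC;
```

With the placeholder rows above, the longest prefix that still matches is '30698', so the query returns the 306.98 row with MatchedLength 5. Unlike the loop, it does not stop at the first unique match; it always prefers the longest matching prefix.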