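How can I process a SQL string char-by-char to build a match weight?

Date: 2016-07-18 17:37:41

Tags: sql sql-server sql-server-2014

Problem: I need to display user input fields on a form, with the fields determined dynamically by some lookup criteria.

My current solution: I have created a SQL table that holds field entry criteria, keyed on relatively simple match criteria. The match rule is basically that the lookup value must start with the match code, and the most precise match is found with a LEN comparison.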

select 
      f.[IS_REQUIRED]
    , f.[MASK]
    , f.[MAX_LENGTH]
    , f.[MIN_LENGTH]
    , f.[RESOURCE_KEY]
    , f.[SEQUENCE]
from [dbo].[MY_RECORD] r with(nolock)
inner join [dbo].[ENTRY_FORMAT] f with(nolock)
    on  r.[LOOKUP_VALUE] like f.[MATCH_CODE]

-- Logic to filter by single, most-precise record match.
cross apply (
    select f1.[SEQUENCE]
    from [dbo].[ENTRY_FORMAT] f1 with(nolock)
    where f.[SEQUENCE] = f1.[SEQUENCE]
      and r.[LOOKUP_VALUE] like f1.[MATCH_CODE]
    group by f1.[SEQUENCE]
    having len(f.[MATCH_CODE]) = max(len(f1.[MATCH_CODE]))
) tFilter

where r.[ID] = @RecordId

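The current problem is that the most precise match has to be calculated every time, against every match. In addition, I can currently only support % in MATCH_CODE. (For example, '%' is the default for every LOOKUP_VALUE, an entry of '12%' is a more precise match for a LOOKUP_VALUE of '12345', and a MATCH_CODE of '12345' should obviously be the most precise match.) However, I would like to add support for wildcards such as [4-7]. Going by LEN, this would definitely go wrong, because [4-7] adds a lot to the length, yet '12345', for example, should still be a more precise match than '123[4-7]'.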
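To make the LEN problem concrete, here is a small sketch that lists a few candidate patterns with their length and whether they match a sample lookup value. The sample value '1234' and the pattern list are made up for illustration; they are not taken from the real ENTRY_FORMAT table.

```sql
-- Hypothetical patterns compared against a hypothetical lookup value '1234'.
-- LEN counts every character of the bracket expression, so '123[4-7]'
-- (8 characters) outranks the exact code '1234' (4 characters) when the
-- patterns are ordered by length, although it is the less specific of the two.
SELECT p.Pattern,
       LEN(p.Pattern) AS PatternLength,
       CASE WHEN '1234' LIKE p.Pattern THEN 1 ELSE 0 END AS MatchesLookup
FROM (VALUES ('%'), ('12%'), ('123[4-7]'), ('1234')) AS p(Pattern)
ORDER BY PatternLength DESC;
```

All four patterns match '1234', so a length-based ranking would pick the bracketed pattern ahead of the exact code.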

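The update I would like: add a MATCH_WEIGHT column to ENTRY_FORMAT, which I can maintain through a trigger on insert/update. For my initial implementation, I am just looking for something that walks through MATCH_CODE character by character and increments MATCH_WEIGHT, while treating [..] as a single character. Is there a good mechanism (UDF, in SQL or CLR? CURSOR?) for iterating over the characters of a varchar field to calculate a value this way? Something like incrementing MATCH_WEIGHT by two for each non-wildcard character, and possibly by one for a wildcard; the details still need to be thought through and worked out...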
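One possible mechanism for visiting each character without a cursor is a numbers (tally) sequence joined against SUBSTRING. The sketch below is only an illustration: the sample value '123%' is made up, and sys.all_objects is used purely as a convenient row source that holds far more rows than the 100 characters a varchar(100) needs.

```sql
-- Hypothetical sample value; any varchar(100) match code would do.
DECLARE @MatchCode varchar(100) = '123%';

-- Generate the positions 1..LEN(@MatchCode), then pull one character per row.
WITH Numbers AS (
    SELECT TOP (LEN(@MatchCode))
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS Pos
    FROM sys.all_objects
)
SELECT n.Pos,
       SUBSTRING(@MatchCode, n.Pos, 1) AS Ch
FROM Numbers n
ORDER BY n.Pos;
```

A per-character weight could then be assigned with a CASE expression and summed, although collapsing a [..] group into a single unit is easier to express in a procedural loop.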

The goal is to use a query more like:

select 
      f.[IS_REQUIRED]
    , f.[MASK]
    , f.[MAX_LENGTH]
    , f.[MIN_LENGTH]
    , f.[RESOURCE_KEY]
    , f.[SEQUENCE]
from [dbo].[MY_RECORD] r with(nolock)

-- Logic to filter by single, most-precise record match.
cross apply (
    select top 1
          f1.[MATCH_CODE]
        , f1.[SEQUENCE]
    from [dbo].[ENTRY_FORMAT] f1 with(nolock)
    where r.[LOOKUP_VALUE] like f1.[MATCH_CODE]
    group by f1.[SEQUENCE]
    order by f1.[MATCH_WEIGHT] desc
) tFilter

inner join [dbo].[ENTRY_FORMAT] f with(nolock)
    on  f.[MATCH_CODE] = tFilter.[MATCH_CODE]
    and f.[SEQUENCE] = tFilter.[SEQUENCE]

where r.[ID] = @RecordId

Note: I realize this is a relatively fragile setup. ENTRY_FORMAT records are only entered by developers who understand the limitations, so for now assume that valid data is entered and that it does not cause match collisions.
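As written, the target query selects f1.[MATCH_CODE] and orders by f1.[MATCH_WEIGHT] while grouping only by f1.[SEQUENCE], which SQL Server rejects because those columns are neither grouped nor aggregated. One way to express "the heaviest matching row per SEQUENCE" is to rank the matches with ROW_NUMBER. The sketch below assumes that one row per SEQUENCE is the intent, and it uses table variables with made-up rows as stand-ins for [dbo].[MY_RECORD] and [dbo].[ENTRY_FORMAT]; only the columns needed for the ranking are included.

```sql
-- Hypothetical stand-ins for the real tables, with sample data for illustration.
DECLARE @MyRecord TABLE ([ID] int, [LOOKUP_VALUE] varchar(100));
DECLARE @EntryFormat TABLE ([MATCH_CODE] varchar(100), [SEQUENCE] int,
                            [RESOURCE_KEY] varchar(50), [MATCH_WEIGHT] smallint);

INSERT INTO @MyRecord VALUES (1, '12345');
INSERT INTO @EntryFormat VALUES
    ('%',    1, 'DefaultField1',  1),
    ('123%', 1, 'SpecificField1', 7),
    ('%',    2, 'DefaultField2',  1);

DECLARE @RecordId int = 1;

-- Rank every matching format within its SEQUENCE, heaviest weight first,
-- then keep only the top-ranked row of each SEQUENCE.
WITH Ranked AS (
    SELECT f.[RESOURCE_KEY], f.[SEQUENCE], f.[MATCH_CODE],
           ROW_NUMBER() OVER (PARTITION BY f.[SEQUENCE]
                              ORDER BY f.[MATCH_WEIGHT] DESC) AS rn
    FROM @MyRecord r
    INNER JOIN @EntryFormat f
        ON r.[LOOKUP_VALUE] LIKE f.[MATCH_CODE]
    WHERE r.[ID] = @RecordId
)
SELECT [RESOURCE_KEY], [SEQUENCE], [MATCH_CODE]
FROM Ranked
WHERE rn = 1
ORDER BY [SEQUENCE];
```

With the sample rows, SEQUENCE 1 resolves to the '123%' entry and SEQUENCE 2 falls back to the '%' default. If two entries in the same SEQUENCE share a weight, the winner is arbitrary, which is the match collision the note above assumes away.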

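With some help I have come up with an implementation (see the answer below), but I am still not sure about my overall design, so better answers or any criticism are welcome.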

1 Answer:

Answer 0: (score: 0)

Starting from Steve's answer on another question, I reused much of its body to create a function that supports the [..] wildcard at the end of a match code.

CREATE FUNCTION CalculateMatchWeight 
(
    -- Add the parameters for the function here
    @MatchCode varchar(100)
)
RETURNS smallint
AS
BEGIN
    -- Declare the return variable here
    DECLARE @Result smallint = 0;

    -- Add the T-SQL statements to compute the return value here
    DECLARE @Pos int = 1, @N0 int = ascii('0'), @N9 int = ascii('9'), @AA int = ascii('A'), @AZ int = ascii('Z'), @Wild int = ascii('%'), @Range int = ascii('[');
    DECLARE @Asc int;
    DECLARE @WorkingString varchar(100) = upper(@MatchCode)

    WHILE @Pos <= LEN(@WorkingString)
    BEGIN
        SET @Asc = ascii(substring(@WorkingString, @Pos, 1));

        If ((@Asc between @N0 and @N9) or (@Asc between @AA and @AZ))
            SET @Result = @Result + 2;

        ELSE
        BEGIN
            -- Check wildcard matching, update value according to match strength, and stop calculating further.
            -- TODO: In the future we may wish to have match codes with wildcards not just at the end; try to figure out a mechanism to calculating that case.
            IF (@Asc = @Range)
            BEGIN
                SET @Result = @Result + 2;
                BREAK;
            END
            IF (@Asc = @Wild)
            BEGIN
                SET @Result = @Result + 1;
                BREAK;
            END
        END

        SET @Pos = @Pos + 1
    END

    -- Return the result of the function
    RETURN @Result
END

I have checked that this produces the desired output for the cases I am currently trying to cover:

SELECT [dbo].[CalculateMatchWeight] ('12345');      -- Most precise (10)
SELECT [dbo].[CalculateMatchWeight] ('123[4-5]');   -- Middle       (8)
SELECT [dbo].[CalculateMatchWeight] ('123%');       -- Least        (7)

Now I can call this function from a trigger on INSERT/UPDATE to update MATCH_WEIGHT:

CREATE TRIGGER TRG_ENTRY_FORMAT_CalcMatchWeight
   ON  ENTRY_FORMAT
   AFTER INSERT,UPDATE
AS 
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    -- Nothing to do when the triggering statement affected no rows.
    IF NOT EXISTS (SELECT 1 FROM inserted) RETURN;

    -- Recalculate the weight for every affected row. The update is set-based
    -- because inserted can hold more than one row, and the IS NULL check covers
    -- rows whose MATCH_WEIGHT has not been set yet.
    UPDATE ef
       SET MATCH_WEIGHT = dbo.CalculateMatchWeight(i.[MATCH_CODE])
      FROM ENTRY_FORMAT ef
     INNER JOIN inserted i
        ON ef.[MATCH_CODE] = i.[MATCH_CODE]
       AND ef.[SEQUENCE] = i.[SEQUENCE]
     WHERE ef.MATCH_WEIGHT IS NULL
        OR ef.MATCH_WEIGHT <> dbo.CalculateMatchWeight(i.[MATCH_CODE]);
END
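The TODO in the function mentions match codes whose wildcards are not at the end. A minimal sketch of one way to handle that is shown below: it keeps scanning after a wildcard and jumps over a whole [..] group in one step using CHARINDEX. The function name CalculateMatchWeightAnywhere is hypothetical, and the weights (2 for a letter or digit, 2 for a [..] group, 1 for %) are simply carried over from the function above rather than a settled design. Other characters, including the _ wildcard, add nothing, as in the function above.

```sql
CREATE FUNCTION dbo.CalculateMatchWeightAnywhere  -- hypothetical name
(
    @MatchCode varchar(100)
)
RETURNS smallint
AS
BEGIN
    DECLARE @Result smallint = 0,
            @Pos int = 1,
            @Len int = LEN(@MatchCode),
            @Ch char(1),
            @Close int;

    WHILE @Pos <= @Len
    BEGIN
        SET @Ch = SUBSTRING(@MatchCode, @Pos, 1);

        IF @Ch = '['
        BEGIN
            -- Count the whole [..] group as one unit, then skip to its closing bracket.
            SET @Result = @Result + 2;
            SET @Close = CHARINDEX(']', @MatchCode, @Pos + 1);
            IF @Close = 0 BREAK;  -- unterminated group: stop scanning
            SET @Pos = @Close;
        END
        ELSE IF @Ch = '%'
            SET @Result = @Result + 1;
        ELSE IF @Ch LIKE '[0-9A-Za-z]'
            SET @Result = @Result + 2;

        SET @Pos = @Pos + 1;
    END

    RETURN @Result;
END
GO

SELECT dbo.CalculateMatchWeightAnywhere('12345');      -- 10
SELECT dbo.CalculateMatchWeightAnywhere('123[4-5]');   -- 8
SELECT dbo.CalculateMatchWeightAnywhere('12[3-4]5%');  -- 9
```

For codes whose only wildcard is at the end, this returns the same values as the function above, so it could be swapped into the trigger without changing the existing weights.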