Find the position of the first mismatch between two text / varchar columns

Time: 2016-05-11 17:17:56

Tags: sql-server

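I have two columns that should contain identical text. Sometimes, when the contents are large, it is very difficult to find where the discrepancy is actually located.

It need not be perfect, but it would be quite helpful to have a function that accepts two column values and returns the position where the first mismatch occurs. Since it would be called from within a SELECT, performance matters somewhat, but it will only be run occasionally, so that is not a big concern.

Alternatively, something that works like DIFF in source control utilities would be ideal, but I can't imagine how complex that would be.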

3 answers:

Answer 0: (score: 1)

This solution can be improved a lot, but you can start from this logic:

address

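Answer 1: (score: 0)

To find out whether the two columns differ, you can use BINARY_CHECKSUM(). It is sensitive enough to pick up even the difference between upper and lower case. (It is a checksum, though, so two different strings can occasionally produce the same value.)

As for finding where the two strings differ, I would go with 2 CTEs and then use EXCEPT. The first row of the result is the first position at which the strings deviate.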

declare @test1 VARCHAR(1000) = 'I have two columns that should contain identical text - sometimes when the contents are large it is very difficult to find where the discrepancy is actually located.'
declare @test2 VARCHAR(1000) = 'I have two columns that should contain identical text - sometimes when the contents are Large it is very difficult to find where the discrepancy is actually located.'

SELECT CASE WHEN BINARY_CHECKSUM(@test1) = BINARY_CHECKSUM(@test2) THEN 'Identical' ELSE 'Not Identical' END AS check_


-- Recursive CTEs that number the character positions 1..LEN of each string
;WITH cte1 AS (
    SELECT 1 AS col_
    UNION ALL
    SELECT col_ + 1
    FROM cte1
    WHERE col_ < LEN(@test1)
),
cte2 AS (
    SELECT 1 AS col_
    UNION ALL
    SELECT col_ + 1
    FROM cte2
    WHERE col_ < LEN(@test2)
),
final_cte AS (
    -- (position, character, checksum) rows of @test1 that have no identical row in @test2
    SELECT col_, SUBSTRING(@test1, col_, 1) AS char_, BINARY_CHECKSUM(SUBSTRING(@test1, col_, 1)) AS char_checksum
    FROM cte1

    EXCEPT

    SELECT col_, SUBSTRING(@test2, col_, 1) AS char_, BINARY_CHECKSUM(SUBSTRING(@test2, col_, 1)) AS char_checksum
    FROM cte2
)
SELECT col_, char_
FROM final_cte
ORDER BY col_
OPTION (MAXRECURSION 1000)  -- the default recursion limit is 100; raised to match the VARCHAR(1000) inputs
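
Because BINARY_CHECKSUM() is a checksum, equal values do not strictly prove that two strings are equal. A minimal sketch of an alternative check, assuming that a direct comparison under a binary collation is acceptable: the collation name Latin1_General_BIN2 and the variable names @a and @b are choices made for this illustration, not part of the answer above. The DATALENGTH test is added because the = operator ignores trailing spaces.

```sql
DECLARE @a VARCHAR(1000) = 'the contents are large';
DECLARE @b VARCHAR(1000) = 'the contents are Large';

SELECT CASE
           -- Binary collation: compares code points, so 'l' and 'L' differ
           WHEN @a = @b COLLATE Latin1_General_BIN2
                -- '=' pads the shorter string with spaces, so also compare byte lengths
                AND DATALENGTH(@a) = DATALENGTH(@b)
           THEN 'Identical'
           ELSE 'Not Identical'
       END AS check_;   -- Not Identical
```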

Answer 2: (score: 0)

Try this:

CREATE FUNCTION dbo.fcn_DiffPosition
(
    @str1 nvarchar(max),
    @str2 nvarchar(max)
)
RETURNS INT
AS
BEGIN
    DECLARE @MinPosition int = NULL

    ;WITH cte1 AS
    (
        SELECT      1                       AS CharacterPosition,
                    SUBSTRING(@str1, 1, 1)  AS [Character]
        UNION ALL
        SELECT      CharacterPosition + 1,
                    SUBSTRING(@str1, CharacterPosition + 1, 1)
        FROM        cte1
        WHERE       CharacterPosition < LEN(@str1)
    ),
    cte2 AS
    (
        SELECT      1                       AS CharacterPosition,
                    SUBSTRING(@str2, 1, 1)  AS [Character]
        UNION ALL
        SELECT      CharacterPosition + 1,
                    SUBSTRING(@str2, CharacterPosition + 1, 1)
        FROM        cte2
        WHERE       CharacterPosition < LEN(@str2)
    )

    SELECT      @MinPosition = MIN(ISNULL(cte1.CharacterPosition, cte2.CharacterPosition))
    FROM        cte1
    FULL JOIN   cte2 ON cte1.CharacterPosition = cte2.CharacterPosition
    WHERE       ISNULL(cte1.[Character], '') != ISNULL(cte2.[Character], '')
    OPTION      (MAXRECURSION 0)

    RETURN @MinPosition
END

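If the two strings are identical, the function returns NULL. Depending on your collation, the comparison may be case sensitive or case insensitive. For example: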

SELECT dbo.fcn_DiffPosition('Hello World', 'HelloWorld') -- 6
SELECT dbo.fcn_DiffPosition('Hello',       'HelloWorld') -- 6
SELECT dbo.fcn_DiffPosition('Dog', 'Dog')                -- NULL
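
If the result should not depend on the database collation, the comparison can be pinned with an explicit COLLATE clause. The following is only a sketch of a loop-based variant, not the answer's method: the function name fcn_DiffPositionLoop and the collation Latin1_General_BIN2 are assumptions made for this illustration. It uses DATALENGTH rather than LEN so that trailing spaces count as characters, and it assumes the database default collation is not a supplementary-character (_SC) collation, so that SUBSTRING positions line up with 2-byte units.

```sql
CREATE FUNCTION dbo.fcn_DiffPositionLoop   -- hypothetical name
(
    @str1 nvarchar(max),
    @str2 nvarchar(max)
)
RETURNS int
AS
BEGIN
    -- DATALENGTH returns bytes; nvarchar stores 2 bytes per character.
    -- Unlike LEN, it does not ignore trailing spaces.
    DECLARE @len1 int = DATALENGTH(@str1) / 2,
            @len2 int = DATALENGTH(@str2) / 2,
            @pos  int = 1;
    DECLARE @shorter int = CASE WHEN @len1 < @len2 THEN @len1 ELSE @len2 END;

    -- Walk the common prefix and stop at the first differing character.
    WHILE @pos <= @shorter
    BEGIN
        -- Binary collation makes the comparison case and accent sensitive
        IF SUBSTRING(@str1, @pos, 1) <> SUBSTRING(@str2, @pos, 1) COLLATE Latin1_General_BIN2
            RETURN @pos;
        SET @pos = @pos + 1;
    END

    -- One string is a strict prefix of the other: they differ just past the shorter one.
    IF @len1 <> @len2
        RETURN @shorter + 1;

    RETURN NULL;   -- identical strings (or a NULL input)
END
GO   -- batch separator understood by SSMS / sqlcmd

SELECT dbo.fcn_DiffPositionLoop(N'Hello World', N'HelloWorld') -- 6
SELECT dbo.fcn_DiffPositionLoop(N'Hello',       N'Hello ')     -- 6 (trailing space counts)
SELECT dbo.fcn_DiffPositionLoop(N'Dog',         N'dog')        -- 1 (binary comparison is case sensitive)
SELECT dbo.fcn_DiffPositionLoop(N'Dog',         N'Dog')        -- NULL
```

A scalar function with a WHILE loop is likely to be slow on very long values, which the question says is acceptable for occasional use.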