How to compare whether two strings contain the same words in SQL Server 2014?

Time: 2017-03-23 14:16:52

Tags: sql-server sql-server-2014

I am trying to solve the same problem as in this question, but this time in SQL Server 2014. I need to check whether two strings are made up of the same words:

These should return true:
Antoine de Saint-Exupéry = de Saint-Exupéry Antoine = Saint-Exupéry Antoine de = etc.

These should return false:
Antoine de Saint-Exupéry != Antoine de Saint != Antoine Antoine de Saint-Exupéry != etc.

What options do I have in SQL Server 2014? Is there a built-in function for this kind of comparison?

2 answers:

Answer 0: (score: 2)

To compare two strings, you can (ab)use the sorting capability of XQuery.

Convert each string to XML with one element per word, sort the elements, then return the result as a string without the tags.

For example:

DECLARE @Words1 NVARCHAR(MAX) = N'Antoine de Saint-Exupéry';
DECLARE @Words2 NVARCHAR(MAX) = N'Saint-Exupéry Antoine de';

DECLARE @SortedWords1 NVARCHAR(MAX) = cast('<x>'+replace(@Words1,' ','</x><x>')+'</x>' as XML).query('for $x in /x order by $x ascending return $x').value('.','nvarchar(max)');
DECLARE @SortedWords2 NVARCHAR(MAX) = cast('<x>'+replace(@Words2,' ','</x><x>')+'</x>' as XML).query('for $x in /x order by $x ascending return $x').value('.','nvarchar(max)');

DECLARE @SameWords BIT = (case 
                          when @SortedWords1 = @SortedWords2
                          then 1 
                          else 0 
                          end);


SELECT @SameWords as SameWords;

Returns:

SameWords
---------
True 
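
One caveat with the approach above: `.value('.', ...)` concatenates the sorted words with no separator, so N'ab c' and N'a bc' both collapse to 'abc' and compare as equal. The following is a minimal sketch of one way around that, not part of the original answer. It shreds the words with `nodes()` and rebuilds the sorted list with `FOR XML PATH('')`, keeping a delimiter between words. The `|` delimiter and the `@Xml1`/`@Xml2` variable names are assumptions for illustration, and like the original it assumes the input contains no XML special characters such as `&` or `<`.

```sql
-- Sketch only. Assumptions: '|' never appears inside a word, words are
-- separated by single spaces, and the input has no XML special characters.
DECLARE @Words1 NVARCHAR(MAX) = N'ab c';
DECLARE @Words2 NVARCHAR(MAX) = N'a bc';

-- Same string-to-XML trick as in the answer: one <x> element per word.
DECLARE @Xml1 XML = CAST(N'<x>' + REPLACE(@Words1, N' ', N'</x><x>') + N'</x>' AS XML);
DECLARE @Xml2 XML = CAST(N'<x>' + REPLACE(@Words2, N' ', N'</x><x>') + N'</x>' AS XML);

DECLARE @SortedWords1 NVARCHAR(MAX);
DECLARE @SortedWords2 NVARCHAR(MAX);

-- Shred the words into rows, sort them, and glue them back together,
-- prefixing each word with '|' so word boundaries survive.
SELECT @SortedWords1 =
    (SELECT N'|' + w.x.value('.', 'nvarchar(4000)')
       FROM @Xml1.nodes('/x') AS w(x)
      ORDER BY w.x.value('.', 'nvarchar(4000)')
        FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)');

SELECT @SortedWords2 =
    (SELECT N'|' + w.x.value('.', 'nvarchar(4000)')
       FROM @Xml2.nodes('/x') AS w(x)
      ORDER BY w.x.value('.', 'nvarchar(4000)')
        FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)');

SELECT CASE WHEN @SortedWords1 = @SortedWords2 THEN 1 ELSE 0 END AS SameWords;
```

With the delimiter in place the two sorted strings are '|ab|c' and '|a|bc', so they no longer compare as equal.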

Answer 1: (score: 1)

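Here is one way you can roll your own for this. I am using Jeff Moden's string splitter. You can find the original article here: http://www.sqlservercentral.com/articles/Tally+Table/72993/. If you don't like that splitter, there are some other excellent versions here: https://sqlperformance.com/2012/07/t-sql-queries/split-strings. I like Jeff Moden's because, unlike the other splitters, it returns an ItemNumber, which is very useful in some cases.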

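The basic concept here is that you have to split your strings into words and then compare them. I used a couple of CTEs so that it shows how it works. The following works for all of the examples you posted.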

CREATE FUNCTION [dbo].[DelimitedSplit8K]
--===== Define I/O parameters
        (@pString VARCHAR(8000), @pDelimiter CHAR(1))
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE!  IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
     -- enough to cover VARCHAR(8000)
  WITH E1(N) AS (
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                ),                          --10E+1 or 10 rows
       E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
       E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
 cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
                     -- for both a performance gain and prevention of accidental "overruns"
                 SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
                ),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
                 SELECT 1 UNION ALL
                 SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
                ),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
                 SELECT s.N1,
                        ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
                   FROM cteStart s
                )
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
 SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
        Item       = SUBSTRING(@pString, l.N1, l.L1)
   FROM cteLen l
;
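
The comparison query itself is not shown above. What follows is a minimal sketch of the split-and-compare idea, not the answer author's original code. It assumes the `dbo.DelimitedSplit8K` function above has been created and that words are separated by single spaces. The variable names and the CTE names `w1` and `w2` are illustrative. Each string is split into words, the occurrences of each word are counted, and the two sets of (word, count) pairs are compared in both directions with `EXCEPT`.

```sql
-- Sketch only. VARCHAR(8000) matches the splitter's parameter type;
-- how 'é' is stored depends on the database's code page.
DECLARE @Words1 VARCHAR(8000) = 'Antoine de Saint-Exupéry';
DECLARE @Words2 VARCHAR(8000) = 'Saint-Exupéry Antoine de';

WITH w1 AS (
    -- One row per distinct word in the first string, with its occurrence count.
    SELECT Item, COUNT(*) AS Cnt
      FROM dbo.DelimitedSplit8K(@Words1, ' ')
     GROUP BY Item
),
w2 AS (
    -- Same for the second string.
    SELECT Item, COUNT(*) AS Cnt
      FROM dbo.DelimitedSplit8K(@Words2, ' ')
     GROUP BY Item
)
SELECT SameWords = CASE
           -- The strings match when neither side has a (word, count) pair
           -- that is missing from the other side.
           WHEN NOT EXISTS (SELECT Item, Cnt FROM w1 EXCEPT SELECT Item, Cnt FROM w2)
            AND NOT EXISTS (SELECT Item, Cnt FROM w2 EXCEPT SELECT Item, Cnt FROM w1)
           THEN CAST(1 AS BIT)
           ELSE CAST(0 AS BIT)
       END;
```

Counting occurrences rather than comparing distinct words is a design choice made so that a repeated word, as in 'Antoine Antoine de Saint-Exupéry', does not match 'Antoine de Saint-Exupéry'. Whether the comparison is case- or accent-sensitive follows the collation in effect.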