Parse tokens in {curly braces}

Posted: 2013-10-11 22:53:57

Tags: sql sql-server sql-server-2008 tsql

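I need to use T-SQL to parse out the strings enclosed in {} in the following paragraph and then display them.

Here is a test sentence with a {Term1}. Sometime, a {Term2} could be a word or phrase like {Phrase Term3}. {Term2} is repeated. Some Terms could be a plural form of a another Term like {Term2}s. Here is a real {Simple} Term.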

Desired result:

Term1
Term2
Phrase Term3
Term2
Term2
Simple

2 answers:

Answer 0 (score: 3)

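You can do this with a multi-statement table-valued function, but I really think this kind of parsing is better left to a more powerful language. This handles tokens of up to 255 characters and input strings of up to roughly 8,000 characters; the input limit is the number of rows in sys.all_columns, which varies by SQL Server version. If you need more, replace sys.all_columns with your own numbers table. Note that I haven't done anything to protect against invalid token sequences (for example, nested or unbalanced braces)...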

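CREATE FUNCTION dbo.ParseTokens
(
    @string NVARCHAR(MAX),
    @token1 NVARCHAR(255), -- opening delimiter, e.g. '{'
    @token2 NVARCHAR(255)  -- closing delimiter, e.g. '}'
)
RETURNS @t TABLE([Index] INT IDENTITY(1,1), Item NVARCHAR(255))
AS
BEGIN
    -- Keep each segment only up to the closing delimiter
    -- (or its first 255 characters if there is no closing delimiter).
    INSERT @t(Item) 
    SELECT SUBSTRING(x, 1, COALESCE(NULLIF(CHARINDEX(@token2, x)-1,-1),255)) 
    FROM 
    (
      -- One row per opening delimiter: x is the text after it, up to the next opening delimiter.
      SELECT Number, x = SUBSTRING(@string, Number, 
        CHARINDEX(@token1, @string + @token1, Number) - Number)
      FROM
      (
        -- Numbers 1..N; N (the row count of sys.all_columns) caps the input length.
        SELECT ROW_NUMBER() OVER (ORDER BY [object_id])
          FROM sys.all_columns
      ) AS n(Number) WHERE Number <= CONVERT(INT, LEN(@string))
        -- @token1 is prepended, so position 1 always matches.
        AND SUBSTRING(@token1 + @string, Number, LEN(@token1)) = @token1
    ) AS y
    ORDER BY Number OPTION (MAXDOP 1);

    -- Row 1 is the text before the first opening delimiter, not a token.
    DELETE @t WHERE [Index] = 1;

    RETURN;
END
GO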
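For the "your own numbers table" suggestion, a minimal sketch might look like the following. The table name dbo.Numbers and the 1,000,000-row size are assumptions, not part of the answer; size it to your longest input. It fills the table by cross joining sys.all_columns with itself, which should produce far more than a million candidate rows.

```sql
-- Hypothetical permanent numbers table; the name dbo.Numbers and the
-- 1,000,000-row size are assumptions, not part of the original answer.
CREATE TABLE dbo.Numbers (Number INT NOT NULL PRIMARY KEY);

-- sys.all_columns cross joined with itself gives many more than 1,000,000 rows;
-- TOP ... ORDER BY keeps exactly the numbers 1..1,000,000.
INSERT dbo.Numbers (Number)
SELECT TOP (1000000) n
FROM
(
  SELECT ROW_NUMBER() OVER (ORDER BY s1.[object_id])
  FROM sys.all_columns AS s1
  CROSS JOIN sys.all_columns AS s2
) AS x(n)
ORDER BY n;

-- In dbo.ParseTokens, the derived table
--   (SELECT ROW_NUMBER() OVER (ORDER BY [object_id]) FROM sys.all_columns) AS n(Number)
-- could then be replaced with
--   (SELECT Number FROM dbo.Numbers) AS n(Number)
```

A persisted table also avoids numbering sys.all_columns on every call, at the cost of one more object to maintain.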

Example usage, on a standalone string:

DECLARE @x NVARCHAR(MAX);

SET @x = N'foo{bar} and think {splunge}';

SELECT Item FROM dbo.ParseTokens(@x, '{', '}') ORDER BY [Index];

Results:

Item
-------
bar
splunge
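To see why the function deletes row 1, this sketch runs the function's inner query by itself on the same string. '{' and '}' are hard-coded in place of @token1 and @token2, and the rows are captured in a table variable (@rows is just an illustrative name).

```sql
-- Sketch: the inner query of dbo.ParseTokens run on its own, with '{' and '}'
-- hard-coded in place of @token1/@token2. @rows is an illustrative name.
DECLARE @string NVARCHAR(MAX);
SET @string = N'foo{bar} and think {splunge}';

DECLARE @rows TABLE (Number INT, x NVARCHAR(MAX), Item NVARCHAR(255));

INSERT @rows (Number, x, Item)
SELECT Number, x,
       SUBSTRING(x, 1, COALESCE(NULLIF(CHARINDEX(N'}', x) - 1, -1), 255))
FROM
(
  -- x = text after each '{' up to the next '{' (row 1: text before the first '{')
  SELECT Number, x = SUBSTRING(@string, Number,
    CHARINDEX(N'{', @string + N'{', Number) - Number)
  FROM
  (
    SELECT ROW_NUMBER() OVER (ORDER BY [object_id])
      FROM sys.all_columns
  ) AS n(Number) WHERE Number <= CONVERT(INT, LEN(@string))
    AND SUBSTRING(N'{' + @string, Number, 1) = N'{'
) AS y;

SELECT Number, x, Item FROM @rows ORDER BY Number;
-- Number  x                 Item
-- 1       foo               foo
-- 5       bar} and think    bar
-- 21      splunge}          splunge
```

Row 1 is the text before the first '{', which is why the function deletes [Index] = 1 before returning; every later row is trimmed at its '}' by the outer SUBSTRING.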

Example usage, against a table:

DECLARE @x TABLE(ID INT IDENTITY(1,1), n NVARCHAR(MAX));

INSERT @x SELECT N'Here is a test sentence with a {Term1}. Sometime, a {Term2}
  could be a word or phrase like {Phrase Term3}. {Term2} is repeated. Some Terms
  could be a plural form of a another Term like {Term2}s. Here is a real
  {Simple} Term.';

INSERT @x SELECT N'Hello {foo} there {bar} ...';

SELECT t.ID, p.Item
 FROM @x AS t
 CROSS APPLY dbo.ParseTokens(t.n, '{', '}') AS p;

Results:

ID     Item
----   ------------
1      Term1
1      Term2
1      Phrase Term3
1      Term2
1      Term2
1      Simple
2      foo
2      bar
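Because the function doesn't guard against invalid token sequences, one option is to check the input before parsing it. Here is a sketch: dbo.TokensAreBalanced is a hypothetical scalar function, not part of the answer, and it assumes single-character delimiters. It walks the string once and returns 1 only when the delimiters strictly alternate open, close, open, close. Nested, stray, or unclosed delimiters return 0. It is a plain WHILE loop rather than a set-based query, so expect it to be slow on long strings.

```sql
-- Hypothetical pre-check (not part of the original answer); assumes
-- single-character delimiters. Returns 1 if they strictly alternate.
CREATE FUNCTION dbo.TokensAreBalanced
(
    @s     NVARCHAR(MAX),
    @open  NCHAR(1),
    @close NCHAR(1)
)
RETURNS BIT
AS
BEGIN
    DECLARE @i BIGINT, @len BIGINT, @depth INT, @c NCHAR(1);
    SELECT @i = 1, @len = LEN(@s), @depth = 0;

    WHILE @i <= @len
    BEGIN
        SET @c = SUBSTRING(@s, @i, 1);
        IF @c = @open
        BEGIN
            IF @depth = 1 RETURN 0;  -- nested or doubled opening delimiter
            SET @depth = 1;
        END
        ELSE IF @c = @close
        BEGIN
            IF @depth = 0 RETURN 0;  -- closing delimiter with no opener
            SET @depth = 0;
        END
        SET @i = @i + 1;
    END

    -- 0 if the last opening delimiter was never closed
    RETURN CASE WHEN @depth = 0 THEN 1 ELSE 0 END;
END
GO
```

You could then skip bad rows in the table example by adding WHERE dbo.TokensAreBalanced(t.n, N'{', N'}') = 1 after the CROSS APPLY.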

Answer 1 (score: 3)

You can turn the string into XML by replacing every { with an opening element and every } with a closing element, and then shred the XML to get the tokens.

declare @S nvarchar(max)
set @S = N'Here is a test sentence with a {Term1}. Sometime, a {Term2} could be a word or phrase like {Phrase Term3}. {Term2} is repeated. Some Terms could be a plural form of a another Term like {Term2}s. Here is a real {Simple} Term.'

select T.N.value('text()[1]', 'nvarchar(max)') as Token
from (select cast(replace(replace(@S, N'{', N'<token>'), N'}', N'</token>') as xml)) as S(X)
  cross apply S.X.nodes('token') as T(N)
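One caveat with the XML approach: if the text itself contains & or <, the cast to xml fails. Here is a sketch of one workaround, assuming those are the only problem characters. It entitizes them first (& before the others, with > escaped as well for symmetry) and only then swaps the braces for tags. .value() decodes the entities again on the way out. The sample string and the @tokens table variable are made up for illustration.

```sql
-- Sketch: escape XML special characters before turning braces into tags.
-- The sample text and @tokens are illustrative, not from the original answer.
DECLARE @S NVARCHAR(MAX);
SET @S = N'If x < y & z > 0, use {Term1}; otherwise {A & B}.';

DECLARE @tokens TABLE (Token NVARCHAR(MAX));

INSERT @tokens (Token)
SELECT T.N.value('text()[1]', 'nvarchar(max)')
FROM (SELECT CAST(
        REPLACE(REPLACE(
          -- '&' must be escaped first so the other entities are not double-escaped
          REPLACE(REPLACE(REPLACE(@S, N'&', N'&amp;'), N'<', N'&lt;'), N'>', N'&gt;'),
        N'{', N'<token>'), N'}', N'</token>') AS XML)) AS S(X)
CROSS APPLY S.X.nodes('token') AS T(N);

SELECT Token FROM @tokens;  -- Term1, A & B
```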
