Strip HTML tags while ignoring greater-than/less-than values (< >)?

Date: 2014-08-01 08:07:27

Tags: sql sql-server-2008 tsql

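I am trying to strip HTML tags from a string using a function. Everything works well until I have to handle legitimate < > values, such as a comparison in the text. I use the code below to loop over the input HTML, find any < or > that is followed by a digit 0-9, and replace it with a placeholder ([[ or ]]) in the output string.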

DECLARE @i INT = 0
DECLARE @inputstring VARCHAR(50) = 'This is text <50'
DECLARE @fix VARCHAR(2)

-- For each digit 0-9, rewrite '<digit' as '[[digit' and '>digit' as ']]digit'
WHILE @i <= 9
BEGIN
    SET @fix = '<' + CAST(@i AS VARCHAR)
    IF @inputstring LIKE '%' + @fix + '%'
        SET @inputstring = REPLACE(@inputstring, @fix, '[[' + CAST(@i AS VARCHAR))

    -- Skip the '>' replacement if SPAN> or LI> followed by this digit appears anywhere in the string
    SET @fix = '>' + CAST(@i AS VARCHAR)
    IF @inputstring NOT LIKE '%SPAN' + @fix + '%' AND @inputstring NOT LIKE '%LI' + @fix + '%'
        SET @inputstring = REPLACE(@inputstring, @fix, ']]' + CAST(@i AS VARCHAR))

    PRINT @inputstring
    SET @i = @i + 1
END

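The problem occurs when a legitimate tag is followed by a number, for example <SPAN>50<SPAN>. In that case the function cannot find the closing tag and truncates the returned string. Is there a way around this?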
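One plausible reading of the failure, sketched below, is that the NOT LIKE guard tests the whole string rather than the particular > being replaced, and that it exempts only SPAN and LI. The sample string and the variable name @s are made up for illustration, and the tag-stripping function itself is not shown in the question, so this is a guess at the mechanism rather than a confirmed diagnosis.

```sql
-- Hypothetical input: a <P> tag followed by a digit, plus a legitimate '>5' in the text
DECLARE @s VARCHAR(50) = '<P>50</P> and x >5';

-- Same guard as in the question: only SPAN and LI are exempt
IF @s NOT LIKE '%SPAN>5%' AND @s NOT LIKE '%LI>5%'
    SET @s = REPLACE(@s, '>5', ']]5');

SELECT @s AS result;  -- '<P]]50</P> and x ]]5'
```

The closing bracket of the opening <P> tag has been rewritten along with the legitimate >5, so a later pass that scans from < to the next > would swallow the text in between.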

1 Answer:

Answer 0: (score: 1)

I may be oversimplifying this, but would this work?

DECLARE @inputstring VARCHAR(50) = 'This is text <50'
SELECT REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@inputstring, '<SPAN>', ''), '</SPAN>', ''), '<LI>', ''), '</LI>', ''), '<UL>', ''), '</UL>', ''), '<P>', ''), '</P>', ''), '<', '[['), '>', ']]')
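If enumerating every tag with nested REPLACE calls becomes unwieldy, a rough alternative might be to remove only the segments that look like tags. The sketch below assumes that a tag always starts with < followed by a letter or a slash, so <50 is left alone. The variable names (@s, @start, @end) and the sample string are illustrative, and text such as 'a <b and c> d' would still be misread as a tag.

```sql
-- Sketch: strip segments that start with '<' + letter or '/', up to the next '>'
DECLARE @s VARCHAR(200) = 'This is <SPAN>50</SPAN> text <50 and >5';
DECLARE @start INT, @end INT;

SET @start = PATINDEX('%<[A-Za-z/]%', @s);
WHILE @start > 0
BEGIN
    SET @end = CHARINDEX('>', @s, @start);
    IF @end = 0 BREAK;  -- unterminated tag: leave the remainder untouched

    -- Delete the tag, including both angle brackets
    SET @s = STUFF(@s, @start, @end - @start + 1, '');
    SET @start = PATINDEX('%<[A-Za-z/]%', @s);
END

SELECT @s AS stripped;  -- 'This is 50 text <50 and >5'
```

Because a < followed by a digit never matches the pattern, no placeholder substitution ([[ and ]]) is needed in this variant.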