Question

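I have a table with 6 columns that contain HTML content with some markup in it. Now that we are moving to a newly designed site, most of this HTML has to be removed: more or less every tag except <B> and </B>.

Is there a good way to do this, i.e. find all the tags that ended up in the data and delete them? I am sure there are no < or > symbols in the text itself, so a regular expression might work?

My alternative is to fetch every row, process it, and update the database, but I am guessing this can be done directly in T-SQL.

My server is MSSQL 2008 in a hosted environment, but I can get a local copy if needed.

Thanks, Stefan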

Answer 1

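For regular expressions in SQL 2000, see http://blogs.msdn.com/b/khen1234/archive/2005/05/11/416392.aspx

From SQL 2005 onward, see http://weblogs.sqlteam.com/jeffs/archive/2007/04/27/SQL-2005-Regular-Expression-Replace.aspx

A modified version of the regular expression from the second link seems to work in my extremely superficial testing on SQL 2005, but only for strings of up to 4000 characters!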

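using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;

public partial class UserDefinedFunctions
{
    // SQLCLR scalar function: removes every HTML tag except <b> and </b>.
    [Microsoft.SqlServer.Server.SqlFunction(IsDeterministic = true, IsPrecise = true)]
    public static SqlString StripAllButBoldTags(SqlString expression)
    {
        if (expression.IsNull)
            return SqlString.Null;

        // Matches an opening or closing tag. Group 1 captures the tag name
        // together with any attributes that follow it.
        Regex r = new Regex("</?([a-z][a-z0-9]*[^<>]*)>", RegexOptions.IgnoreCase);

        return new SqlString(r.Replace(expression.Value, new MatchEvaluator(ComputeReplacement)));
    }

    // Keeps the match only when group 1 is exactly "b" (case-insensitive);
    // every other tag, including a <b> that carries attributes, is replaced
    // with an empty string.
    public static String ComputeReplacement(Match m)
    {
        return string.Compare(m.Groups[1].Value, "B", true) == 0 ? m.Value : "";
    }
}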
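
The 4000-character limit mentioned above comes from SqlString being mapped to NVARCHAR(4000) by default. One possible way around it, which I have not tested against a real server, is to switch to SqlChars and mark the parameter and return value with SqlFacet(MaxSize = -1). The names StripAllButBoldTagsMax, TagPattern and KeepBoldOnly below are made up for this sketch; the regular expression is the same one as in the function above.

```csharp
using System;
using System.Data.SqlTypes;
using System.Text.RegularExpressions;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    // Same pattern as above; group 1 is the tag name plus any attributes.
    private static readonly Regex TagPattern =
        new Regex("</?([a-z][a-z0-9]*[^<>]*)>", RegexOptions.IgnoreCase);

    // Hypothetical variant: MaxSize = -1 requests NVARCHAR(MAX) for both
    // the parameter and the return value instead of NVARCHAR(4000).
    [SqlFunction(IsDeterministic = true, IsPrecise = true)]
    [return: SqlFacet(MaxSize = -1)]
    public static SqlChars StripAllButBoldTagsMax(
        [SqlFacet(MaxSize = -1)] SqlChars expression)
    {
        if (expression.IsNull)
            return SqlChars.Null;

        // SqlChars exposes its content as a char[]; convert to string for Regex.
        string input = new string(expression.Value);
        string output = TagPattern.Replace(input, KeepBoldOnly);
        return new SqlChars(output.ToCharArray());
    }

    // Keeps plain <b> and </b>, replaces every other tag with an empty string.
    private static string KeepBoldOnly(Match m)
    {
        return string.Equals(m.Groups[1].Value, "b", StringComparison.OrdinalIgnoreCase)
            ? m.Value
            : string.Empty;
    }
}
```

If the assembly is registered by hand rather than deployed from Visual Studio, the T-SQL CREATE FUNCTION wrapper would also need to declare its parameter and return type as NVARCHAR(MAX); the SqlFacet attributes alone do not change a wrapper that was written manually.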
