Question

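I have a list of HTML tags to apply to an XML string, for example "<p>, <a>, <img>, <link>" and so on.

Now I want to create a generic function to which I pass a list of HTML tags (or just a single tag) that I want to exclude from the XML string I pass in. The function should return the whole string without the excluded tags.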

  public static readonly string[] htmlTags = new string[] { "p", "a", "img" };
  string result = strString.ExcludeHTMLTags(htmlTags); // Writing the String extension is not an issue; please suggest how to exclude the tags from the existing string.

Edit:

I am trying the code below:

/// <summary>
/// Remove HTML tags from string using char array.
/// </summary>
public static string StripTagsCharArray(string source, String[] htmlTags)
{
    char[] array = new char[source.Length];
    int arrayIndex = 0;
    bool inside = false;

    for (int i = 0; i < source.Length; i++)
    {
        foreach (String htmlTag in htmlTags)
        {
            char let = source[i];
            String tag = "<" + "htmlTag"; //How to handle this as this is character
            if (let == tag)
            {
                inside = true;
                continue;
            }
            if (let == '>')
            {
                inside = false;
                continue;
            }
            if (!inside)
            {
                array[arrayIndex] = let;
                arrayIndex++;
            }
        }
    }
    return new string(array, 0, arrayIndex);
}

Edit 2: using a regular expression

String[] htmlTags = new String[] { "a", "img", "p" };
private const string STR_RemoveHtmlTagRegex = "</?{0}[^<]*?>";
public static string RemoveHtmlTag(String input, String[] htmlTags)
{
    String strResult = String.Empty;
    foreach (String htmlTag in htmlTags)
    {
        Regex reg = new Regex(String.Format(STR_RemoveHtmlTagRegex, htmlTag.Trim()), RegexOptions.IgnoreCase);
        strResult = reg.Replace(input, String.Empty);
        input = strResult;
    }
    return strResult;
}

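Now the problem is that it does not remove the content of the tags, so if the input contains "<p>test</p>" it returns "test". I want to remove the whole tag, including its content.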

Answer 1

Convert the HTML to a DOM tree and remove the element nodes whose names are in the given list of excluded tags.
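
A minimal sketch of that idea, assuming the input really is well-formed XML so that the built-in System.Xml.Linq classes can parse it. The class name `XmlTagFilter` and the method name `ExcludeTags` are made up for illustration; they are not from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlTagFilter
{
    // Hypothetical helper: removes every element whose name is in excludedTags,
    // together with its attributes, text, and child elements.
    public static string ExcludeTags(string xml, IEnumerable<string> excludedTags)
    {
        // Tag names are given without angle brackets, e.g. "p", "a", "img".
        var names = new HashSet<string>(excludedTags, StringComparer.OrdinalIgnoreCase);

        // XDocument.Parse throws XmlException if the input is not well-formed XML.
        XDocument doc = XDocument.Parse(xml, LoadOptions.PreserveWhitespace);

        // The Remove extension method takes a snapshot of the matches first,
        // so removing nodes while walking the tree is safe.
        doc.Descendants()
           .Where(e => names.Contains(e.Name.LocalName))
           .Remove();

        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

For example, `XmlTagFilter.ExcludeTags("<div><p>test</p><span>keep</span><img src=\"x.png\" /></div>", new[] { "p", "img" })` returns `<div><span>keep</span></div>`. Because the whole element node is removed, its content goes with it, which is what the regex in Edit 2 fails to do. If the input is real-world HTML rather than well-formed XML, this parser will throw and an HTML parser is the better fit.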

Answer 2

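Have you tried the Html Agility Pack? It is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT. It is a .NET code library that lets you parse "out of the web" HTML files; you can fix up the string however you want, modify the DOM, add nodes, copy nodes, and so on.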
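
A rough sketch of how this could look, assuming the HtmlAgilityPack NuGet package is referenced. The class name `HtmlTagFilter` and the method name `ExcludeTags` are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the "HtmlAgilityPack" NuGet package is installed

public static class HtmlTagFilter
{
    // Hypothetical helper: drops every element named in excludedTags,
    // including everything nested inside it.
    public static string ExcludeTags(string html, IEnumerable<string> excludedTags)
    {
        // Tag names are given without angle brackets, e.g. "p", "a", "img".
        var names = new HashSet<string>(excludedTags, StringComparer.OrdinalIgnoreCase);

        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant of malformed markup, unlike an XML parser

        // Collect the matches into a list first so the tree is not
        // modified while it is being enumerated.
        List<HtmlNode> toRemove = doc.DocumentNode.Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Element && names.Contains(n.Name))
            .ToList();

        foreach (HtmlNode node in toRemove)
        {
            node.Remove();
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

It would be called as `HtmlTagFilter.ExcludeTags(strString, htmlTags)` with `htmlTags` set to something like `{ "a", "img", "p" }`. Matching on the exact element name also avoids a side effect of the regex in Edit 2, where the pattern for "a" or "p" can also match other tags that merely start with that letter.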

Function to exclude specific tags from an XML string
