Replace bold (strong) and italic (em) tags with custom tags

Time: 2015-09-23 13:25:36

Tags: c# html string replace

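I need to find a way to replace the bold (strong) and italic (em) tags in a string of text with "bold", "italic" or "bolditalic" tags. Basically, any text that is only bold should be wrapped in "bold" tags, any text that is only italic should be wrapped in "italic" tags, and any text that is both bold and italic should be wrapped in "bolditalic" tags.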

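I can't do a simple string replace on the text. I'm considering using HtmlAgilityPack, but I can't figure out how to use it to achieve what I want, because the code has to work out which opening tags are currently in effect in order to decide which closing tag is needed. Can anyone offer any suggestions? Here are some examples of how I need it to work: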

This:

<strong><em>The quick brown fox</em> jumps over the lazy dog.</strong>

Needs to become:

<bolditalic>The quick brown fox</bolditalic><bold> jumps over the lazy dog.</bold>

This:

<em><strong>The </strong></em><strong>quick<em> brown fox jumps over the lazy dog.</em></strong>

Needs to become:

<bolditalic>The </bolditalic><bold>quick</bold><bolditalic> brown fox jumps over the lazy dog.</bolditalic>

This:

<strong>The quick</strong> brown <em>fox jumps over</em> the lazy dog.

Needs to become:

<bold>The quick</bold> brown <italic>fox jumps over</italic> the lazy dog.
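
One way to read the three examples is as a small state machine: track whether bold and italic are currently open, and wrap each run of text according to that state. The following is a minimal sketch of that idea, not a tested solution. It deliberately uses a regular expression instead of HtmlAgilityPack, and it assumes the input contains only well-formed, attribute-free strong and em tags. The names TagConverter, ConvertTags and Emit are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class TagConverter
{
    // Assumption: only bare <strong>, </strong>, <em> and </em> tags occur (no attributes).
    private static readonly Regex TagPattern =
        new Regex(@"<(/?)(strong|em)>", RegexOptions.IgnoreCase);

    public static string ConvertTags(string html)
    {
        var result = new StringBuilder();
        int boldDepth = 0, italicDepth = 0; // how many strong/em tags are currently open
        int position = 0;                   // index just past the last tag consumed
        string openTag = null;              // custom tag currently open in the output, if any

        foreach (Match match in TagPattern.Matches(html))
        {
            // Emit the text that sits between the previous tag and this one.
            string text = html.Substring(position, match.Index - position);
            Emit(result, text, boldDepth > 0, italicDepth > 0, ref openTag);
            position = match.Index + match.Length;

            // Update the formatting state; never let a stray closing tag go below zero.
            int delta = match.Groups[1].Value == "/" ? -1 : 1;
            if (match.Groups[2].Value.Equals("strong", StringComparison.OrdinalIgnoreCase))
                boldDepth = Math.Max(0, boldDepth + delta);
            else
                italicDepth = Math.Max(0, italicDepth + delta);
        }

        // Emit whatever follows the last tag, then close any custom tag still open.
        Emit(result, html.Substring(position), boldDepth > 0, italicDepth > 0, ref openTag);
        if (openTag != null) result.Append("</").Append(openTag).Append('>');
        return result.ToString();
    }

    private static void Emit(StringBuilder result, string text, bool bold, bool italic, ref string openTag)
    {
        if (text.Length == 0) return; // empty runs never open or close a tag

        string wanted = bold && italic ? "bolditalic" : bold ? "bold" : italic ? "italic" : null;
        if (wanted != openTag)
        {
            // Formatting changed: close the old custom tag and open the new one.
            if (openTag != null) result.Append("</").Append(openTag).Append('>');
            if (wanted != null) result.Append('<').Append(wanted).Append('>');
            openTag = wanted;
        }
        result.Append(text);
    }
}
```

Calling TagConverter.ConvertTags on each of the three inputs above should give the corresponding expected output. One side effect of this design is that adjacent runs with the same formatting are merged into a single custom tag, which may or may not be desirable. Any other markup in the input is passed through as plain text.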

Update

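I've started working on the following code using HtmlAgilityPack. I have it working for some scenarios, but not for all of them. I'm wondering whether I'm overcomplicating this, or whether anyone can help me improve it so that it works in all scenarios.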

// Variables
StringBuilder result = new StringBuilder();

// Load html document
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(text);

string test = string.Empty;

foreach (HtmlNode node in htmlDoc.DocumentNode.Descendants())
{
    // If just text (i.e. not bold or italic)
    if (node.Name == "#text" && node.ParentNode.Name != "strong" && node.ParentNode.Name != "em" && node.ParentNode.Name != "bold"
            && node.ParentNode.Name != "italic" && node.ParentNode.Name != "bolditalic")
        result.Append(node.InnerHtml);

    // If bold or italic
    if (node.Name == "strong" || node.Name == "em")
    {
        // Is bold or italic?
        bool bold = node.Name == "strong";

        // If italic/bold child node exists
        if (node.ChildNodes != null && ((bold && node.ChildNodes.Any(x => x.Name == "em")) || (!bold && node.ChildNodes.Any(x => x.Name == "strong"))))
        {
            // Variables
            int count = 0;

            // Get em/strong sub-nodes
            var subNodes = bold ? node.ChildNodes.Where(x => x.Name == "em") : node.ChildNodes.Where(x => x.Name == "strong");

            // Loop through subNodes
            foreach (var subNode in subNodes)
            {
                // Variables
                string contentBefore = string.Empty;
                string contentAfter = string.Empty;
                count++;

                // Add "bold only" or "italic only" content before "em" or "strong" tag (if there is any)  (only do this on first instance of "em" or "strong" because this will be part the "contentAfter" method below on any other instances)
                if (count == 1 && node.InnerHtml.Substring(0, node.InnerHtml.IndexOf(subNode.OuterHtml)).Length > 0)
                {
                    // Get "bold only" or "italic only" content before "em" or "strong" tag and add to result
                    contentBefore = node.InnerHtml.Substring(0, node.InnerHtml.IndexOf(subNode.OuterHtml));
                    result.Append(string.Format("<{1}>{0}</{1}>", contentBefore, bold ? "bold" : "italic"));
                }

                // Add "bolditalic" content to result
                result.Append(string.Format("<bolditalic>{0}</bolditalic>", subNode.InnerHtml));

                // Add "bold only" or "italic only" content after "em" or "strong" tag (if there is any)
                if (node.InnerHtml.Substring(subNode.StreamPosition).Length > subNode.OuterHtml.Length)
                {
                    // Get next instance of "em" or "strong"
                    int nextInstanceOfEmOrStrong = (node.OuterHtml.Substring(subNode.StreamPosition + subNode.OuterHtml.Length).Contains(bold ? "<em>" : "<strong>")) ? node.OuterHtml.Substring(subNode.StreamPosition + subNode.OuterHtml.Length).IndexOf(bold ? "<em>" : "<strong>") : -1;

                    // Get remaining content to the next instance of an "em" or "strong" tag (if there is another instance otherwise just get the remaining content)
                    if (nextInstanceOfEmOrStrong > 0)
                        // Get content between this "em" or "strong" tag and the next "em" or "strong" tag
                        contentAfter = node.OuterHtml.Substring(subNode.StreamPosition + subNode.OuterHtml.Length, nextInstanceOfEmOrStrong);
                    else
                        // Get remaining content but remove "</strong>" or "</em>" tag from the end
                        contentAfter = node.OuterHtml.Substring(subNode.StreamPosition + subNode.OuterHtml.Length, node.OuterHtml.Substring(subNode.StreamPosition + subNode.OuterHtml.Length).Length - (bold ? 9 : 5));

                    // Add "bold only" or "italic only" content to the end of the result
                    result.Append(string.Format("<{1}>{0}</{1}>", contentAfter, bold ? "bold" : "italic"));
                }
            }
        }
        else
            result.Append(string.Format("<{0}>{1}</{0}>", bold ? "bold" : "italic", node.InnerHtml));
    }
}

// Return results
return result.ToString();

0 Answers:

No answers