Question

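I have the following recursive method, which takes an XHTML document and marks nodes based on certain conditions. It is called for many pieces of HTML content, like this: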

XmlDocument document = new XmlDocument();
document.LoadXml(xmlAsString);
PrepNodesForDeletion(document.DocumentElement, document.DocumentElement);

The method is defined as follows:

/// <summary>
/// Recursive function to identify and mark all unnecessary nodes so that they can be removed from the document.
/// </summary>
/// <param name="nodeToCompareAgainst">The node that we are recursively comparing all of its descendant nodes against</param>
/// <param name="nodeInQuestion">The node whose children we are comparing against the "nodeToCompareAgainst" node</param>
static void PrepNodesForDeletion(XmlNode nodeToCompareAgainst, XmlNode nodeInQuestion)
{
    if (infinityIndex++ > 100000)
    {
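        // a bare "throw;" only compiles inside a catch block, so throw a concrete exception
        throw new InvalidOperationException("Recursion limit exceeded.");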
    }
    foreach (XmlNode childNode in nodeInQuestion.ChildNodes)
    {
        // make sure we compare all of the childNodes descendants to the nodeToCompareAgainst
        PrepNodesForDeletion(nodeToCompareAgainst, childNode);

        if (AreNamesSame(nodeToCompareAgainst, childNode) && AllAttributesPresent(nodeToCompareAgainst, childNode))
        {
            // the function AnyAttributesWithDifferingValues assumes that all attributes are present between the two nodes
            if (AnyAttributesWithDifferingValues(nodeToCompareAgainst, childNode) && InnerTextIsSame(nodeToCompareAgainst, childNode))
            {
                MarkNodeForDeletion(nodeToCompareAgainst);
            }
            else if (!AnyAttributesWithDifferingValues(nodeToCompareAgainst, childNode))
            {
                MarkNodeForDeletion(childNode);
            }
        }

        // make sure we compare all of the childNodes descendants to the childNode
        PrepNodesForDeletion(childNode, childNode);
    }
}

The following method then removes the marked nodes:

static void RemoveMarkedNodes(XmlDocument document)
{
    // in order for us to make sure we remove everything we meant to remove, we need to do this in a while loop
    // for instance, if the original xml is = <a><a><b><a/></b></a><a/></a>
    // this should result in the xml being passed into this function as:
    // <a><b><a DeleteNode="TRUE" /></b><a DeleteNode="TRUE"><b><a DeleteNode="TRUE" /></b></a><a DeleteNode="TRUE" /></a>
    // then this function (without the while) will not delete the last <a/>, even though it is marked for deletion
    // if we incorporate a while loop, then we can ensure all nodes marked for deletion are removed
    // TODO: understand the reason for this -- see http://groups.google.com/group/microsoft.public.dotnet.xml/browse_thread/thread/25df058a4efb5698/7dd0a8b71739216c?lnk=st&q=xmlnode+removechild+recursive&rnum=2&hl=en#7dd0a8b71739216c
    XmlNodeList nodesToDelete = document.SelectNodes("//*[@DeleteNode='TRUE']");

    while (nodesToDelete.Count > 0)
    {
        foreach (XmlNode nodeToDelete in nodesToDelete)
        {
            nodeToDelete.ParentNode.RemoveChild(nodeToDelete);
        }

        nodesToDelete = document.SelectNodes("//*[@DeleteNode='TRUE']");
    }
}

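When I use the PrepNodesForDeletion method without the infinityIndex counter, I get an OutOfMemoryException for some HTML content. But if I use the infinityIndex counter, it may fail to delete nodes for some HTML content.

Can anyone suggest a way to remove the recursion? I am also not familiar with the HtmlAgilityPack, so if it can be done with that, could someone provide some code samples?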

Answer 1

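OK, if I understand your algorithm correctly, you want to do this: for every node in the tree, compare it against all of its descendants, without recursion. Is that right?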

    // walk the tree in DFS
    public void XmlTreeWalk(XmlNode root, Action<XmlNode, XmlNode> action)
    {
        var nodesToCompare = new Stack<XmlNode>();
        foreach (XmlNode child in root.ChildNodes)
        {
            nodesToCompare.Push(child);
        }
        while (nodesToCompare.Count > 0)
        {
            var top = nodesToCompare.Pop();
            action(root, top);
            foreach (XmlNode child in top.ChildNodes)
            {
                nodesToCompare.Push(child);
            }
        }
    }

    // for each node (including the root itself): compare its whole subtree against that node
    public void PrepareForDeletion(XmlNode root)
    {
        PrepareSubtreeForDeletion(root, root);
        XmlTreeWalk(root, (unused, node) => PrepareSubtreeForDeletion(node, node));
    }

    // for each node, compare all its children against the toCompare node
    private void PrepareSubtreeForDeletion(XmlNode toCompare, XmlNode root)
    {
        XmlTreeWalk(root, (unused, current) => MarkNodeForDeletion(toCompare, current));
    }

    // your delete logic
    public void MarkNodeForDeletion(XmlNode toCompare, XmlNode toCompareAgainst)
    {
       ...
    }

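What this should do is walk the tree from top to bottom and, for each node, traverse that node's subtree, comparing every descendant against that node.

I have not tested it, so it may contain bugs, but the idea should be clear. Obviously this algorithm is O(n^2).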
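
For anyone who wants something they can run, here is a minimal self-contained sketch of the same idea. The comparison rule is only a stand-in (a descendant is marked when it has the same element name as the node it is compared against); it is not the asker's AreNamesSame / AllAttributesPresent logic. The class and method names are made up for illustration. The DeleteNode="TRUE" attribute is the marker used by the XPath in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class IterativeMarker
{
    // Yields every descendant element of root in document order,
    // using an explicit stack instead of recursion.
    public static IEnumerable<XmlElement> Descendants(XmlNode root)
    {
        var pending = new Stack<XmlNode>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            XmlNode current = pending.Pop();
            // Push children right-to-left so they are popped left-to-right.
            for (XmlNode child = current.LastChild; child != null; child = child.PreviousSibling)
            {
                pending.Push(child);
            }
            if (current != root && current is XmlElement element)
            {
                yield return element;
            }
        }
    }

    // Hypothetical rule: mark a descendant when its name matches the node
    // it is being compared against. Returns the number of nodes marked.
    public static int MarkDuplicates(XmlDocument document)
    {
        int marked = 0;
        var candidates = new List<XmlElement> { document.DocumentElement };
        candidates.AddRange(Descendants(document.DocumentElement));

        foreach (XmlElement ancestor in candidates)
        {
            foreach (XmlElement descendant in Descendants(ancestor))
            {
                if (descendant.Name == ancestor.Name && !descendant.HasAttribute("DeleteNode"))
                {
                    descendant.SetAttribute("DeleteNode", "TRUE");
                    marked++;
                }
            }
        }
        return marked;
    }
}
```

Marking only sets an attribute, so the tree shape does not change while it is being walked. The stack holds at most the nodes still waiting to be visited, so memory stays bounded by the size of the document.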

Answer 2

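To remove the recursion, children and parents have to know about each other.

Then you can start at the root parent and walk down the right leg until you reach the bottom-right leaf.

From there go up one, then over one to the left, then down again to the bottom. Repeat (up one, left one, then down the right side, and so on) until you have covered the whole tree.

I am not sure what you are trying to do, so I can't suggest how to apply this to your problem.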
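
The walk described above might look roughly like this with XmlNode, which already exposes ParentNode, LastChild and PreviousSibling. This is only a sketch: the class and method names are invented, and it assumes the visit callback does not add, remove or move nodes while the walk is running.

```csharp
using System;
using System.Xml;

public static class ParentPointerWalk
{
    // Visits root and all of its descendants, right-to-left,
    // without recursion and without an explicit stack.
    public static void Visit(XmlNode root, Action<XmlNode> visit)
    {
        XmlNode current = root;
        while (current != null)
        {
            visit(current);

            if (current.LastChild != null)
            {
                // Keep going down the rightmost leg.
                current = current.LastChild;
                continue;
            }

            // At a leaf: climb until some node has a sibling to its left,
            // or until we are back at the starting node.
            while (current != root && current.PreviousSibling == null)
            {
                current = current.ParentNode;
            }

            // Back at root means the whole subtree has been covered.
            current = current == root ? null : current.PreviousSibling;
        }
    }
}
```

Because the position is recovered from the parent and sibling links, the only state kept between steps is the current node.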

Answer 3

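Your problem is that your XML is malformed, which directly leaves your DOM in a mess. I think what you will want to do is use a SAX parser (one must exist for .NET) and implement the logic to fix up the DOM yourself, which seems to be what you are trying to do.

This approach is not recursive, but it requires you to do some work that you did not realize you needed to do.

Also note that you are getting an out-of-memory exception rather than a stack overflow exception, which reinforces the idea that excessive recursion is not your problem in itself.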
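
As a rough illustration of the streaming idea: the .NET base library ships XmlReader, a forward-only pull reader rather than a true SAX push parser, so treat it as a stand-in here. The sketch below never builds a DOM; it only keeps the names of the currently open elements. The rule it checks (an element with the same name as one of its ancestors) is a hypothetical placeholder for the real comparison logic, and the class and method names are made up.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class StreamingScan
{
    // Counts elements that have an ancestor with the same name,
    // reading the input forward-only without loading a DOM.
    public static int CountNestedSameName(string xml)
    {
        int count = 0;
        var openElements = new List<string>();

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    if (openElements.Contains(reader.Name))
                    {
                        count++;
                    }
                    // An empty element such as <a/> produces no EndElement,
                    // so it is never tracked as open.
                    if (!reader.IsEmptyElement)
                    {
                        openElements.Add(reader.Name);
                    }
                }
                else if (reader.NodeType == XmlNodeType.EndElement)
                {
                    openElements.RemoveAt(openElements.Count - 1);
                }
            }
        }
        return count;
    }
}
```

Memory use here grows with the nesting depth of the document, not with its total size.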

C# - XML - Removing nodes without using recursion
