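I have the following recursive method, which takes an XHTML document and marks nodes according to certain conditions. It is called like this for many pieces of HTML content: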
XmlDocument document = new XmlDocument();
document.LoadXml(xmlAsString);
PrepNodesForDeletion(document.DocumentElement, document.DocumentElement);
The method is defined as follows:
static int infinityIndex = 0; // call counter used as a guard against runaway recursion

/// <summary>
/// Recursive function to identify and mark all unnecessary nodes so that they can be removed from the document.
/// </summary>
/// <param name="nodeToCompareAgainst">The node that we are recursively comparing all of its descendant nodes against</param>
/// <param name="nodeInQuestion">The node whose children we are comparing against the "nodeToCompareAgainst" node</param>
static void PrepNodesForDeletion(XmlNode nodeToCompareAgainst, XmlNode nodeInQuestion)
{
    // a bare "throw;" only compiles inside a catch block, so throw an explicit exception
    if (infinityIndex++ > 100000)
    {
        throw new InvalidOperationException("PrepNodesForDeletion exceeded 100000 calls");
    }
    foreach (XmlNode childNode in nodeInQuestion.ChildNodes)
    {
        // make sure we compare all of the childNode's descendants to the nodeToCompareAgainst
        PrepNodesForDeletion(nodeToCompareAgainst, childNode);
        if (AreNamesSame(nodeToCompareAgainst, childNode) && AllAttributesPresent(nodeToCompareAgainst, childNode))
        {
            // the function AnyAttributesWithDifferingValues assumes that all attributes are present between the two nodes
            if (AnyAttributesWithDifferingValues(nodeToCompareAgainst, childNode) && InnerTextIsSame(nodeToCompareAgainst, childNode))
            {
                MarkNodeForDeletion(nodeToCompareAgainst);
            }
            else if (!AnyAttributesWithDifferingValues(nodeToCompareAgainst, childNode))
            {
                MarkNodeForDeletion(childNode);
            }
        }
        // make sure we compare all of the childNode's descendants to the childNode
        PrepNodesForDeletion(childNode, childNode);
    }
}
The following method then removes the marked nodes:
static void RemoveMarkedNodes(XmlDocument document)
{
    // in order for us to make sure we remove everything we meant to remove, we need to do this in a while loop
    // for instance, if the original xml is = <a><a><b><a/></b></a><a/></a>
    // this should result in the xml being passed into this function as:
    // <a><b><a DeleteNode="TRUE" /></b><a DeleteNode="TRUE"><b><a DeleteNode="TRUE" /></b></a><a DeleteNode="TRUE" /></a>
    // then this function (without the while) will not delete the last <a/>, even though it is marked for deletion
    // if we incorporate a while loop, then we can ensure all nodes marked for deletion are removed
    // TODO: understand the reason for this -- see http://groups.google.com/group/microsoft.public.dotnet.xml/browse_thread/thread/25df058a4efb5698/7dd0a8b71739216c?lnk=st&q=xmlnode+removechild+recursive&rnum=2&hl=en#7dd0a8b71739216c
    XmlNodeList nodesToDelete = document.SelectNodes("//*[@DeleteNode='TRUE']");
    while (nodesToDelete.Count > 0)
    {
        foreach (XmlNode nodeToDelete in nodesToDelete)
        {
            nodeToDelete.ParentNode.RemoveChild(nodeToDelete);
        }
        nodesToDelete = document.SelectNodes("//*[@DeleteNode='TRUE']");
    }
}
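About the TODO above: the remarks for `XmlNode.SelectNodes` state that the returned `XmlNodeList` is only valid while the underlying document remains unchanged, and that changing the document can produce unexpected results without throwing. Removing nodes while iterating that list is exactly such a change, which likely explains why a single pass misses nodes. A minimal sketch, assuming the same `DeleteNode="TRUE"` marker, copies the matches into a `List<XmlNode>` before removing anything, so one pass should be enough. The class name `MarkedNodeRemover` is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;

static class MarkedNodeRemover
{
    // Removes every element carrying DeleteNode="TRUE" in a single pass and
    // returns how many were found. The XmlNodeList from SelectNodes is copied
    // into a List first because it is not guaranteed to stay valid once the
    // document is modified.
    public static int RemoveMarkedNodes(XmlDocument document)
    {
        List<XmlNode> nodesToDelete = document
            .SelectNodes("//*[@DeleteNode='TRUE']")
            .Cast<XmlNode>()
            .ToList();

        foreach (XmlNode node in nodesToDelete)
        {
            // A marked node nested inside an already removed marked node is
            // still attached to that (detached) parent, so RemoveChild succeeds.
            node.ParentNode?.RemoveChild(node);
        }
        return nodesToDelete.Count;
    }
}
```

With the marked example from the comment, this removes all four marked `<a>` elements and leaves only the outer `<a>` and its `<b>` child.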
When I use the PrepNodesForDeletion method without the infinityIndex counter, I get an OutOfMemoryException for some HTML content. However, if I keep the infinityIndex counter, the nodes in some HTML content may not get removed.
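Can anyone suggest a way to remove the recursion? I'm also not familiar with the HtmlAgilityPack, so if this can be done with it, could someone provide some sample code?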
Answer 0 (score: 1)
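Well, if I understand your algorithm correctly, what you want to do is: for every node in the tree, compare it against all of its descendants, without recursion. Is that right?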
// walk the tree in DFS order without recursion, calling action(root, node) for every descendant of root
public void XmlTreeWalk(XmlNode root, Action<XmlNode, XmlNode> action)
{
    var nodesToCompare = new Stack<XmlNode>();
    foreach (XmlNode child in root.ChildNodes)
    {
        nodesToCompare.Push(child);
    }
    while (nodesToCompare.Count > 0)
    {
        var top = nodesToCompare.Pop();
        action(root, top);
        foreach (XmlNode child in top.ChildNodes)
        {
            nodesToCompare.Push(child);
        }
    }
}
// for each node (the root included): compare it against every node in its own subtree
public void PrepareForDeletion(XmlNode root)
{
    PrepareSubtreeForDeletion(root);
    XmlTreeWalk(root, (unused, descendant) => PrepareSubtreeForDeletion(descendant));
}
// compare every descendant of toCompare against toCompare itself
private void PrepareSubtreeForDeletion(XmlNode toCompare)
{
    XmlTreeWalk(toCompare, (ancestor, descendant) => MarkNodeForDeletion(ancestor, descendant));
}
// your delete logic
public void MarkNodeForDeletion(XmlNode toCompare, XmlNode toCompareAgainst)
{
    // ...
}
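What this should do is walk the tree from top to bottom and, for each node, traverse that node's subtree, comparing every descendant against that node.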
I haven't tested it, so it may contain bugs, but the idea should be clear. Obviously this algorithm is O(n^2).
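For comparison, the recursion in the question grows much faster than O(n^2). Every call on a node makes two calls on each child (`PrepNodesForDeletion(nodeToCompareAgainst, childNode)` and `PrepNodesForDeletion(childNode, childNode)`), so the number of calls doubles with each level of nesting: for a chain of nested elements `n` deep it is 2^n - 1. The sketch below keeps only that call structure (the comparison logic is stripped out) and counts the calls; `CallCounter` and `PrepShape` are hypothetical names.

```csharp
using System.Xml;

static class CallCounter
{
    static long calls;

    // Same recursion shape as PrepNodesForDeletion, with the comparisons
    // removed so that only the number of calls is measured.
    static void PrepShape(XmlNode nodeToCompareAgainst, XmlNode nodeInQuestion)
    {
        calls++;
        foreach (XmlNode childNode in nodeInQuestion.ChildNodes)
        {
            PrepShape(nodeToCompareAgainst, childNode);
            PrepShape(childNode, childNode);
        }
    }

    // Builds <e><e>...</e></e> nested 'depth' elements deep and counts calls.
    public static long CountRecursiveCalls(int depth)
    {
        var doc = new XmlDocument();
        XmlNode current = doc.AppendChild(doc.CreateElement("e"));
        for (int i = 1; i < depth; i++)
        {
            current = current.AppendChild(doc.CreateElement("e"));
        }
        calls = 0;
        PrepShape(doc.DocumentElement, doc.DocumentElement);
        return calls;
    }
}
```

A chain only 17 elements deep already makes 2^17 - 1 = 131071 calls, past the `infinityIndex` limit of 100000, whereas comparing each node once against each of its descendants needs 17 × 16 / 2 = 136 comparisons for the same chain.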
Answer 1 (score: 0)
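To remove the recursion, children and parents have to know about each other.
Then you can walk from the root parent down the right leg until you reach the bottom-right-most leg.
From there, go up one, then down-left one, then down until you hit the bottom. Repeat: up one, left, then right, and so on, until you have covered the whole tree.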
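A minimal sketch of the parent-link idea, assuming `System.Xml` nodes, which already expose `FirstChild`, `NextSibling` and `ParentNode`. It walks left to right instead of starting down the rightmost leg, but the principle is the same: climb back up through `ParentNode` instead of returning from a recursive call, so neither recursion nor an explicit stack is needed. `ParentLinkWalk` is a hypothetical name.

```csharp
using System;
using System.Xml;

static class ParentLinkWalk
{
    // Pre-order traversal of the subtree under 'root' with no recursion and
    // no explicit stack: only FirstChild, NextSibling and ParentNode are used.
    public static void Walk(XmlNode root, Action<XmlNode> visit)
    {
        XmlNode current = root;
        while (current != null)
        {
            visit(current);
            if (current.FirstChild != null)
            {
                current = current.FirstChild; // go down one level
                continue;
            }
            // Leaf: climb until we find an ancestor (below root) with a next sibling.
            while (current != root && current.NextSibling == null)
            {
                current = current.ParentNode;
            }
            current = current == root ? null : current.NextSibling;
        }
    }
}
```

The `visit` callback must not move or remove nodes, since the walk relies on those links staying intact.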
I'm not sure what you are trying to do, so I can't suggest how to apply this method to your problem.
Answer 2 (score: 0)
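Your problem is that your XML is malformed, which directly results in your DOM being a mess. I think what you will want to do is use a SAX parser (one must exist for .NET) and implement the logic to fix up the DOM yourself, which seems to be what you are trying to do.
This approach is not recursive, but it does require you to do some work that you didn't realize you needed to do.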
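.NET does not ship a SAX API as such; the closest built-in equivalent is `System.Xml.XmlReader`, a forward-only pull parser. The sketch below only shows the non-recursive, streaming shape of such a pass (it records each element start together with its nesting depth); the repair logic the answer refers to is not shown, and `StreamingScan` is a hypothetical name. Note that `XmlReader` throws an `XmlException` when it reaches markup that is not well-formed, so on its own it reports malformed input rather than repairing it.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class StreamingScan
{
    // Reads the markup front to back with XmlReader and records "depth:name"
    // for every element start. No DOM is built and no recursion is used.
    public static List<string> ElementStarts(string xml)
    {
        var result = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    result.Add($"{reader.Depth}:{reader.Name}");
                }
            }
        }
        return result;
    }
}
```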
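Also note that you are getting an out-of-memory exception rather than a stack overflow exception, which reinforces the idea that excessive recursion itself is not your problem.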