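I have a recursive algorithm that I use to iterate over a hierarchical data structure, but unfortunately, for some data the hierarchy is so deep that I get a StackOverflowError. I have seen this happen at a depth of about 150 nodes, and the data could grow far beyond that. For context, this code runs in a constrained environment where changing the JVM stack size is not an option, and the data structure is a given: it represents different file systems with directories and files.
To solve the stack overflow, I tried converting the algorithm into an iterative one. I have not had to do this before, so I started from some examples that show how to do it for simple recursion, but I'm not sure how to apply that to recursion inside a loop. I found an approach that seems to work, but the code is rather insane.
Here is a simplified version of my original recursive method: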
private CacheEntry sumUpAndCacheChildren(Node node) {
    final CacheEntry entry = getCacheEntry(node);
    if (entryIsValid(entry))
        return entry;
    Node[] children = node.listChildren();
    long size = 0;
    if (children != null) {
        for (Node child : children) {
            if (child.hasChildren()) {
                size += sumUpAndCacheChildren(child).size;
            } else {
                size += child.size();
            }
        }
    }
    return putInCache(node, size);
}
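Every leaf node has a size, and the size of any ancestor node is taken to be the sum of the sizes of all its descendant leaves. I want to know the size of every node, so the size is aggregated and cached for each node.
Here is the iterative version: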
private CacheEntry sumUpAndCacheChildren(Node initialNode) {
    class StackFrame {
        final Node node;
        Node[] children;
        // Local vars
        long size;
        // Tracking stack frame state
        int stage;
        int loopIndex;
        StackFrame(Node node) {
            this.node = node;
            this.children = null;
            this.size = 0;
            this.stage = 0;
            this.loopIndex = 0;
        }
    }
    final Stack<StackFrame> stack = new Stack<StackFrame>();
    stack.push(new StackFrame(initialNode));
    CacheEntry retValue = getCacheEntry(initialNode);
    outer:
    while (!stack.isEmpty()) {
        final StackFrame frame = stack.peek();
        final Node node = frame.node;
        switch(frame.stage) {
            case 0: {
                final CacheEntry entry = getCacheEntry(node);
                if (entryIsValid(entry)) {
                    retValue = entry;
                    stack.pop();
                    continue;
                }
                frame.children = node.listChildren();
                frame.stage = frame.children != null ? 1 : 3;
            } break;
            case 1: {
                for (int i = frame.loopIndex; i < frame.children.length; ++i) {
                    frame.loopIndex = i;
                    final Node child = frame.children[i];
                    if (child.hasChildren()) {
                        stack.push(new StackFrame(child));
                        frame.stage = 2; // Accumulate the child's result once its own frame has finished.
                        frame.loopIndex++; // Make sure we resume the for loop at the next child the next time around.
                        continue outer;
                    } else {
                        frame.size += child.size();
                    }
                }
                frame.stage = 3;
            } break;
            case 2: {
                // Accumulate results
                frame.size += retValue.size;
                frame.stage = 1; // Continue the for loop
            } break;
            case 3: {
                retValue = putInCache(node, frame.size);
                stack.pop();
                continue;
            }
        }
    }
    return retValue;
}
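This just feels more insane than it needs to be, and it would be painful to do this everywhere in the code where I recurse into children and perform different operations on them. What techniques can I use to make the conversion easier when I aggregate at each level and recurse into the children inside a for loop?
EDIT:
With the help of the answers below, I was able to simplify things greatly. The code is now almost as concise as the original recursive version. Now I just need to apply the same principle everywhere else I recurse over the same data structure.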
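Answer 0 (score: 1)
Since you're dealing with a tree structure and want to compute cumulative sizes, try a DFS while keeping track of each node's parent. I'm assuming here that you cannot change or subclass Node, and I've kept all the function signatures you use.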
private class SizedNode {
    public long cumulativeSize;
    public Node node;
    public SizedNode parent;
    public SizedNode(SizedNode parent, Node node) {
        this.node = node;
        this.parent = parent;
    }
    public long getSize() {
        if (node.hasChildren()) {
            return cumulativeSize;
        }
        else {
            return node.size();
        }
    }
}
private void sumUpAndCacheChildren(Node start)
{
    Stack<SizedNode> nodeStack = new Stack<SizedNode>();
    // Let's start with the beginning node.
    nodeStack.push(new SizedNode(null, start));
    // Loop as long as we've got nodes to process
    while (!nodeStack.isEmpty()) {
        // Take a look at the top node
        SizedNode sizedNode = nodeStack.peek();
        CacheEntry entry = getCacheEntry(sizedNode.node);
        if (entryIsValid(entry)) {
            // It's cached already, so we have computed its size
            nodeStack.pop();
            // Add the size to the parent, if applicable.
            if (sizedNode.parent != null) {
                // Use the cached size: a directory that was already cached before this call
                // never accumulated anything in its own cumulativeSize.
                sizedNode.parent.cumulativeSize += entry.size;
                // If the parent's now the top guy, we're done with it so let's cache it
                if (sizedNode.parent == nodeStack.peek()) {
                    putInCache(sizedNode.parent.node, sizedNode.parent.getSize());
                }
            }
        }
        else {
            // Not cached.
            if (sizedNode.node.hasChildren()) {
                // It's got a bunch of children.
                // We can't compute the size yet, so just add the kids to the stack.
                Node[] children = sizedNode.node.listChildren();
                if (children != null && children.length > 0) {
                    for (Node child : children) {
                        nodeStack.push(new SizedNode(sizedNode, child));
                    }
                }
                else {
                    // Nothing to descend into after all; cache a size of 0 so the loop can move on.
                    putInCache(sizedNode.node, 0);
                }
            }
            else {
                // It's a leaf node. Let's cache it.
                putInCache(sizedNode.node, sizedNode.node.size());
            }
        }
    }
}
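For anyone who wants to try the parent-tracking idea outside the original code base, here is a minimal self-contained sketch. It is not the answerer's code: `Node`, `Entry`, `sumUp`, and the `Map`-based cache are hypothetical stand-ins for the question's `Node`, `CacheEntry`, and cache helpers, and a node with no children is treated as a leaf.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

class ParentTrackingDemo {
    /** Hypothetical stand-in for the question's Node type. */
    static final class Node {
        final String name;
        final long ownSize; // only used when the node has no children
        final List<Node> children = new ArrayList<>();
        Node(String name, long ownSize) { this.name = name; this.ownSize = ownSize; }
        Node add(Node child) { children.add(child); return this; }
    }

    /** Stack entry: a node, a link to its parent's entry, and a running total. */
    static final class Entry {
        final Node node;
        final Entry parent;
        long total;
        Entry(Node node, Entry parent) { this.node = node; this.parent = parent; }
    }

    /** Fills the cache with the aggregated size of every node under root and returns root's size. */
    static long sumUp(Node root, Map<Node, Long> cache) {
        Deque<Entry> stack = new ArrayDeque<>();
        stack.push(new Entry(root, null));
        while (!stack.isEmpty()) {
            Entry top = stack.peek();
            Long cached = cache.get(top.node);
            if (cached != null) {
                // Size is known: retire this entry and credit its parent.
                stack.pop();
                if (top.parent != null) {
                    top.parent.total += cached;
                    // The parent is back on top only after all of its children were retired.
                    if (stack.peek() == top.parent) {
                        cache.put(top.parent.node, top.parent.total);
                    }
                }
            } else if (top.node.children.isEmpty()) {
                // Leaf: cache its own size; the next iteration retires it.
                cache.put(top.node, top.node.ownSize);
            } else {
                // Directory: its entry stays on the stack underneath its children.
                for (Node child : top.node.children) {
                    stack.push(new Entry(child, top));
                }
            }
        }
        return cache.get(root);
    }
}
```

Calling `ParentTrackingDemo.sumUp(root, new HashMap<>())` on a chain of directories far deeper than 150 levels only grows the heap-allocated `ArrayDeque`, not the call stack.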
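Answer 1 (score: 1)
You're essentially doing an iterative post-order traversal of an N-ary tree; you could search for that term to find more detailed examples.
Very rough pseudocode: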
Stack<Node> pathToCurrent = new Stack<Node>();
Stack<Long> sizesInStack = new Stack<Long>();
Stack<Integer> indexInNode = new Stack<Integer>();
// The stacks hold the suspended state of the ancestors only, so they start empty.
Node current = rootNode;
long currentSize = 0;
int currentIndex = 0;
while (current != null) {
    if (current.children != null && currentIndex < current.children.length) {
        // Process the next child, saving this node's state first
        Node nextChild = current.children[currentIndex];
        pathToCurrent.push(current);
        sizesInStack.push(currentSize);
        indexInNode.push(currentIndex);
        current = nextChild;
        currentSize = 0;
        currentIndex = 0;
    } else {
        // This node is a leaf, or we've handled all its children.
        // Put our size into the cache, then pop off the stack and set up for the next child of our parent.
        if (current.children == null) {
            currentSize += current.size(); // only leaves contribute their own size
        }
        putInCache(current, currentSize);
        if (pathToCurrent.isEmpty()) {
            current = null; // the root is finished, so exit the loop
        } else {
            current = pathToCurrent.pop();
            currentSize = currentSize + sizesInStack.pop();
            currentIndex = 1 + indexInNode.pop();
        }
    }
}
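The same post-order idea can be sketched as runnable Java by keeping each node's suspended state in one frame object instead of three parallel stacks. This is only an illustration under assumptions: `Node`, `Frame`, `sumUp`, and the `Map` cache are hypothetical names, and the frame's iterator plays the role of the saved child index.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class PostOrderDemo {
    /** Hypothetical stand-in for the question's Node type. */
    static final class Node {
        final long ownSize; // only used when the node has no children
        final List<Node> children = new ArrayList<>();
        Node(long ownSize) { this.ownSize = ownSize; }
        Node add(Node child) { children.add(child); return this; }
    }

    /** One suspended "call": the node, the children still to visit, and the size so far. */
    static final class Frame {
        final Node node;
        final Iterator<Node> remaining;
        long size;
        Frame(Node node) {
            this.node = node;
            this.remaining = node.children.iterator();
            this.size = node.children.isEmpty() ? node.ownSize : 0;
        }
    }

    /** Caches the aggregated size of every visited node and returns root's size. */
    static long sumUp(Node root, Map<Node, Long> cache) {
        Long known = cache.get(root);
        if (known != null) {
            return known;
        }
        Deque<Frame> stack = new ArrayDeque<>();
        stack.push(new Frame(root));
        long result = 0;
        while (!stack.isEmpty()) {
            Frame frame = stack.peek();
            if (frame.remaining.hasNext()) {
                Node child = frame.remaining.next();
                Long cached = cache.get(child);
                if (cached != null) {
                    frame.size += cached;         // reuse an earlier result
                } else {
                    stack.push(new Frame(child)); // "recurse" into the child
                }
            } else {
                // All children done: this is the post-order visit.
                stack.pop();
                cache.put(frame.node, frame.size);
                result = frame.size;
                if (!stack.isEmpty()) {
                    stack.peek().size += frame.size; // "return" the value to the caller's frame
                }
            }
        }
        return result;
    }
}
```

The design choice here is that resuming a parent needs no stage counter: the iterator already remembers where the loop over the children stopped.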
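Answer 2 (score: 0)
OK, I'll explain it in plain words, because I don't feel like coding right now:
You just need to put a boolean into the loop header and set it to false once the list of children no longer has any elements... I hope I've expressed myself clearly; feel free to ask questions and/or request clarification.
If the data structure stays "folded", this algorithm gets considerably slower with every iteration (roughly O(n²)). It is fairly inefficient, and I'm quite sure someone can come along and optimize it, but it will be faster than recursion and won't cause a stack overflow. It could still produce an OutOfMemoryError for very large data sets, but since only one level is iterated at any time, I think that is rather unlikely in practice.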
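Since this answer gives no code, the following is only one possible reading of a level-by-level approach, not the answerer's algorithm. It records the nodes breadth-first together with the index of each node's parent, then walks that list backwards so every child is totalled before its parent. `Node`, `sumUp`, and the returned `Map` are hypothetical names, and this particular formulation runs in time linear in the number of nodes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LevelOrderDemo {
    /** Hypothetical stand-in for the question's Node type. */
    static final class Node {
        final long ownSize; // only used when the node has no children
        final List<Node> children = new ArrayList<>();
        Node(long ownSize) { this.ownSize = ownSize; }
        Node add(Node child) { children.add(child); return this; }
    }

    /** Returns the aggregated size of every node reachable from root. */
    static Map<Node, Long> sumUp(Node root) {
        // Pass 1: breadth-first listing; parentIndex.get(i) is the position of node i's parent.
        List<Node> order = new ArrayList<>();
        List<Integer> parentIndex = new ArrayList<>();
        order.add(root);
        parentIndex.add(-1);
        for (int i = 0; i < order.size(); i++) {
            for (Node child : order.get(i).children) {
                order.add(child);
                parentIndex.add(i);
            }
        }
        // Pass 2: children always sit after their parent in the list, so walking
        // backwards finishes every child's total before it is added to the parent.
        long[] total = new long[order.size()];
        Map<Node, Long> sizes = new HashMap<>();
        for (int i = order.size() - 1; i >= 0; i--) {
            Node node = order.get(i);
            if (node.children.isEmpty()) {
                total[i] = node.ownSize;
            }
            sizes.put(node, total[i]);
            int parent = parentIndex.get(i);
            if (parent >= 0) {
                total[parent] += total[i];
            }
        }
        return sizes;
    }
}
```

The trade-off is memory: the whole tree is listed before any size is known, whereas a depth-first stack only holds the current path and its pending siblings.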
Answer 3 (score: 0)
After adapting @Marius's answer to my use case, I came up with this:
class SizedNode {
    final Node node;
    final SizedNode parent;
    long size;
    boolean needsCaching;
    SizedNode(Node node, SizedNode parent) {
        this.parent = parent;
        this.node = node;
    }
}
private CacheEntry sumUpAndCacheChildren(Node start) {
    final Stack<SizedNode> stack = new Stack<SizedNode>();
    stack.push(new SizedNode(start, null));
    CacheEntry returnValue = getCacheEntry(start);
    while (!stack.isEmpty()) {
        final SizedNode sizedNode = stack.pop();
        final CacheEntry entry = getCacheEntry(sizedNode.node);
        if (sizedNode.needsCaching) {
            // We finished processing all children, and now we're done with this node.
            if (sizedNode.parent != null) {
                sizedNode.parent.size += sizedNode.size;
            }
            returnValue = putInCache(sizedNode.node, sizedNode.size);
        } else if (entryIsValid(entry)) {
            if (sizedNode.parent != null) {
                sizedNode.parent.size += entry.size;
            }
            returnValue = entry;
        } else {
            // The next time we see this node again, it will be after we process all of its children.
            sizedNode.needsCaching = true;
            stack.push(sizedNode);
            // listChildren() may return null, as in the recursive version.
            final Node[] children = sizedNode.node.listChildren();
            if (children != null) {
                for (Node child : children) {
                    if (child.hasChildren()) {
                        stack.push(new SizedNode(child, sizedNode));
                    } else {
                        sizedNode.size += child.size();
                    }
                }
            }
        }
    }
    return returnValue;
}
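This is much better than the insane mess I ended up with on my first pass. It just goes to show that you really have to think about transforming the algorithm so that it also makes sense as an iterative approach. Thanks for all the help, everyone!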
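Because the same traversal is needed in several places, the "see each node twice" trick from this answer could be factored into a small helper. The sketch below is a hypothetical generalization, not code from the thread: `PostOrderWalker`, `walk`, and `sizes` are made-up names, and the tree is given as plain maps so the example stays self-contained.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

class PostOrderWalker {
    /** Calls visit on every node reachable from root, children before parents, without recursion. */
    static <T> void walk(T root, Function<T, List<T>> childrenOf, Consumer<T> visit) {
        Deque<T> stack = new ArrayDeque<>();
        Deque<Boolean> expanded = new ArrayDeque<>(); // parallel flag: were this node's children pushed?
        stack.push(root);
        expanded.push(false);
        while (!stack.isEmpty()) {
            T node = stack.pop();
            if (expanded.pop()) {
                visit.accept(node); // second sighting: every child has been visited
                continue;
            }
            // First sighting: put the node back underneath its children.
            stack.push(node);
            expanded.push(true);
            for (T child : childrenOf.apply(node)) {
                stack.push(child);
                expanded.push(false);
            }
        }
    }

    /** Example use: aggregate sizes for a tree given as child lists plus leaf sizes. */
    static Map<String, Long> sizes(String root, Map<String, List<String>> tree, Map<String, Long> leafSize) {
        Map<String, Long> total = new HashMap<>();
        walk(root, n -> tree.getOrDefault(n, Collections.emptyList()), n -> {
            List<String> kids = tree.getOrDefault(n, Collections.emptyList());
            long sum = kids.isEmpty() ? leafSize.getOrDefault(n, 0L) : 0L;
            for (String kid : kids) {
                sum += total.get(kid); // already present: children are visited first
            }
            total.put(n, sum);
        });
        return total;
    }
}
```

Each operation that used to recurse into children then only has to supply its own visit callback, while the stack handling lives in one place.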