Question

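I have a recursive algorithm that I use to iterate over a hierarchical data structure, but unfortunately, for some data the hierarchy is so deep that I get a StackOverflowError. I have seen this happen at a depth of around 150 nodes, and the data could grow far beyond that. For context, this code will run in a restricted environment where changing the JVM stack size is not an option, and the data structure is a given: it represents different file systems with directories and files.

To solve the stack overflow, I tried converting the algorithm into an iterative one. This is not something I have had to do before, so I started from some examples that show how to do it for simple recursion, but I am not sure how to apply them to recursion inside a loop. I found a way that seems to work, but the code is rather insane.

Here is a simplified version of my original recursive method: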

private CacheEntry sumUpAndCacheChildren(Node node) {
    final CacheEntry entry = getCacheEntry(node);

    if (entryIsValid(entry))
        return entry;

    Node[] children = node.listChildren();

    long size = 0;  

    if (children != null) {         
        for (Node child : children) {
            if (child.hasChildren()) {  
                size += sumUpAndCacheChildren(child).size;                  
            } else {                    
                size += child.size();
            }
        }                   
    }

    return putInCache(node, size);      
}

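Each leaf node has a size, and the size of any ancestor node is taken to be the combined size of all of its descendants. I want to know the size of every node, so the size is aggregated and cached for each node.

Here is the iterative version: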

private CacheEntry sumUpAndCacheChildren(Node initialNode) {
    class StackFrame {
        final Node node;
        Node[] children;

        // Local vars
        long size;

        // Tracking stack frame state
        int stage;
        int loopIndex;

        StackFrame(Node node) {
            this.node = node;
            this.children = null;
            this.size = 0;
            this.stage = 0;
            this.loopIndex = 0;
        }
    }

    final Stack<StackFrame> stack = new Stack<StackFrame>();
    stack.push(new StackFrame(initialNode));
    CacheEntry retValue = getCacheEntry(initialNode);

    outer:
    while (!stack.isEmpty()) {
        final StackFrame frame = stack.peek();
        final Node node = frame.node;

        switch(frame.stage) {
            case 0: {
                final CacheEntry entry = getCacheEntry(node);

                if (entryIsValid(entry)) {
                    retValue = entry;
                    stack.pop();
                    continue;       
                }

                frame.children = node.listChildren();
                frame.stage = frame.children != null ? 1 : 3;
            } break;
            case 1: {
                for (int i = frame.loopIndex; i < frame.children.length; ++i) {
                    frame.loopIndex = i;
                    final Node child = frame.children[i];

                    if (child.hasChildren()) {
                        stack.push(new StackFrame(child));
                        frame.stage = 2;    // Accumulate results once all the child stacks have been calculated.
                        frame.loopIndex++;  // Make sure we restart the for loop at the next iteration the next time around.
                        continue outer;
                    } else {
                        frame.size += child.size();
                    }
                }

                frame.stage = 3;
            } break;
            case 2: {
                // Accumulate results
                frame.size += retValue.size;
                frame.stage = 1;            // Continue the for loop
            } break;
            case 3: {
                retValue = putInCache(node, frame.size);
                stack.pop();
                continue;
            }
        }
    }

    return retValue;
}

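This just feels more insane than it needs to be, and it would be painful to do this everywhere in the code where I recurse into the children and perform different operations on them. What techniques could I use to make the conversion easier when I am aggregating at each level and doing so inside a for loop over the children?

Edit:

With the help of the answers below, I was able to simplify things considerably. The code is now almost as concise as the original recursive version. Now I just need to apply the same principles everywhere else that I recurse over the same data structure.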

Answer 1

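Since you are dealing with a tree structure and want to compute cumulative sizes, try a DFS that keeps track of each node's parent. I am assuming here that you cannot change or subclass Node, and I have kept all the function signatures you use.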

private class SizedNode {
    public long cumulativeSize;
    public Node node;
    public SizedNode parent;

    public SizedNode(SizedNode parent, Node node) {
        this.node = node;
        this.parent = parent;
    }

    public long getSize() {
        if (node.hasChildren()) {
            return cumulativeSize;
        }
        else {
            return node.size();
        }
    }
}

private void sumUpAndCacheChildren(Node start)
{
    Stack<SizedNode> nodeStack = new Stack<SizedNode>();

    // Let's start with the beginning node.
    nodeStack.push(new SizedNode(null, start));

    // Loop as long as we've got nodes to process
    while (!nodeStack.isEmpty()) {

        // Take a look at the top node
        SizedNode sizedNode = nodeStack.peek();            
        CacheEntry entry = getCacheEntry(sizedNode.node);

        if (entryIsValid(entry)) {
            // It's cached already, so we have computed its size
            nodeStack.pop();

            // Add the size to the parent, if applicable.
            if (sizedNode.parent != null) {
                // Use the cached value: cumulativeSize is still 0 for a node that was cached before this run.
                sizedNode.parent.cumulativeSize += entry.size;

                // If the parent's now the top guy, we're done with it so let's cache it
                if (sizedNode.parent == nodeStack.peek()) {
                    putInCache(sizedNode.parent.node, sizedNode.parent.getSize());
                }
            }
        }
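        else {
            // Not cached.
            Node[] children = sizedNode.node.hasChildren() ? sizedNode.node.listChildren() : null;
            if (children != null && children.length > 0) {
                // It's got a bunch of children.
                // We can't compute the size yet, so just add the kids to the stack.
                for (Node child : children) {
                    nodeStack.push(new SizedNode(sizedNode, child));
                }
            }
            else if (sizedNode.node.hasChildren()) {
                // It claims to have children but none could be listed: cache it as empty,
                // otherwise the loop would keep peeking at this node forever.
                putInCache(sizedNode.node, 0);
            }
            else {
                // It's a leaf node. Let's cache it.
                putInCache(sizedNode.node, sizedNode.node.size());
            }
        }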
    }
}

Answer 2

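You are basically doing an iterative post-order traversal of an N-ary tree; you could try searching for that to find more detailed examples.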

A very rough sketch (it assumes the question's Node and putInCache, plus a rootNode variable):

// Needs java.util.ArrayDeque and java.util.Deque.
// Three parallel stacks hold the state of every ancestor of the current node.
Deque<Node> pathToCurrent = new ArrayDeque<>();
Deque<Long> sizesInStack = new ArrayDeque<>();
Deque<Integer> indexInNode = new ArrayDeque<>();

Node current = rootNode;
long currentSize = 0;
int currentIndex = 0;

while (current != null) {
  Node[] children = current.listChildren();
  if (children != null && currentIndex < children.length) {
    // Process the next child; save our own state first.
    pathToCurrent.push(current);
    sizesInStack.push(currentSize);
    indexInNode.push(currentIndex);
    current = children[currentIndex];
    currentSize = 0;
    currentIndex = 0;
  } else {
    // This node is a leaf, or we have handled all of its children.
    // Put our size into the cache, then pop the stack and set up for the next child of our parent.
    if (!current.hasChildren()) {
      currentSize += current.size();  // only leaves contribute their own size, as in the question
    }
    putInCache(current, currentSize);
    if (pathToCurrent.isEmpty()) {
      break;  // the root is done
    }
    current = pathToCurrent.pop();
    currentSize += sizesInStack.pop();
    currentIndex = 1 + indexInNode.pop();
  }
}
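As a possible alternative to juggling three parallel stacks, the same children-before-parents effect can be had by recording the nodes in pre-order and then processing that record backwards, since every descendant appears after its ancestors. The sketch below is only an illustration: `TreeNode`, `ownSize`, and `ReversePreorderSizes` are made-up stand-ins for the question's `Node` and cache, and the sizes are returned in a map instead of being written through `putInCache`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the question's Node type.
final class TreeNode {
    final String name;
    final long ownSize;                 // only counted for leaves
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name, long ownSize) {
        this.name = name;
        this.ownSize = ownSize;
    }

    TreeNode add(TreeNode child) {
        children.add(child);
        return this;
    }
}

final class ReversePreorderSizes {
    // Returns the cumulative size of every node reachable from root.
    static Map<TreeNode, Long> computeSizes(TreeNode root) {
        List<TreeNode> order = new ArrayList<>();
        Map<TreeNode, TreeNode> parentOf = new IdentityHashMap<>();
        Deque<TreeNode> stack = new ArrayDeque<>();
        stack.push(root);

        // Pass 1: pre-order walk, so every parent is recorded before its children.
        while (!stack.isEmpty()) {
            TreeNode node = stack.pop();
            order.add(node);
            for (TreeNode child : node.children) {
                parentOf.put(child, node);
                stack.push(child);
            }
        }

        // Pass 2: walk the record backwards, so a node's total is complete
        // before it is added to its parent.
        Map<TreeNode, Long> sizes = new IdentityHashMap<>();
        for (int i = order.size() - 1; i >= 0; i--) {
            TreeNode node = order.get(i);
            long size = sizes.getOrDefault(node, 0L);
            if (node.children.isEmpty()) {
                size += node.ownSize;   // leaves contribute their own size
            }
            sizes.put(node, size);
            TreeNode parent = parentOf.get(node);
            if (parent != null) {
                sizes.merge(parent, size, Long::sum);
            }
        }
        return sizes;
    }
}
```

The trade-off is memory: this variant keeps a reference to every node until the second pass finishes, whereas the stack-based versions only hold the current path and its pending siblings.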

Answer 3

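OK, I'll explain it in plain words, because I don't feel like coding right now:

Take the top-level elements and write them into a list
LOOP BEGIN
Count the elements at this level and add them to your counter
Get the list of children of the current list and store it separately
Delete the current element list
Write the child list into the current element list
LOOP END

You just need to put a boolean into the loop header and set it to false once the list of children no longer has any elements... I hope I expressed myself clearly; feel free to ask questions and/or ask for clarification.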
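The steps above could be sketched roughly as follows. This is an illustration under assumptions rather than the answerer's code: `LevelNode` and `LevelOrderTotal` are hypothetical names, an empty list check replaces the boolean flag, and the walk only yields a total for the starting node, not the cached per-node sizes the question asks for.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the question's Node type.
final class LevelNode {
    final long ownSize;
    final List<LevelNode> children = new ArrayList<>();

    LevelNode(long ownSize) {
        this.ownSize = ownSize;
    }

    LevelNode add(LevelNode child) {
        children.add(child);
        return this;
    }
}

final class LevelOrderTotal {
    // Sums the sizes of all leaves at or below root, one level at a time.
    static long totalSize(LevelNode root) {
        long total = 0;
        List<LevelNode> current = new ArrayList<>();
        current.add(root);

        // Keep going until a level produces no children.
        while (!current.isEmpty()) {
            List<LevelNode> next = new ArrayList<>();
            for (LevelNode node : current) {
                if (node.children.isEmpty()) {
                    total += node.ownSize;          // count leaves at this level
                } else {
                    next.addAll(node.children);     // collect the next level separately
                }
            }
            current = next;                         // the child list becomes the current list
        }
        return total;
    }
}
```

Because a level-by-level walk sees parents before their children, it would need extra bookkeeping (for example parent links) to fill in a size for every directory.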

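If the data structure keeps fanning out, each iteration has more elements to handle than the one before, so individual iterations get slower as the levels widen (each node is still handled only once overall). It is not particularly efficient, and I am pretty sure someone could come along and optimize it, but it will probably be faster than the recursion and it will not produce a stack overflow. It could produce an OutOfMemoryError for very large data sets, but since only one level is held at any time, I think that is rather unlikely in practice.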

Answer 4

After adapting @Marius's answer to my use case, I came up with this:

class SizedNode {
    final Node node;
    final SizedNode parent;

    long size;
    boolean needsCaching;

    SizedNode(Node node, SizedNode parent) {
        this.parent = parent;
        this.node = node;
    }
}

private CacheEntry sumUpAndCacheChildren(Node start) {      
    final Stack<SizedNode> stack = new Stack<SizedNode>();
    stack.push(new SizedNode(start, null));
    CacheEntry returnValue = getCacheEntry(start);

    while (!stack.isEmpty()) {
        final SizedNode sizedNode = stack.pop();           
        final CacheEntry entry = getCacheEntry(sizedNode.node);

        if (sizedNode.needsCaching) {
            // We finished processing all children, and now we're done with this node.
            if (sizedNode.parent != null) {
                sizedNode.parent.size += sizedNode.size;
            }
            returnValue = putInCache(sizedNode.node, sizedNode.size);
        } else if (entryIsValid(entry)) {
            if (sizedNode.parent != null) {
                sizedNode.parent.size += entry.size;
            }
            returnValue = entry;
        } else {                    
            // The next time we see this node again, it will be after we process all of its children.
            sizedNode.needsCaching = true;
            stack.push(sizedNode);

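            // listChildren() may return null (the recursive version checks for it too).
            final Node[] children = sizedNode.node.listChildren();
            if (children != null) {
                for (Node child : children) {
                    if (child.hasChildren()) {
                        stack.push(new SizedNode(child, sizedNode));
                    } else {
                        sizedNode.size += child.size();
                    }
                }
            }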
        }
    }

    return returnValue;
}

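This is much better than the insane mess I ended up with on my first pass. It just goes to show that you really have to think about transforming the algorithm so that it also makes sense as an iterative approach. Thanks, everyone, for the help!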
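To reuse the same idea in the other places that recurse over the structure, one option is to factor the two-visit stack loop into a small helper and pass in the per-node work. The following is a minimal sketch under assumptions: `PostOrder`, `walk`, and the `childrenOf` function are hypothetical names, not part of the code shown in this thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

final class PostOrder {
    // One stack entry: a node plus a flag saying its children were already pushed.
    private static final class Frame<T> {
        final T node;
        boolean expanded;

        Frame(T node) {
            this.node = node;
        }
    }

    // Calls visit on every node reachable from root, children before parents.
    static <T> void walk(T root, Function<T, List<T>> childrenOf, Consumer<T> visit) {
        Deque<Frame<T>> stack = new ArrayDeque<>();
        stack.push(new Frame<>(root));

        while (!stack.isEmpty()) {
            Frame<T> frame = stack.peek();
            if (frame.expanded) {
                // Second time on top: all descendants have been visited.
                stack.pop();
                visit.accept(frame.node);
            } else {
                // First time on top: leave it in place and push its children above it.
                frame.expanded = true;
                for (T child : childrenOf.apply(frame.node)) {
                    stack.push(new Frame<>(child));
                }
            }
        }
    }
}
```

With the question's `Node`, the `childrenOf` function would have to turn a null `listChildren()` result into an empty list, and the `visit` callback would be the place to sum up the already-computed child sizes and call `putInCache`.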

Converting a recursive method (with the recursion done inside a loop) into an iterative method
