Question

I have a simple recursive function that builds a binary tree of a given depth.

I expected an iterative version using a DFS stack to perform about the same, but surprisingly it is 3x slower!

More precisely, on my machine the recursive version takes about 330_000 ns at depth 15, while the iterative version with a stack takes about 950_000 ns.

Can this surprising difference be attributed to better cache locality (which should obviously be better for the recursive function)?

The code I used for the benchmark:

class Main {
    public static void main(String[] args) {
        long startTime = System.nanoTime();
        long runs;
        Tree t = null;
        for(runs=0; (System.nanoTime() - startTime)< 3_000_000_000L ; runs++) {
            t = createTree3(15);
        }
        System.out.println((System.nanoTime() - startTime) / runs + " ns/call");
    }

    static Tree createTree(int depth) {
        Tree t = new Tree();
        createTreeHlp(t, depth);
        return t;
    }

    static void createTreeHlp(Tree tree, int depth) {
        if (depth == 0)
            tree.init(0, null, null);
        else {
            tree.init(depth, new Tree(), new Tree());
            createTreeHlp(tree.leftChild, depth -1);
            createTreeHlp(tree.rghtChild, depth -1);
        }
    }


    static Tree createTree3(int depth_) {
        TreeStack stack = new TreeStack();
        Tree result = new Tree();
        stack.put(result, depth_);
        while (!stack.isEmpty()) {
            int depth = stack.depth[stack.stack][stack.index];
            Tree tree = stack.tree[stack.stack][stack.index];
            stack.dec();
            if (depth == 0)
                tree.init(0, null, null);
            else {
                tree.init(depth, new Tree(), new Tree());
                stack.put(tree.leftChild, depth -1);
                stack.put(tree.rghtChild, depth -1);
            }
        }
        return result;
    }
}

class Tree {
    int payload;
    Tree leftChild;
    Tree rghtChild;

    public Tree init(int payload, Tree leftChild, Tree rghtChild) {
        this.leftChild = leftChild;
        this.rghtChild = rghtChild;
        this.payload = payload;
        return this;
    }

    @Override
    public String toString() {
        return "Tree(" +payload+", "+ leftChild + ", " + rghtChild + ")";
    }
}
class TreeStack {

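    // Segmented stack: tree[s] and depth[s] are the parallel arrays of chunk s.
    Tree[][] tree;
    int[][] depth;

    int stack =  1;   // index of the current chunk
    int index = -1;   // position of the top element within the current chunk

    TreeStack() {
        this.tree = new Tree[100][];
        this.depth = new int[100][];

        // Chunk 1 is the first real chunk; chunk 0 is an empty sentinel,
        // so popping the last element leaves index == -1 (see dec()).
        alloc(100_000);
        --stack;
        alloc(0);
    }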

    boolean isEmpty() {
        return index == -1;
    }

    void alloc(int size) {
        tree[stack] = new Tree[size];
        depth[stack] = new int[size];
    }

    void inc() {
        if (tree[stack].length == ++index) {
            if (tree[++stack] == null) alloc(2 * index);
            index = 0;
        }
    }
    void dec() {
        if (--index == -1)
            index = tree[--stack].length - 1;
    }

    void put(Tree tree, int depth) {
        inc();
        this.tree[stack][index] = tree;
        this.depth[stack][index] = depth;
    }
}
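
Before comparing timings, it may be worth confirming that the recursive and stack-based builders really produce the same tree. The following is only a minimal sketch: `Node`, `TreeShapeCheck`, and all method names are hypothetical stand-ins (not part of the code above), and the iterative builder uses `java.util.ArrayDeque` rather than `TreeStack`, storing the remaining depth in the node's payload instead of a separate depth array.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for the Tree class above, so this sketch compiles on its own.
class Node {
    int payload;
    Node left, right;
}

class TreeShapeCheck {
    // Recursive builder: payload is the remaining depth, leaves get 0.
    static Node buildRecursive(int depth) {
        Node n = new Node();
        n.payload = depth;
        if (depth > 0) {
            n.left = buildRecursive(depth - 1);
            n.right = buildRecursive(depth - 1);
        }
        return n;
    }

    // Iterative builder with an explicit DFS stack.
    static Node buildIterative(int depth) {
        Node root = new Node();
        root.payload = depth;
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (n.payload > 0) {
                n.left = new Node();
                n.left.payload = n.payload - 1;
                n.right = new Node();
                n.right.payload = n.payload - 1;
                stack.push(n.left);
                stack.push(n.right);
            }
        }
        return root;
    }

    // Structural equality: same shape and same payload at every position.
    static boolean sameTree(Node a, Node b) {
        if (a == null || b == null) return a == b;
        return a.payload == b.payload
            && sameTree(a.left, b.left)
            && sameTree(a.right, b.right);
    }

    static int countNodes(Node n) {
        return n == null ? 0 : 1 + countNodes(n.left) + countNodes(n.right);
    }

    public static void main(String[] args) {
        Node r = buildRecursive(4);
        Node i = buildIterative(4);
        System.out.println(sameTree(r, i));  // true
        System.out.println(countNodes(r));   // 31, i.e. 2^5 - 1
    }
}
```

If the two trees match, any timing difference comes from how the nodes are produced rather than from the builders doing different amounts of work.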

Answer 1

Short answer: because you coded it that way.

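Long answer: you create a stack, put things into it, take things out of it, and do all of this in a very complicated way. Let's do it the simple way for this case. You want a tree of a given depth with all children filled in, where each node's value is its remaining depth (0 at the deepest level). Here is a simple way to do that: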

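static Tree createTree3(int depth_) {
    // One slot per leaf; the upper levels reuse the front of the same array.
    Tree[] arr = new Tree[1 << depth_];

    // Create the deepest level: 2^depth_ leaves with payload 0.
    int count = 1 << depth_;
    for (int i=0; i<count; i++)
        arr[i] = new Tree().init(0, null, null);

    // Build each higher level from pairs of nodes of the level below.
    int d = 1;
    count >>= 1;
    while (count > 0)
    {
        for (int i=0; i<count; i++)
        {
            Tree t = new Tree().init(d, arr[i * 2], arr[i * 2 + 1]);
            arr[i] = t;
        }
        count >>= 1;
        d++;
    }

    return arr[0];
}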

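The first step creates the lowest level of nodes, of which there are 2^depth. It then creates the next level up and attaches the children, then the level above that, and so on. No stack, no recursion, just simple loops.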
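
To make the level-by-level structure concrete, here is a small sketch that mirrors the loop over `count` and only tallies how many nodes each pass allocates. The class and method names (`LevelCounts`, `totalNodes`) are hypothetical. It also illustrates why an array of 2^depth slots is enough: each parent is written to index `i` only after its children at `2 * i` and `2 * i + 1` have been read.

```java
class LevelCounts {
    // Sums the nodes allocated per level: 2^depth leaves, then half as many
    // parents on each pass, down to the single root.
    static long totalNodes(int depth) {
        long total = 0;
        for (int count = 1 << depth; count > 0; count >>= 1) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        // Prints 1, 3, 7, 15, 31 nodes for depths 0 through 4.
        for (int depth = 0; depth <= 4; depth++) {
            System.out.println("depth " + depth + ": " + totalNodes(depth) + " nodes");
        }
    }
}
```

The total is 2^(depth+1) - 1, the same number of nodes the recursive version allocates, so both approaches do the same amount of allocation work.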

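I benchmarked this by running both versions 20,000 times at depth 14, doing nothing else in the loop: no per-call timing or anything, just creating trees. Results on my i7 laptop: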

Your recursion takes about 187 µs per tree
My iteration takes about 177 µs per tree

If I run at depth 15, it's 311 vs. 340.

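The timings vary between runs because this measures wall-clock time rather than CPU time, and the result depends on things like whether the JIT compiler happens to do something differently, and so on.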
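
One common way to reduce this kind of noise is to run a warm-up phase before timing, so that most JIT compilation has probably already happened when the measurement starts. The sketch below is only an illustration under that assumption: `WarmupBench`, `nanosPerCall`, and `sink` are hypothetical names, and the workload in `main` is a placeholder allocation rather than the tree builders above.

```java
import java.util.function.Supplier;

class WarmupBench {
    // Keeps the last result reachable, which makes it less likely that the
    // JIT optimizes the measured work away entirely.
    static Object sink;

    // Runs the workload warmupRuns times untimed, then returns the average
    // wall-clock nanoseconds per call over measuredRuns timed calls.
    static long nanosPerCall(Supplier<?> work, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            sink = work.get();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++) {
            sink = work.get();
        }
        return (System.nanoTime() - start) / measuredRuns;
    }

    public static void main(String[] args) {
        // Placeholder workload; substitute a call such as createTree(15).
        long ns = nanosPerCall(() -> new int[1024], 20_000, 20_000);
        System.out.println(ns + " ns/call");
    }
}
```

Even with a warm-up, results from a hand-rolled loop like this remain rough; a dedicated harness such as JMH is generally the more reliable choice for microbenchmarks.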

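In short, though: even with this simple change, the iterative version can easily be made as fast as the recursive one in this case, and I'm sure there are even cleverer ways.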
