Question

As an experiment, I wrote a minimal lazy-ish sequence class (of ints), GarbageTest.java, to see whether I could process very long lazy sequences in Java the way I can in Clojure.

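Given a naturals() method that returns the lazy, infinite sequence of natural numbers; a drop(n,sequence) method that drops the first n elements of sequence and returns the rest of the sequence; and an nth(n,sequence) method that simply returns drop(n, lazySeq).head(), I wrote two tests: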

static int N = (int)1e6;

// succeeds @ N = (int)1e8 with java -Xmx10m
@Test
public void dropTest() {
    assertThat( drop(N, naturals()).head(), is(N+1));
}

// fails with OutOfMemoryError @ N = (int)1e6 with java -Xmx10m
@Test
public void nthTest() {
    assertThat( nth(N, naturals()), is(N+1));
}

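Note that the body of dropTest() was produced by copying the body of nthTest() and then invoking IntelliJ's "inline" refactoring on the nth(N, naturals()) call. So it seems to me that dropTest() should behave the same as nthTest().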

But they do not behave the same! dropTest() runs to completion for N up to 1e8, whereas nthTest() fails with OutOfMemoryError for N as small as 1e6.

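I have avoided inner classes. I have experimented with a variant of my code, ClearingArgsGarbageTest.java, that nulls out method parameters before calling other methods. I have applied the YourKit profiler. I have looked at the bytecode. I just cannot find the leak that causes nthTest() to fail.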

Where is the "leak"? Why does nthTest() leak while dropTest() does not?

If you don't want to click through to the GitHub project, here is the rest of the code from GarbageTest.java:

/**
 * a not-perfectly-lazy lazy sequence of ints. see LazierGarbageTest for a lazier one
 */
static class LazyishSeq {
    final int head;

    volatile Supplier<LazyishSeq> tailThunk;
    LazyishSeq tailValue;

    LazyishSeq(final int head, final Supplier<LazyishSeq> tailThunk) {
        this.head = head;
        this.tailThunk = tailThunk;
        tailValue = null;
    }

    int head() {
        return head;
    }

    LazyishSeq tail() {
        if (null != tailThunk)
            synchronized(this) {
                if (null != tailThunk) {
                    tailValue = tailThunk.get();
                    tailThunk = null;
                }
            }
        return tailValue;
    }
}

static class Incrementing implements Supplier<LazyishSeq> {
    final int seed;
    private Incrementing(final int seed) { this.seed = seed;}

    public static LazyishSeq createSequence(final int n) {
        return new LazyishSeq( n, new Incrementing(n+1));
    }

    @Override
    public LazyishSeq get() {
        return createSequence(seed);
    }
}

static LazyishSeq naturals() {
    return Incrementing.createSequence(1);
}

static LazyishSeq drop(
        final int n,
        final LazyishSeq lazySeqArg) {
    LazyishSeq lazySeq = lazySeqArg;
    for( int i = n; i > 0 && null != lazySeq; i -= 1) {
        lazySeq = lazySeq.tail();
    }
    return lazySeq;
}

static int nth(final int n, final LazyishSeq lazySeq) {
    return drop(n, lazySeq).head();
}

Answer 1

With your method

static int nth(final int n, final LazyishSeq lazySeq) {
    return drop(n, lazySeq).head();
}

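the parameter variable lazySeq holds a reference to the first element of your sequence during the entire drop operation. This prevents the whole sequence from being garbage collected.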

In contrast, with

public void dropTest() {
    assertThat( drop(N, naturals()).head(), is(N+1));
}

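the first element of the sequence is returned by naturals() and passed directly to the invocation of drop. It is therefore removed from the operand stack and no reference to it exists in the caller while drop executes.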

Your attempt to set the parameter variable to null, i.e.

static int nth(final int n, /*final*/ LazyishSeq lazySeqArg) {
    final LazyishSeq lazySeqLocal = lazySeqArg;
    lazySeqArg = null;
    return drop(n,lazySeqLocal).head();
}

doesn't help: the lazySeqArg variable is now null, but lazySeqLocal still holds a reference to the first element.

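In general, a local variable does not prevent garbage collection: collecting objects that are otherwise unused is permitted. That does not imply, however, that a particular implementation is capable of doing so.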

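In the case of the HotSpot JVM, only optimized code gets rid of such unused references. But here, nth is not a hot spot, because the heavy work happens inside the drop method.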

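This is why the same issue does not arise in the drop method, even though it also holds a reference to the first element in its parameter variable. The drop method contains the loop that does the actual work, so it is very likely to be optimized by the JVM. The optimizer may then eliminate the unused variable, which allows the already processed part of the sequence to be collected.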

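Many factors may affect the JVM's optimizations. Besides the different shape of the code, it seems that rapid memory allocation during the unoptimized phase may also reduce what the optimizer achieves. Indeed, when I run with -Xcomp, which forbids interpreted execution altogether, both variants run successfully, and even int N = (int)1e9 is no longer a problem. Of course, forcing compilation increases startup time.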

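I have to admit that I don't know why mixed mode performs that much worse, and I will investigate further. In general, though, you have to be aware that the efficiency of the garbage collector is implementation dependent, so objects collected in one environment may stay in memory in another.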
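Since the problem described above is a variable that keeps pointing at the first cell, one thing that could be tried is to advance the parameter variable itself, so that no slot in the method's frame refers to the head once the loop has moved on. The following is a minimal sketch of that idea, not code from the question: the names HeadFreeDropSketch, Seq and from are hypothetical, the sequence class is simplified (no synchronization), and whether this is enough on a particular JVM is an assumption that has not been measured here.

```java
import java.util.function.Supplier;

public class HeadFreeDropSketch {
    /** Simplified, single-threaded stand-in for the question's LazyishSeq (hypothetical). */
    static final class Seq {
        final int head;
        private Supplier<Seq> tailThunk;
        private Seq tailValue;

        Seq(int head, Supplier<Seq> tailThunk) {
            this.head = head;
            this.tailThunk = tailThunk;
        }

        Seq tail() {
            if (tailThunk != null) {
                tailValue = tailThunk.get(); // realize the next cell once
                tailThunk = null;            // release the thunk
            }
            return tailValue;
        }
    }

    /** Lazy sequence n, n+1, n+2, ... */
    static Seq from(int n) {
        return new Seq(n, () -> from(n + 1));
    }

    /** Walks n cells forward; the non-final parameter is the only cursor. */
    static int nth(int n, Seq seq) {
        for (int i = n; i > 0 && seq != null; i--) {
            seq = seq.tail(); // overwrites the only reference this frame holds
        }
        return seq.head;
    }

    public static void main(String[] args) {
        System.out.println(nth(100_000, from(1))); // prints 100001
    }
}
```

The trade-off is that nth no longer delegates to drop, so the traversal loop is duplicated. The locals-clearing approach used by Clojure keeps the delegation instead.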

Answer 2

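Clojure implements a strategy for dealing with this sort of scenario, which it calls "locals clearing". The compiler supports it, so it kicks in automatically where required in pure Clojure code (unless disabled at compile time, which is sometimes useful for debugging). Clojure also clears locals in various places in its Java runtime, however, and the way it does so could be used in Java libraries and even application code, although it would undoubtedly be somewhat cumbersome.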

Before I get into what Clojure does, here is a short summary of what is going on in this example:

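nth(int, LazyishSeq) is implemented in terms of drop(int, LazyishSeq) and LazyishSeq.head().
nth passes both of its arguments to drop and has no further use for them.
drop can easily be implemented so as to avoid holding on to the head of the passed-in sequence.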

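Here nth still holds on to the head of its sequence argument. The runtime could potentially discard that reference, but this is not guaranteed.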

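The way Clojure deals with this is to clear the reference to the sequence explicitly before control is handed off to drop. This is done with a rather elegant trick (the snippet below is taken from the Clojure source on GitHub as of Clojure 1.9.0):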

//  clojure/src/jvm/clojure/lang/Util.java

/**
 *   Copyright (c) Rich Hickey. All rights reserved.
 *   The use and distribution terms for this software are covered by the
 *   Eclipse Public License 1.0 (http://opensource.org/licenses/eclipse-1.0.php)
 *   which can be found in the file epl-v10.html at the root of this distribution.
 *   By using this software in any fashion, you are agreeing to be bound by
 *   the terms of this license.
 *   You must not remove this notice, or any other, from this software.
 **/

// … beginning of the file omitted …

// the next line is the 190th in the file as of Clojure 1.9.0
static public Object ret1(Object ret, Object nil){
        return ret;
}

static public ISeq ret1(ISeq ret, Object nil){
        return ret;
}

// …

Given the above, the call to drop inside nth can be changed to

drop(n, ret1(lazySeq, lazySeq = null))

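Here lazySeq = null is evaluated as an expression before control is transferred to ret1; its value is null, and it has the side effect of setting the lazySeq reference to null. The first argument to ret1 has already been evaluated by that point, however, so ret1 receives the reference to the sequence in its first argument and returns it as expected, and that value is then passed on to drop.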

Thus drop receives the original value held by the lazySeq local, but the local itself is cleared before control is transferred to drop.

Consequently nth no longer holds on to the head of the sequence.
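To make the evaluation order concrete, here is a small self-contained sketch of the same trick in plain Java. It is an illustration under stated assumptions rather than Clojure's or the question's actual code: LocalsClearingSketch, Seq and from are hypothetical names, the sequence class is a simplified, unsynchronized stand-in for LazyishSeq, and ret1 is a generic re-creation of the shape of Clojure's Util.ret1. Note that the parameter of nth must not be declared final, otherwise the assignment lazySeq = null does not compile.

```java
import java.util.function.Supplier;

public class LocalsClearingSketch {
    /** Simplified, single-threaded stand-in for the question's LazyishSeq (hypothetical). */
    static final class Seq {
        final int head;
        private Supplier<Seq> tailThunk;
        private Seq tailValue;

        Seq(int head, Supplier<Seq> tailThunk) {
            this.head = head;
            this.tailThunk = tailThunk;
        }

        Seq tail() {
            if (tailThunk != null) {
                tailValue = tailThunk.get();
                tailThunk = null;
            }
            return tailValue;
        }
    }

    /** Lazy sequence n, n+1, n+2, ... */
    static Seq from(int n) {
        return new Seq(n, () -> from(n + 1));
    }

    /** Same shape as Clojure's Util.ret1: returns the first argument, ignores the second. */
    static <T> T ret1(T ret, Object ignored) {
        return ret;
    }

    /** Same shape as drop in the question. */
    static Seq drop(final int n, final Seq seqArg) {
        Seq seq = seqArg;
        for (int i = n; i > 0 && seq != null; i--) {
            seq = seq.tail();
        }
        return seq;
    }

    /** lazySeq is deliberately non-final so that it can be cleared. */
    static int nth(int n, Seq lazySeq) {
        // Arguments are evaluated left to right: the reference is read first,
        // then the local is nulled, and only then are ret1 and drop invoked.
        return drop(n, ret1(lazySeq, lazySeq = null)).head;
    }

    public static void main(String[] args) {
        Seq local = from(1);
        Seq returned = ret1(local, local = null);
        System.out.println(returned.head);           // prints 1
        System.out.println(local == null);           // prints true
        System.out.println(nth(100_000, from(1)));   // prints 100001
    }
}
```

The sketch only demonstrates that the value survives while the local is cleared. It does not measure memory behaviour, which still depends on the JVM as discussed in the other answer.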

Why does this Java method leak, and why does inlining it fix the leak?