Reliable way to clean up external resources associated with an Object

Time: 2012-09-12 17:31:14

Tags: java garbage-collection

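Concrete use case: there is an abstraction for binary data that is widely used to handle binary blobs of arbitrary size. Since the abstraction was created without anything outside the VM in mind, existing implementations rely on the garbage collector for their life cycle.

Now I want to add a new implementation that uses off-heap storage (e.g. in a temporary file). Since there is a lot of existing code that uses the abstraction, introducing additional methods for explicit life cycle management is impractical; I cannot rewrite every client use case to make sure it manages the new life cycle requirements.

I can think of two solution approaches, but cannot decide which one is better:

a.) Use finalize() to manage the life cycle of the associated resource (e.g. the temp file is deleted in finalize). This seems very simple to implement.

b.) Use a reference queue and java.lang.Reference (but which one, weak or phantom?) together with some additional object that deletes the file when the reference is enqueued. This seems to be more work to implement: I would need to create not only the new implementation, but also separate out its cleanup data and ensure that the cleanup object cannot be GC'd before the object that has been exposed to the user.

c.) Some other method I have not thought of?

Which approach should I take (and why should I prefer it)? Implementation hints are also welcome.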


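Edit: Degree of reliability needed: for my purposes it is perfectly fine if the temporary files are not cleaned up when the VM terminates abruptly. The main concern is that while the VM runs, it could very well fill up the local disk with temporary files over the course of a few days. (This has really happened to me with Apache Tika, which created temporary files when extracting text from certain document types; zip files were the culprit, I believe.) I have a periodic cleanup job scheduled on the machine, so if a file slips through cleanup it is not the end of the world, as long as that does not happen regularly within a short interval.

As far as I understand, finalize() works on the Oracle JRE. And if I interpret the javadocs correctly, References must work as documented (an only softly/weakly reachable reference object cannot remain uncleared before an OutOfMemoryError is thrown). This would mean that while the VM may decide not to reclaim a particular object for a long time, it has to do so at the latest when the heap gets full. In turn, this means that only a limited number of my file-based blobs can exist on the heap. The VM has to clean them up at some point, or it would definitely run out of memory. Or is there any loophole that allows the VM to run OOM without clearing references (assuming they are no longer strongly referenced)?


Edit2: As far as I can see at this point, both finalize() and Reference should be reliable enough for my purposes, but I gather that Reference may be the better solution: its interaction with the GC cannot resurrect dead objects, so its performance impact should be smaller?


Edit3: Solution approaches that rely on VM termination or startup (shutdown hooks or similar) are of no use to me, because the VM usually runs for extended periods of time (server environment).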

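4 answers:

Answer 0 (score: 3)

Here is the relevant item from Effective Java: Avoid finalizers

Contained within that item is a recommendation to do what @delnan suggests in a comment: provide an explicit termination method. Plenty of examples are given as well: InputStream.close(), Graphics.dispose(), etc. Understand that the cows may have already left the barn on that one...

Anyway, here is a sketch of how this might be accomplished with reference objects. First, an interface for binary data: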

import java.io.IOException;

public interface Blob {
    public byte[] read() throws IOException;
    public void update(byte[] data) throws IOException;
}

Next, a file-based implementation:

import java.io.File;
import java.io.IOException;

public class FileBlob implements Blob {

    private final File file;

    public FileBlob(File file) {
        super();
        this.file = file;
    }

    @Override
    public byte[] read() throws IOException {
        throw new UnsupportedOperationException();
    }

    @Override
    public void update(byte[] data) throws IOException {
        throw new UnsupportedOperationException();
    }
}

Then, a factory to create and track the file-based blobs:

import java.io.File;
import java.io.IOException;
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class FileBlobFactory {

    private static final long TIMER_PERIOD_MS = 10000;

    private final ReferenceQueue<File> queue;
    private final ConcurrentMap<PhantomReference<File>, String> refs;
    private final Timer reaperTimer;

    public FileBlobFactory() {
        super();
        this.queue = new ReferenceQueue<File>();
        this.refs = new ConcurrentHashMap<PhantomReference<File>, String>();
        this.reaperTimer = new Timer("FileBlob reaper timer", true);
        this.reaperTimer.scheduleAtFixedRate(new FileBlobReaper(), TIMER_PERIOD_MS, TIMER_PERIOD_MS);
    }

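    public Blob create() throws IOException {
        File blobFile = File.createTempFile("blob", null);
        //blobFile.deleteOnExit();
        // Keep only the path in the map; a strong reference to the File
        // itself would prevent it from ever being collected.
        String blobFilePath = blobFile.getCanonicalPath();
        FileBlob blob = new FileBlob(blobFile);
        // The File is the referent: once the blob (its only holder) is
        // unreachable, the reference gets enqueued for the reaper.
        this.refs.put(new PhantomReference<File>(blobFile, this.queue), blobFilePath);
        return blob;
    }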

    public void shutdown() {
        this.reaperTimer.cancel();
    }

    private class FileBlobReaper extends TimerTask {
        @Override
        public void run() {
            System.out.println("FileBlob reaper task begin");
            // Drain every reference the GC has enqueued since the last run.
            Reference<? extends File> ref = FileBlobFactory.this.queue.poll();
            while (ref != null) {
                // Look up the path recorded for this reference and delete the file.
                String blobFilePath = FileBlobFactory.this.refs.remove(ref);
                File blobFile = new File(blobFilePath);
                boolean isDeleted = blobFile.delete();
                System.out.println("FileBlob reaper deleted " + blobFile + ": " + isDeleted);
                ref = FileBlobFactory.this.queue.poll();
            }
            System.out.println("FileBlob reaper task end");
        }
    }
}

Finally, a test that includes some artificial GC "pressure" to get things going:

import java.io.IOException;

public class FileBlobTest {

    public static void main(String[] args) {
        FileBlobFactory factory = new FileBlobFactory();
        for (int i = 0; i < 10; i++) {
            try {
                factory.create();
            } catch (IOException exc) {
                exc.printStackTrace();
            }
        }

        while(true) {
            try {
                Thread.sleep(5000);
                System.gc(); System.gc(); System.gc();
            } catch (InterruptedException exc) {
                exc.printStackTrace();
                System.exit(1);
            }
        }
    }
}

Which should produce output along these lines:

FileBlob reaper task begin
FileBlob reaper deleted C:\WINDOWS\Temp\blob1055430495823649476.tmp: true
FileBlob reaper deleted C:\WINDOWS\Temp\blob873625122345395275.tmp: true
FileBlob reaper deleted C:\WINDOWS\Temp\blob4123088770942737465.tmp: true
FileBlob reaper deleted C:\WINDOWS\Temp\blob1631534546278785404.tmp: true
FileBlob reaper deleted C:\WINDOWS\Temp\blob6150533076250997032.tmp: true
FileBlob reaper deleted C:\WINDOWS\Temp\blob7075872276085608840.tmp: true
FileBlob reaper deleted C:\WINDOWS\Temp\blob5998579368597938203.tmp: true
FileBlob reaper deleted C:\WINDOWS\Temp\blob3779536278201681316.tmp: true
FileBlob reaper deleted C:\WINDOWS\Temp\blob8720399798060613253.tmp: true
FileBlob reaper deleted C:\WINDOWS\Temp\blob3046359448721598425.tmp: true
FileBlob reaper task end
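
For readers on Java 9 or later: the JDK's java.lang.ref.Cleaner wraps the same phantom-reference-and-queue pattern as the factory above, so the map and the reaper timer need not be written by hand. A minimal sketch, assuming Java 9+; CleanerFileBlob, DeleteAction, file() and release() are hypothetical names that do not appear in the original answer:

```java
import java.io.File;
import java.io.IOException;
import java.lang.ref.Cleaner;

// Hypothetical blob whose temp file is deleted once the blob is unreachable.
final class CleanerFileBlob {

    // One shared Cleaner; it runs cleanup actions on its own daemon thread.
    private static final Cleaner CLEANER = Cleaner.create();

    // The action must not reference the blob itself, otherwise the blob
    // could never become phantom reachable. It only holds the File.
    private static final class DeleteAction implements Runnable {
        private final File file;

        DeleteAction(File file) {
            this.file = file;
        }

        @Override
        public void run() {
            file.delete();
        }
    }

    private final File file;
    private final Cleaner.Cleanable cleanable;

    CleanerFileBlob() throws IOException {
        this.file = File.createTempFile("blob", null);
        // Register the blob as the tracked object and the delete as its action.
        this.cleanable = CLEANER.register(this, new DeleteAction(file));
    }

    File file() {
        return file;
    }

    // Optional explicit release; the action runs at most once,
    // whether triggered here or by the Cleaner after GC.
    void release() {
        cleanable.clean();
    }
}
```

Existing callers can keep ignoring the life cycle and let the Cleaner delete the file some time after the blob is collected, while new code may call release() for prompt cleanup.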

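Answer 1 (score: 1)

This is the solution I wrote after kschneid's reference-based example (just in case somebody needs a generically usable implementation). It is documented and should be easy to understand and adapt: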

import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

/**
 * Helper class for cleaning up resources when an object is
 * garbage collected. Use as follows (either an anonymous subclass
 * or a public subclass is fine; be extra careful not to retain
 * a reference to the trigger!):
 * 
 * new ResourceFinalizer(trigger) {
 * 
 *     // put user defined state relevant for cleanup here
 *     
 *     protected void cleanup() {
 *         // implement cleanup procedure.
 *     }
 * }
 *
 * Typical application is closing of native resources when an object
 * is garbage collected (e.g. VM external resources).
 * 
 * You must not retain any references from the ResourceFinalizer to the
 * trigger (otherwise the trigger can never become eligible for GC).
 * You can however retain references to the ResourceFinalizer from the
 * trigger, so you can access the data relevant for the finalizer
 * from the trigger (no need to duplicate the data).
 * There is no need to explicitly reference the finalizer after it has
 * been created, the finalizer base class will ensure the finalizer
 * itself is not eligible for GC until it has been run.
 * 
 * When the VM terminates, any ResourceFinalizer that has not been
 * triggered yet will run, regardless of the state of its trigger
 * (that is, even if the trigger is still reachable, the finalizer
 * will be called). There are no guarantees on this: if the VM
 * is terminated abruptly, this step may not take place.
 */
public abstract class ResourceFinalizer {

    /**
     * Constructs a ResourceFinalizer that is triggered when the
     * object referenced by trigger is garbage collected.
     * 
     * To make this work, you must ensure there are no references to
     * the trigger object from the ResourceFinalizer.
     */
    protected ResourceFinalizer(final Object trigger) {
        // create reference to trigger and register this finalizer
        final Reference<Object> reference = new PhantomReference<Object>(trigger, referenceQueue);
        synchronized (finalizerMap) {
            finalizerMap.put(reference, this);
        }
    }

    /**
     * The cleanup() method is called when the trigger
     * has been garbage collected.
     */
    protected abstract void cleanup();

    // --------------------------------------------------------------
    // ---
    // --- Background finalization management
    // ---
    // --------------------------------------------------------------

    /**
     * The reference queue used to interact with the garbage collector.
     */
    private final static ReferenceQueue<Object> referenceQueue = new ReferenceQueue<Object>();

    /**
     * Global static map of finalizers. Enqueued references are used as key
     * to find the finalizer for the referent.
     */
    private final static HashMap<Reference<?>, ResourceFinalizer> finalizerMap =
            new HashMap<Reference<?>, ResourceFinalizer>(16, 2F);

    static {
        // create and start finalizer thread
        final Thread mainLoop = new Thread(new Runnable() {
            @Override
            public void run() {
                finalizerMainLoop();
            }
        }, "ResourceFinalizer");
        mainLoop.setDaemon(true);
        mainLoop.setPriority(Thread.NORM_PRIORITY + 1);
        mainLoop.start();

        // add a shutdown hook to take care of resources when the VM terminates
        final Thread shutdownHook = new Thread(new Runnable() {
            @Override
            public void run() {
                shutdownHook();
            }
        });
        Runtime.getRuntime().addShutdownHook(shutdownHook);
    }

    /**
     * Main loop that runs permanently and executes the finalizers for
     * each object that has been garbage collected. 
     */
    private static void finalizerMainLoop() {
        while (true) {
            final Reference<?> reference;
            try {
                reference = referenceQueue.remove();
            } catch (final InterruptedException e) {
                // this will terminate the thread, should never happen
                throw new RuntimeException(e);
            }
            final ResourceFinalizer finalizer;
            // find the finalizer for the reference
            synchronized (finalizerMap) {
                finalizer = finalizerMap.remove(reference);
            }
            // run the finalizer
            callFinalizer(finalizer);
        }
    }

    /**
     * Called when the VM shuts down normally. Takes care of calling
     * all finalizers that haven't been triggered yet.
     */
    private static void shutdownHook() {
        // get all remaining resource finalizers
        final List<ResourceFinalizer> remaining;
        synchronized (finalizerMap) {
            remaining = new ArrayList<ResourceFinalizer>(finalizerMap.values());
            finalizerMap.clear();
        }
        // call all remaining finalizers
        for (final ResourceFinalizer finalizer : remaining) {
            callFinalizer(finalizer);
        }
    }

    private static void callFinalizer(final ResourceFinalizer finalizer) {
        try {
            finalizer.cleanup();
        } catch (final Exception e) {
            // don't care if a finalizer throws
        }
    }

}

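Answer 2 (score: 0)

If you are not especially worried about cleaning up the files quickly, then finalize is the way to go. There is no guarantee that any particular object will be GC'd even if you run out of memory, since the VM could in theory collect only part of the heap. But if an object is GC'd, it will be finalized, so you know you will have at most sizeof(heap) / sizeof(in-memory handle) unfinalized blobs, which puts some bound on your disk usage. It is a pretty weak bound, but it sounds like it may be good enough for you.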
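
To get a feel for how weak that bound is, here is a small worked example. The numbers are assumptions picked purely for illustration (a 1 GiB heap and roughly 128 bytes per in-memory handle), and BlobBound is a hypothetical helper:

```java
// Hypothetical helper: upper bound on blobs that have not been finalized yet.
final class BlobBound {

    // heapBytes: maximum heap size.
    // handleBytes: assumed in-memory footprint of one blob handle.
    static long maxUnfinalizedBlobs(long heapBytes, long handleBytes) {
        return heapBytes / handleBytes;
    }

    public static void main(String[] args) {
        long heap = 1L << 30;   // assumption: 1 GiB heap
        long handle = 128;      // assumption: ~128 bytes per handle
        System.out.println(maxUnfinalizedBlobs(heap, handle)); // prints 8388608
    }
}
```

With those assumed numbers, more than eight million temp files could be outstanding in the worst case; at 1 MiB per file that would be 8 TiB of disk, so the bound only helps if the blobs' in-memory handles are large or the files are small.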

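Answer 3 (score: 0)

In a pinch, just doing this in your finalizer is not a bad solution and will probably at least clean up a good fraction of your files. If that is good enough, I would go down that route, as it will be a lot easier.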
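
For reference, the finalizer route could look roughly like this. It is only a sketch: FinalizedFileBlob and deleteBackingFile() are hypothetical names, and Object.finalize() has been deprecated since Java 9, so newer JDKs may emit warnings for it:

```java
import java.io.File;
import java.io.IOException;

// Hypothetical finalizer-based variant of a file-backed blob.
final class FinalizedFileBlob {

    private final File file;

    FinalizedFileBlob() throws IOException {
        this.file = File.createTempFile("blob", null);
    }

    File file() {
        return file;
    }

    // Deletes the backing file; harmless if the file is already gone.
    void deleteBackingFile() {
        file.delete();
    }

    // Invoked by the VM at some unspecified time after the blob becomes
    // unreachable, possibly never. Deprecated since Java 9.
    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() throws Throwable {
        try {
            deleteBackingFile();
        } finally {
            super.finalize();
        }
    }
}
```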

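On the other hand, if you are looking for any certainty, then using a finalizer is very bad; you cannot rely on finalizers ever being run, let alone in a timely manner, and the same argument applies, somewhat more loosely, to cleaning up the various special types of reference. It depends on the details of your application and hardware, but in general you have no guarantee that the references will be cleaned up before your disk fills up.

This is more likely to happen if the data you hold in memory (which takes up most of the space) is large but short-lived, while the file references last longer. That leads to lots of minor garbage collections, which clean out the young generation space, removing the dead data and eventually promoting many of the file references, but to no major garbage collections, which would clean out older tenured objects such as the file references, so those are kept alive indefinitely. See this for more background on GC. You may be able to increase the number of finalizers that actually get run by increasing the size of the young generation, in exchange for slightly slower GCs.

If you do need more certainty, I would change the problem slightly. First, implement the cleanup in the finalizer as a quick and easy solution. Then build a fallback as well: decide on the maximum amount of space you are prepared to let your files take up, preferably substantially more than you expect to actually use; monitor the total space you are using every X minutes; and if it exceeds this bound, delete a selection of the oldest files (by last write time), e.g. the oldest 10%. That gives you a fairly hard upper limit, and you can keep the check frequency very low, because the finalizer should hopefully be catching most issues.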
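
The fallback check described above could be sketched as follows. TempDirQuota and enforce() are hypothetical names, the 10% fraction is the example figure from the text, and the sketch assumes that all blob temp files live in one dedicated directory:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical fallback sweeper that enforces a size cap on a temp directory.
final class TempDirQuota {

    // If the regular files in dir exceed maxBytes in total, deletes the
    // oldest 10% of them (at least one). Returns the number of files deleted.
    static int enforce(File dir, long maxBytes) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return 0; // not a directory, or an I/O error occurred
        }
        List<File> files = new ArrayList<>();
        long total = 0;
        for (File f : entries) {
            if (f.isFile()) {
                files.add(f);
                total += f.length();
            }
        }
        if (files.isEmpty() || total <= maxBytes) {
            return 0;
        }
        // Oldest first, ordered by last-modified (last write) time.
        files.sort(Comparator.comparingLong(File::lastModified));
        int toDelete = Math.max(1, files.size() / 10);
        int deleted = 0;
        for (File f : files.subList(0, toDelete)) {
            if (f.delete()) {
                deleted++;
            }
        }
        return deleted;
    }
}
```

It could be called every few minutes from a java.util.Timer or a ScheduledExecutorService. Note that it may delete files that still back live blobs, which is why the cap should sit well above normal usage, as suggested above.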

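One other note that I think might be semi-relevant is deleteOnExit. Calling it on temp files when you create them guarantees that they are deleted automatically when the JVM exits successfully. This does have downsides, though: the JVM has to hold a reference to the file right up until it shuts down, which leaves you with a small memory leak (1K per file, I believe). Not sure whether that is worth it for you, but it might help!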