Not clear why Deeplearning4j with CUDA and cuDNN fails with OutOfMemory

Asked: 2019-01-12 13:10:20

Tags: gpu cudnn deeplearning4j

Env: Windows 7, GeForce GTX 750, CUDA 10.0, cuDNN 7.4

Maven dependencies:

    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-cuda-10.0</artifactId>
        <version>1.0.0-beta3</version>
    </dependency>
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-cuda-10.0</artifactId>
        <version>1.0.0-beta3</version>
    </dependency>

Every 10 mini-batches I check the performance on the test set. Originally I was calling net.evaluate(), but that gave me this error:

    Exception in thread "AMDSI prefetch thread" java.lang.RuntimeException: java.lang.RuntimeException: Failed to allocate 637074016 bytes from DEVICE [0] memory
        at org.deeplearning4j.datasets.iterator.AsyncMultiDataSetIterator$AsyncPrefetchThread.run(AsyncMultiDataSetIterator.java:396)
    Caused by: java.lang.RuntimeException: Failed to allocate 637074016 bytes from DEVICE [0] memory
        at org.nd4j.jita.memory.CudaMemoryManager.allocate(CudaMemoryManager.java:76)
        at org.nd4j.jita.workspace.CudaWorkspace.init(CudaWorkspace.java:88)
        at org.nd4j.linalg.memory.abstracts.Nd4jWorkspace.initializeWorkspace(Nd4jWorkspace.java:508)
        at org.nd4j.linalg.memory.abstracts.Nd4jWorkspace.close(Nd4jWorkspace.java:651)
        at org.deeplearning4j.datasets.iterator.AsyncMultiDataSetIterator$AsyncPrefetchThread.run(AsyncMultiDataSetIterator.java:372)

Then I switched from net.evaluate() to net.output() with train = false, and reduced the test set size from 100 to only 20 records. That worked without errors. I then tried increasing the number of records to 30; it showed this warning but still worked (a minimal sketch of this batched evaluation follows the warning below):

    2019-01-12 14:47:44 WARN  org.deeplearning4j.nn.layers.BaseCudnnHelper Cannot allocate 300000000 bytes of device memory (CUDA error = 2), proceeding with host memory
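For reference, the batched evaluation looks roughly like this (a minimal sketch only: net and testIter are placeholder names, and it assumes a MultiLayerNetwork with a single-input DataSetIterator; for a ComputationGraph the idea is the same):

    import org.deeplearning4j.eval.Evaluation;
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.dataset.DataSet;
    import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

    // Score the test set in small batches so that only one batch of activations
    // has to fit into device memory at a time.
    static void evaluateInBatches(MultiLayerNetwork net, DataSetIterator testIter) {
        Evaluation eval = new Evaluation();
        testIter.reset();
        while (testIter.hasNext()) {
            DataSet batch = testIter.next();                          // iterator created with a small batch size, e.g. 20
            INDArray output = net.output(batch.getFeatures(), false); // train = false: inference only
            eval.eval(batch.getLabels(), output);                     // metrics are accumulated on the host
        }
        System.out.println(eval.stats());
    }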

I can understand that there is not enough memory on the video card (the GeForce GTX 750 spec says 1G of memory), but since it can proceed with host memory, I increased the test set size back to 100, and now it consistently fails with this error:

    2019-01-12 14:59:29 WARN  org.deeplearning4j.nn.layers.BaseCudnnHelper Cannot allocate 1000000000 bytes of device memory (CUDA error = 2), proceeding with host memory
    Exception in thread "main" 2019-01-12 14:59:39 ERROR org.deeplearning4j.util.CrashReportingUtil >>> Out of Memory Exception Detected. Memory crash dump written to: C:\DATA\Projects\dl4j-language-model\dl4j-memory-crash-dump-1547294372940_1.txt
    java.lang.OutOfMemoryError: Failed to allocate memory within limits: totalBytes (470M + 7629M) > maxBytes (7851M)
    2019-01-12 14:59:39 WARN  org.deeplearning4j.util.CrashReportingUtil Memory crash dump reporting can be disabled with CrashUtil.crashDumpsEnabled(false) or using system property -Dorg.deeplearning4j.crash.reporting.enabled=false
        at org.bytedeco.javacpp.Pointer.deallocator(Pointer.java:580)
        at org.deeplearning4j.nn.layers.BaseCudnnHelper$DataCache.<init>(BaseCudnnHelper.java:119)
    2019-01-12 14:59:39 WARN  org.deeplearning4j.util.CrashReportingUtil Memory crash dump reporting output location can be set with CrashUtil.crashDumpOutputDirectory(File) or using system property -Dorg.deeplearning4j.crash.reporting.directory=<path>
        at org.deeplearning4j.nn.layers.recurrent.CudnnLSTMHelper.activate(CudnnLSTMHelper.java:509)

Now, I assumed that maxBytes (7851M) refers to the heap size (the JVM runs with -Xmx8G -Xms8G), but I also printed Runtime totalMemory() and freeMemory(), and right before the crash they showed the following, which should be more than enough free memory:

    2019-01-12 15:29:20 INFO Free memory: 7722607976/8232370176
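(That line comes from a probe I added myself; roughly the following, with the exact logging call being incidental:)

    // Heap probe behind the "Free memory" line above: Runtime reports the
    // JVM heap, i.e. the region sized by -Xms/-Xmx, in bytes.
    Runtime rt = Runtime.getRuntime();
    System.out.println("Free memory: " + rt.freeMemory() + "/" + rt.totalMemory());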

So my question is: where do the totalBytes (470M + 7629M) numbers come from? And if there is free memory available inside the JVM, why can't it allocate the required 1G?

Here is the memory crash report:


1 Answer:

Answer 0 (score: 0):

So, to make a long story short: ND4J uses off-heap memory, and that is essentially what gets mapped to GPU memory. As @Samuel Audet pointed out, the 7629M refers to off-heap memory, which obviously does not fit into the GPU memory of my GTX 750.
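As a follow-up for anyone who hits the same limit: maxBytes is JavaCPP's cap on tracked off-heap memory, and as far as I can tell it defaults to the JVM heap limit; 7851M is exactly the 8232370176-byte heap reported above. If the host actually has room, the caps can be raised explicitly with JavaCPP system properties. The values below are purely illustrative and the jar name is a placeholder (and none of this changes the physical 1G on the GTX 750 itself):

    java -Xmx2G -Dorg.bytedeco.javacpp.maxbytes=6G -Dorg.bytedeco.javacpp.maxphysicalbytes=10G -jar dl4j-language-model.jar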

And a final note from the DL4J docs:

Note that if your GPU has < 2g of RAM, it is probably not usable for deep learning. You should consider using your CPU if this is the case. Typical deep-learning workloads should have 4GB of RAM at minimum. Even that is small; 8GB of RAM on a GPU is recommended for deep-learning workloads.