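My goal is to integrate H2O with TensorFlow on a machine that does not have CUDA.
Since TensorFlow supports both CPU and GPU execution, I expected the H2O / TensorFlow integration to work without CUDA. However, I am confused by the mention of CUDA software in the system specifications of Deep Water.
I tried to build a Deep Water / TensorFlow model in H2O Flow, but it failed. The steps I performed: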
java.lang.RuntimeException: Unable to initialize the native Deep Learning backend: No backend found. Cannot build a Deep Water model.
at hex.deepwater.DeepWaterModelInfo.setupNativeBackend(DeepWaterModelInfo.java:246)
at hex.deepwater.DeepWaterModelInfo.<init>(DeepWaterModelInfo.java:193)
at hex.deepwater.DeepWaterModel.<init>(DeepWaterModel.java:225)
at hex.deepwater.DeepWater$DeepWaterDriver.buildModel(DeepWater.java:127)
at hex.deepwater.DeepWater$DeepWaterDriver.computeImpl(DeepWater.java:114)
at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:169)
at hex.deepwater.DeepWater$DeepWaterDriver.compute2(DeepWater.java:107)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1220)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
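So my questions are:
Update 1:
I set the gpu parameter to false and tried to build the model again with every available backend. caffe and tensorflow both produce the same stack trace as shown above. mxnet also fails, but with two different stack traces.
mxnet (first attempt to build the model):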
java.lang.RuntimeException: Unable to initialize the native Deep Learning backend: null
at hex.deepwater.DeepWaterModelInfo.setupNativeBackend(DeepWaterModelInfo.java:246)
at hex.deepwater.DeepWaterModelInfo.<init>(DeepWaterModelInfo.java:193)
at hex.deepwater.DeepWaterModel.<init>(DeepWaterModel.java:225)
at hex.deepwater.DeepWater$DeepWaterDriver.buildModel(DeepWater.java:127)
at hex.deepwater.DeepWater$DeepWaterDriver.computeImpl(DeepWater.java:114)
at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:169)
at hex.deepwater.DeepWater$DeepWaterDriver.compute2(DeepWater.java:107)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1220)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
mxnet (subsequent attempts):
java.lang.RuntimeException: Unable to initialize the native Deep Learning backend: Could not initialize class deepwater.backends.mxnet.MXNetBackend$MXNetLoader
at hex.deepwater.DeepWaterModelInfo.setupNativeBackend(DeepWaterModelInfo.java:246)
at hex.deepwater.DeepWaterModelInfo.<init>(DeepWaterModelInfo.java:193)
at hex.deepwater.DeepWaterModel.<init>(DeepWaterModel.java:225)
at hex.deepwater.DeepWater$DeepWaterDriver.buildModel(DeepWater.java:127)
at hex.deepwater.DeepWater$DeepWaterDriver.computeImpl(DeepWater.java:114)
at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:169)
at hex.deepwater.DeepWater$DeepWaterDriver.compute2(DeepWater.java:107)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1220)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Update 2:
Environment:
lspci -vnn | grep VGA
returns 00:0f.0 VGA compatible controller [0300]: VMware SVGA II Adapter [15ad:0405] (prog-if 00 [VGA controller])
I cleared the /tmp directory and tried mxnet again. On the first attempt I got a new exception:
java.lang.RuntimeException: Unable to initialize the native Deep Learning backend: /tmp/libmxnet.so: libcudart.so.8.0: cannot open shared object file: No such file or directory
at hex.deepwater.DeepWaterModelInfo.setupNativeBackend(DeepWaterModelInfo.java:246)
at hex.deepwater.DeepWaterModelInfo.<init>(DeepWaterModelInfo.java:193)
at hex.deepwater.DeepWaterModel.<init>(DeepWaterModel.java:225)
at hex.deepwater.DeepWater$DeepWaterDriver.buildModel(DeepWater.java:127)
at hex.deepwater.DeepWater$DeepWaterDriver.computeImpl(DeepWater.java:114)
at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:169)
at hex.deepwater.DeepWater$DeepWaterDriver.compute2(DeepWater.java:107)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1220)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
The file /tmp/libmxnet.so exists, and its permissions are -rw-rw-r--.
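For anyone trying to reproduce this outside the JVM, a minimal sketch of a loader check is below. It is not part of H2O or Deep Water. The helper name try_load is hypothetical, and it assumes Python with the standard ctypes module is available on the same machine. It asks the system's dynamic loader to open a shared library and reports the loader's error message, which should name the missing dependency in the same way the exception above does.

```python
import ctypes


def try_load(path):
    """Ask the dynamic loader to open a shared library.

    Returns (True, None) on success, or (False, message) where message
    is the loader's own error text (hypothetical helper, not an H2O API).
    """
    try:
        ctypes.CDLL(path)
        return True, None
    except OSError as exc:
        # The loader's message typically names the file or dependency
        # it could not open.
        return False, str(exc)


if __name__ == "__main__":
    # Paths taken from the exception message above.
    for lib in ("/tmp/libmxnet.so", "libcudart.so.8.0"):
        ok, err = try_load(lib)
        print(lib, "loaded" if ok else "failed: " + str(err))
```

If the second library fails to load while the first file exists, that would suggest the extracted libmxnet.so was linked against the CUDA runtime, independently of the gpu setting.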