Update 1:
Log files from the H2O Deep Water cloud: https://drive.google.com/file/d/0B_1g718qYsqhcUl4WFQ5S1NKbE0/view?usp=sharing
I want to test H2O Deep Water on MS Azure using a GPU-enabled cloud instance (NC6: https://azure.microsoft.com/en-us/blog/azure-n-series-general-availability-on-december-1/). However, when I run H2O Deep Water, I get an error saying:
java.lang.RuntimeException: Unable to initialize the native Deep Learning backend: Could not initialize class deepwater.backends.mxnet.MXNetBackend$MXNetLoader
java.lang.RuntimeException: Unable to initialize the native Deep Learning backend: null
The configuration and setup are as follows.
After provisioning the DSVM on the NC6 VM, I checked the Deep Water prerequisites, CUDA and cuDNN:
sysadmin@DEVSMTTSYGPU002:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
sysadmin@DEVSMTTSYGPU002:~$ cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 5
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 10
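The two manual checks above can be scripted so the versions are easy to report alongside an error. This is a minimal sketch with hypothetical helper names; it only extracts the installed versions and does not know which CUDA/cuDNN versions a given Deep Water build expects. It runs under both Python 2.7 (used in this setup) and Python 3.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Return the CUDA release (e.g. '8.0') from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def parse_cudnn_version(header_text):
    """Return (major, minor, patchlevel) from the #define lines of cudnn.h, or None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Matches lines such as "#define CUDNN_MAJOR 5"
        match = re.search(r"#define\s+%s\s+(\d+)" % name, header_text)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)
```

On the VM, pass it the decoded output of `subprocess.check_output(["nvcc", "--version"])` and the contents of `/usr/local/cuda/include/cudnn.h`; for the output shown above this yields `'8.0'` and `(5, 1, 10)`.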
After that, I ran the following steps:
Set environment variables:
export CUDA_PATH=/usr/local/cuda
export LD_LIBRARY_PATH=$CUDA_PATH/lib64:$LD_LIBRARY_PATH
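Because `export` only affects the current shell session, the variables have to be set in the shell from which Python (and therefore the H2O JVM that `h2o.init()` starts locally) is launched. A minimal sketch of a check, using a hypothetical helper name and assuming a Linux `:`-separated `LD_LIBRARY_PATH`:

```python
import os


def cuda_lib_dir_on_path(env=None):
    """Return True if $CUDA_PATH/lib64 is an entry of LD_LIBRARY_PATH in `env`."""
    env = os.environ if env is None else env
    cuda_path = env.get("CUDA_PATH")
    if not cuda_path:
        return False
    wanted = os.path.normpath(os.path.join(cuda_path, "lib64"))
    # LD_LIBRARY_PATH is colon-separated on Linux; skip empty entries
    entries = env.get("LD_LIBRARY_PATH", "").split(":")
    return any(os.path.normpath(e) == wanted for e in entries if e)
```

Calling `cuda_lib_dir_on_path()` in the same Python session that runs `h2o.init()` checks the environment that the JVM will inherit.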
Install pip for Python 2.7:
sudo apt-get install python-pip
Install Deep Water:
pip2 install http://s3.amazonaws.com/h2o-deepwater/public/nightly/latest/h2o-3.13.0-py2.py3-none-any.whl
Install libatlas-base-dev:
sudo apt-get install libatlas-base-dev
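To see whether the shared libraries installed so far are resolvable at all, a rough check can use `ctypes.util.find_library` from the standard library. This is a sketch under assumptions: the library names in `ASSUMED_LIBS` are a guess at what a GPU backend needs, not a list taken from Deep Water, and `find_library` does not search exactly like the dynamic loader, so a miss is a hint rather than proof.

```python
import ctypes.util

# Hypothetical list of library base names (libcudart, libcublas, libcudnn, libcblas)
ASSUMED_LIBS = ("cudart", "cublas", "cudnn", "cblas")


def missing_libraries(names=ASSUMED_LIBS, finder=ctypes.util.find_library):
    """Return the names for which `finder` returns None (i.e. not resolvable)."""
    return [name for name in names if finder(name) is None]
```

Running `print(missing_libraries())` on the VM lists any of the assumed libraries that could not be located.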
To run the example, I start Python 2.7 and run:
import h2o
h2o.init()
After that, I use H2O Flow to create some artificial data and train a simple Deep Water model:
createFrame {"dest":"MNIST_SIM_60k","rows":"60000","cols":"784","seed":7595850248774472000,"seed_for_column_types":-1,"randomize":true,"value":0,"real_range":100,"categorical_fraction":"0","factors":5,"integer_fraction":"1","binary_fraction":"0","binary_ones_fraction":"0","time_fraction":0,"string_fraction":0,"integer_range":"127","missing_fraction":"0","response_factors":2,"has_response":true}
buildModel 'deepwater', {"model_id":"deepwater-782cc564-497c-4c39-a22a-b6904fb04188","training_frame":"MNIST_SIM_60k","nfolds":0,"response_column":"response","ignored_columns":[],"epochs":"100","ignore_const_cols":true,"network":"auto","activation":"Rectifier","hidden":[100],"problem_type":"dataset","checkpoint":"","autoencoder":false,"balance_classes":false,"score_each_iteration":false,"categorical_encoding":"AUTO","train_samples_per_iteration":-2,"standardize":true,"distribution":"AUTO","score_interval":5,"score_training_samples":10000,"score_validation_samples":0,"score_duty_cycle":0.1,"stopping_rounds":5,"stopping_metric":"AUTO","stopping_tolerance":0,"max_runtime_secs":0,"backend":"tensorflow","image_shape":[0,0],"channels":3,"network_definition_file":"","network_parameters_file":"","mean_image_file":"","export_native_parameters_prefix":"","input_dropout_ratio":0,"hidden_dropout_ratios":[],"overwrite_with_best_model":true,"target_ratio_comm_to_comp":0.05,"seed":-1,"learning_rate":0.001,"learning_rate_annealing":0.000001,"momentum_start":0.9,"momentum_ramp":10000,"momentum_stable":0.9,"classification_stop":0,"shuffle_training_data":true,"mini_batch_size":32,"clip_gradient":10,"sparse":false,"gpu":true,"device_id":[0],"cache_data":true}
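To compare backends systematically, the Flow call above can be regenerated with only `backend` and `gpu` changed, for example to see whether the failure also occurs with `"gpu": false`. A minimal sketch; `flow_variants` is a hypothetical helper, and it assumes Flow accepts the generated JSON the same way it accepted the original call.

```python
import itertools
import json


def flow_variants(params, backends=("mxnet", "tensorflow"), gpu_flags=(True, False)):
    """Return Flow buildModel lines, one per (backend, gpu) combination."""
    lines = []
    for backend, gpu in itertools.product(backends, gpu_flags):
        # Copy the original parameters and override only backend, gpu and model_id
        p = dict(params, backend=backend, gpu=gpu)
        p["model_id"] = "deepwater-%s-%s" % (backend, "gpu" if gpu else "cpu")
        lines.append("buildModel 'deepwater', " + json.dumps(p))
    return lines
```

Pass it the parameter dictionary from the call above (parsed with `json.loads`) and paste each resulting line into a Flow cell.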
For both backends (mxnet and tensorflow) I get the errors mentioned above. For tensorflow, the stack trace is:
java.lang.RuntimeException: Unable to initialize the native Deep Learning backend: null
at hex.deepwater.DeepWaterModelInfo.setupNativeBackend(DeepWaterModelInfo.java:267)
at hex.deepwater.DeepWaterModelInfo.<init>(DeepWaterModelInfo.java:214)
at hex.deepwater.DeepWaterModel.<init>(DeepWaterModel.java:227)
at hex.deepwater.DeepWater$DeepWaterDriver.buildModel(DeepWater.java:131)
at hex.deepwater.DeepWater$DeepWaterDriver.computeImpl(DeepWater.java:118)
at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:173)
at hex.deepwater.DeepWater$DeepWaterDriver.compute2(DeepWater.java:111)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1255)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
For mxnet, the stack trace is:
java.lang.RuntimeException: Unable to initialize the native Deep Learning backend: Could not initialize class deepwater.backends.mxnet.MXNetBackend$MXNetLoader
at hex.deepwater.DeepWaterModelInfo.setupNativeBackend(DeepWaterModelInfo.java:267)
at hex.deepwater.DeepWaterModelInfo.<init>(DeepWaterModelInfo.java:214)
at hex.deepwater.DeepWaterModel.<init>(DeepWaterModel.java:227)
at hex.deepwater.DeepWater$DeepWaterDriver.buildModel(DeepWater.java:131)
at hex.deepwater.DeepWater$DeepWaterDriver.computeImpl(DeepWater.java:118)
at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:173)
at hex.deepwater.DeepWater$DeepWaterDriver.compute2(DeepWater.java:111)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1255)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
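The two messages point in the same direction. In Java, "Could not initialize class X" is the `NoClassDefFoundError` message for a class whose static initializer already failed on an earlier attempt, so the original failure (often an `UnsatisfiedLinkError` or `ExceptionInInitializerError` when a native library cannot be loaded) should appear earlier in the full log, if H2O logged it. The tensorflow message "null" presumably means the wrapped exception had no message. A heuristic sketch for triaging such logs, with hypothetical helper names:

```python
# Common prefix of both Deep Water errors in the traces above
PREFIX = "Unable to initialize the native Deep Learning backend: "

# Java error types that often carry the original native-loading failure (assumption)
ROOT_CAUSE_MARKERS = ("UnsatisfiedLinkError", "ExceptionInInitializerError")


def backend_init_cause(line):
    """Return the text after PREFIX, or None if the line is not this error."""
    idx = line.find(PREFIX)
    if idx < 0:
        return None
    return line[idx + len(PREFIX):].strip()


def hint(cause):
    """Map a cause string to a heuristic next step."""
    if cause is None:
        return "not a Deep Water backend initialization error"
    if cause == "null":
        return "wrapped exception had no message; search the full log for the first error"
    if cause.startswith("Could not initialize class"):
        return ("static initializer failed on an earlier attempt; "
                "the first failure in the log usually names the native library")
    return cause


def first_root_cause(log_text):
    """Return the first log line mentioning a ROOT_CAUSE_MARKERS type, or None."""
    for line in log_text.splitlines():
        if any(marker in line for marker in ROOT_CAUSE_MARKERS):
            return line.strip()
    return None
```

Running `first_root_cause` over the full H2O log (such as the one linked in the update) is a quick way to find the line worth posting.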
How can I run H2O Deep Water with at least one of the backends?
Side note: H2O's GPU-enabled xgboost works.
Many thanks,
Robert
Answer (score: 0)
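I don't think we have tried running on Azure other than with the Docker image. Are you using Ubuntu 16.04? If so, it should work, unless your VM image differs from a standard Ubuntu 16.04 installation. It looks like H2O cannot communicate with the backend. If you can post the full logs from H2O, I can try to figure out what the problem is.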
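Otherwise, I'd say the easiest way to run it is with the Docker image, and that is what I recommend: everything is already installed, and the only things you need to install are docker and nvidia-docker. Instructions: https://github.com/h2oai/deepwater#pre-release-docker-image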
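Before following those instructions, a quick sanity check is whether both tools are on the `PATH`. This is a minimal sketch with hypothetical helper names, written without `shutil.which` so it also runs on Python 2.7. Finding the executables does not prove that the Docker daemon or GPU passthrough works.

```python
import os


def find_on_path(name, path=None):
    """Return the full path of an executable `name` found on `path`, or None."""
    path = os.environ.get("PATH", "") if path is None else path
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if directory and os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_tools(tools=("docker", "nvidia-docker"), finder=find_on_path):
    """Return the tools that `finder` cannot locate."""
    return [tool for tool in tools if finder(tool) is None]
```

An empty list from `missing_tools()` means both executables were found.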