I am trying to run a model (a Python script) in script mode on AWS SageMaker. I call the script from a notebook using the TensorFlow estimator, as shown below:
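```python
from sagemaker.tensorflow import TensorFlow

# `role` is assumed to be defined earlier in the notebook,
# e.g. via sagemaker.get_execution_role()
tf_estimator = TensorFlow(
    entry_point='train.py',
    role=role,
    train_instance_count=1,
    train_instance_type='local_gpu',  # local mode: run the container on this instance's GPU
    framework_version='1.12',
    py_version='py3',
    script_mode=True,
    hyperparameters={'epochs': 10})

# training_path_input / validation_path_input are defined elsewhere in the notebook
tf_estimator.fit({'training': training_path_input, 'validation': validation_path_input})
```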
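Because `'local_gpu'` runs the training container on the notebook instance itself, it only works when that host has a working NVIDIA driver. A minimal sketch, assuming that a usable driver means `nvidia-smi` is on `PATH` and exits with status 0, of choosing `'local_gpu'` only in that case and falling back to CPU `'local'` mode otherwise. The helper name `pick_local_instance_type` is hypothetical, not part of the SageMaker SDK.

```python
import shutil
import subprocess


def pick_local_instance_type() -> str:
    """Return 'local_gpu' if nvidia-smi runs successfully, else 'local'.

    Hypothetical helper: assumes a working NVIDIA driver implies that
    `nvidia-smi` is on PATH and exits with status 0.
    """
    nvidia_smi = shutil.which("nvidia-smi")
    if nvidia_smi is None:
        # No NVIDIA tooling on this host, so GPU local mode cannot work.
        return "local"
    try:
        result = subprocess.run(
            [nvidia_smi],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # nvidia-smi could not be started or hung; treat as no usable GPU.
        return "local"
    # nvidia-smi exits non-zero when it cannot communicate with the driver.
    return "local_gpu" if result.returncode == 0 else "local"
```

The result could then be passed to the estimator as `train_instance_type=pick_local_instance_type()`, so the same notebook runs on CPU-only instances instead of failing at container start.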
I get the following error:
> Creating tmpvq65nmup_algo-1-wipol_1 ...
> Creating tmpvq65nmup_algo-1-wipol_1 ... error
> ERROR: for tmpvq65nmup_algo-1-wipol_1 Cannot start service algo-1-wipol: OCI runtime create failed: container_linux.go:349:
> starting container process caused "process_linux.go:449: container
> init caused \"process_linux.go:432: running prestart hook 1 caused
> \\\"error running hook: exit status 1, stdout: , stderr:
> nvidia-container-cli: initialization error: nvml error: driver not
> loaded\\\\n\\\"\"": unknown
How can I fix this?
Answer (score: 0)
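Hi, could you share more details about your notebook instance (for example, its instance type) and the kernel the notebook is running?
The error (`nvidia-container-cli: initialization error: nvml error: driver not loaded`) suggests that the NVIDIA driver is not installed or not loaded on the host that runs the container.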
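One way to confirm this diagnosis on the notebook instance is to check whether the NVIDIA kernel module is loaded. On Linux, the loaded module exposes `/proc/driver/nvidia/version`; a minimal sketch, assuming that path, which returns the first line of that file or `None` when the driver is not loaded. The function name `nvidia_driver_version` is hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumed location exposed by the NVIDIA kernel module on Linux once loaded.
DEFAULT_PROC_PATH = "/proc/driver/nvidia/version"


def nvidia_driver_version(proc_path: str = DEFAULT_PROC_PATH) -> Optional[str]:
    """Return the first line of the driver version file, or None if absent.

    Hypothetical helper: a missing or unreadable file is taken to mean the
    NVIDIA driver is not loaded on this host.
    """
    try:
        text = Path(proc_path).read_text()
    except OSError:
        # FileNotFoundError / PermissionError: no usable driver information.
        return None
    lines = text.strip().splitlines()
    return lines[0] if lines else None
```

If this returns `None` on the notebook instance, `'local_gpu'` cannot start the container and either a GPU instance with drivers or CPU `'local'` mode is needed.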