I have a Jupyter notebook with standard boilerplate code like this:
from sagemaker.tensorflow import TensorFlow
import sagemaker
from sagemaker import get_execution_role
sagemaker_session = sagemaker.Session()
role = get_execution_role()
tf_estimator = TensorFlow(entry_point='sagemaker_predict_2.py', role=role,
training_steps=10000, evaluation_steps=100,
train_instance_count=1, train_instance_type='ml.p2.xlarge',
framework_version='1.10.0')
tf_estimator.fit('s3://XXX-sagemaker/XXX')
The job starts up fine, but it eventually fails with this error:
2018-11-27 06:21:12 Starting - Starting the training job...
2018-11-27 06:21:15 Starting - Launching requested ML instances.........
2018-11-27 06:22:44 Starting - Preparing the instances for training...
2018-11-27 06:23:35 Downloading - Downloading input data...
2018-11-27 06:24:03 Training - Downloading the training image......
2018-11-27 06:25:12 Training - Training image download completed. Training in progress..
2018-11-27 06:25:11,813 INFO - root - running container entrypoint
2018-11-27 06:25:11,813 INFO - root - starting train task
2018-11-27 06:25:11,833 INFO - container_support.training - Training starting
2018-11-27 06:25:15,306 ERROR - container_support.training - uncaught exception during training: No module named keras
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/container_support/training.py", line 36, in start
fw.train()
File "/usr/local/lib/python2.7/dist-packages/tf_container/train_entry_point.py", line 143, in train
customer_script = env.import_user_module()
File "/usr/local/lib/python2.7/dist-packages/container_support/environment.py", line 101, in import_user_module
user_module = importlib.import_module(script)
File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
__import__(name)
File "/opt/ml/code/sagemaker_predict_2.py", line 7, in <module>
import keras
ImportError: No module named keras
My sagemaker_predict_2.py needs these libraries, among others:
import pandas as pd
import numpy as np
import sys
import keras
from keras.models import Model, Input
from keras.layers import LSTM, Embedding, Dense, TimeDistributed, Dropout, Bidirectional
from keras.wrappers.scikit_learn import KerasClassifier
from keras_contrib.layers import CRF
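I assume importing pandas and numpy works, but the script dies at import keras. I thought keras came as standard on the notebook. Is the script launched in some other environment that has not been set up?
Also, I believe keras_contrib is not a standard package, so I need a way to install it. How do I do that?
I tried !pip install keras in a cell above, but it reported Requirement already satisfied, so my Jupyter environment does have the library. Does that mean sagemaker_predict_2.py is launched in a different environment?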
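Answer 0 (score: 0)
You are correct. sagemaker_predict_2.py runs in a different environment from the notebook instance. That code is executed on SageMaker inside our predefined TensorFlow Docker container.
Installing dependencies on the notebook instance only makes those libraries available to the notebook kernel, not to the training container.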
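One way to see the difference between the two environments is to run the same small check in a notebook cell and at the top of the training script. The sketch below is only an illustration: the helper names missing_modules and environment_report are made up for this example, and it avoids anything specific to Python 3 because the traceback above shows the container running Python 2.7.

```python
import importlib
import sys


def missing_modules(names):
    """Return the module names that the current interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Also covers ModuleNotFoundError on Python 3 (a subclass of ImportError).
            missing.append(name)
    return missing


def environment_report(names):
    """Summarise which interpreter is running and which modules it lacks."""
    return {
        "executable": sys.executable,
        "python_version": sys.version.split()[0],
        "missing": missing_modules(names),
    }


# Hypothetical usage: call this both in the notebook and in sagemaker_predict_2.py
# (before the keras imports) and compare the two reports.
# print(environment_report(["pandas", "numpy", "keras", "keras_contrib"]))
```

If the notebook reports nothing missing while the training log lists keras, the two interpreters are clearly separate installations.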
To install dependencies inside the running Docker container, specify your dependencies in a requirements.txt file.
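As a rough sketch of what that could look like, the helper below writes a requirements.txt next to the training script and builds the extra estimator arguments. Several things here are assumptions rather than facts from this thread: that your SDK version's TensorFlow estimator accepts source_dir and requirements_file keyword arguments (check the documentation for the version you have installed), that the directory name is yours to choose, and that keras_contrib is installed from its GitHub repository. Version pins are deliberately left out; you will probably want to add ones that match the container's Python and TensorFlow versions.

```python
import os

# Assumed package specifiers; pin versions that suit your container.
REQUIREMENTS = [
    "keras",
    "git+https://www.github.com/keras-team/keras-contrib.git",
]


def write_requirements(source_dir, requirements=REQUIREMENTS):
    """Write requirements.txt into source_dir and return its path."""
    if not os.path.isdir(source_dir):
        os.makedirs(source_dir)
    path = os.path.join(source_dir, "requirements.txt")
    with open(path, "w") as handle:
        handle.write("\n".join(requirements) + "\n")
    return path


def estimator_kwargs(source_dir):
    """Extra keyword arguments to pass to the TensorFlow estimator.

    Assumes the estimator accepts source_dir and requirements_file, with
    both entry_point and requirements_file given relative to source_dir.
    """
    return {
        "entry_point": "sagemaker_predict_2.py",
        "source_dir": source_dir,
        "requirements_file": "requirements.txt",
    }


# Hypothetical usage in the notebook ("src" is a made-up directory that
# would also hold sagemaker_predict_2.py):
# write_requirements("src")
# tf_estimator = TensorFlow(role=role, training_steps=10000, evaluation_steps=100,
#                           train_instance_count=1, train_instance_type='ml.p2.xlarge',
#                           framework_version='1.10.0', **estimator_kwargs("src"))
```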
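Because each iteration can take 8 to 10 minutes, I recommend using local mode to confirm that your training job runs locally before submitting it to SageMaker. You can do this by setting train_instance_type to 'local', or by following this notebook: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/tensorflow_distributed_mnist/tensorflow_local_mode_mnist.ipynb
Essentially, local mode runs the Docker container on the localhost where your Python code is executing. That can be a SageMaker notebook instance or your own local machine.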
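A minimal sketch of switching the estimator from the question between local mode and a hosted instance is shown below. The function names training_kwargs and launch are invented for this example, and it assumes Docker is available on the machine running the notebook. The sagemaker imports are done lazily inside launch so that the argument-building part can be checked without the SDK installed.

```python
def training_kwargs(local=True):
    """Build the estimator arguments from the question, optionally for local mode."""
    return {
        "entry_point": "sagemaker_predict_2.py",
        "training_steps": 10000,
        "evaluation_steps": 100,
        "train_instance_count": 1,
        # 'local' runs the container on this machine; otherwise use the
        # instance type from the original estimator.
        "train_instance_type": "local" if local else "ml.p2.xlarge",
        "framework_version": "1.10.0",
    }


def launch(data_location, local=True):
    """Create the estimator and start training (requires the sagemaker SDK)."""
    # Imported here so that training_kwargs can be used without the SDK.
    from sagemaker import get_execution_role
    from sagemaker.tensorflow import TensorFlow

    estimator = TensorFlow(role=get_execution_role(), **training_kwargs(local))
    estimator.fit(data_location)
    return estimator


# Hypothetical usage: iterate locally first, then switch to the hosted instance.
# launch('s3://XXX-sagemaker/XXX', local=True)
# launch('s3://XXX-sagemaker/XXX', local=False)
```

With this split, an import error such as the missing keras module shows up within the time it takes to start a local container, rather than after waiting for a hosted instance to be provisioned.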