How do I make sure SageMaker uses my installed libraries?

Time: 2018-11-27 06:30:20

Tags: tensorflow keras amazon-sagemaker

I have a Jupyter notebook with standard template code like this:

from sagemaker.tensorflow import TensorFlow

import sagemaker
from sagemaker import get_execution_role
sagemaker_session = sagemaker.Session()
role = get_execution_role()

tf_estimator = TensorFlow(entry_point='sagemaker_predict_2.py', role=role,
                          training_steps=10000, evaluation_steps=100,
                          train_instance_count=1, train_instance_type='ml.p2.xlarge',
                          framework_version='1.10.0')
tf_estimator.fit('s3://XXX-sagemaker/XXX')

The job starts up fine, but it eventually fails with an error:

2018-11-27 06:21:12 Starting - Starting the training job...
2018-11-27 06:21:15 Starting - Launching requested ML instances.........
2018-11-27 06:22:44 Starting - Preparing the instances for training...
2018-11-27 06:23:35 Downloading - Downloading input data...
2018-11-27 06:24:03 Training - Downloading the training image......
2018-11-27 06:25:12 Training - Training image download completed. Training in progress..
2018-11-27 06:25:11,813 INFO - root - running container entrypoint
2018-11-27 06:25:11,813 INFO - root - starting train task
2018-11-27 06:25:11,833 INFO - container_support.training - Training starting
2018-11-27 06:25:15,306 ERROR - container_support.training - uncaught exception during training: No module named keras
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/container_support/training.py", line 36, in start
    fw.train()
  File "/usr/local/lib/python2.7/dist-packages/tf_container/train_entry_point.py", line 143, in train
    customer_script = env.import_user_module()
  File "/usr/local/lib/python2.7/dist-packages/container_support/environment.py", line 101, in import_user_module
    user_module = importlib.import_module(script)
  File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "/opt/ml/code/sagemaker_predict_2.py", line 7, in <module>
    import keras
ImportError: No module named keras  

My sagemaker_predict_2.py needs several libraries:

import pandas as pd
import numpy as np
import sys
import keras
from keras.models import Model, Input
from keras.layers import LSTM, Embedding, Dense, TimeDistributed, Dropout, Bidirectional
from keras.wrappers.scikit_learn import KerasClassifier
from keras_contrib.layers import CRF

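Importing pandas and numpy seems to work, but the script dies on the keras import. I thought keras came as standard with the notebook. Is there some other environment, not set up the same way, that my script runs in when I launch it?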

Also, I believe keras_contrib is not a standard package, so I need a way to install it. How should I do that?

I tried !pip install keras in a cell above, but it reported Requirement already satisfied, so my Jupyter environment apparently has the library. Does that mean sagemaker_predict_2.py is launched in a different environment?

1 answer:

Answer 0 (score: 0)

You are correct. sagemaker_predict_2.py runs in a different environment from the notebook instance. That code is executed on SageMaker inside our predefined TensorFlow Docker container.

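Installing dependencies on the notebook instance only makes those libraries available to the notebook kernel, not to the training container.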
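One way to see the difference for yourself is to run the same small check in a notebook cell and at the top of the entry-point script, before its other imports. The sketch below is only an illustration: `missing_modules` is a hypothetical helper, not part of SageMaker, and the module list is taken from the imports in the question.

```python
import importlib
import sys


def missing_modules(names):
    """Return the subset of `names` that cannot be imported in this interpreter."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Print which interpreter is running and which of the required modules it lacks.
print("interpreter:", sys.executable)
print("missing:", missing_modules(["pandas", "numpy", "keras", "keras_contrib"]))
```

If the notebook reports nothing missing while the training log lists keras, the two environments are clearly separate.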

To install dependencies inside the running Docker container, specify your dependencies in a requirements.txt file.
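A minimal sketch of what that could look like from the notebook side (Python 3) is below. It makes several assumptions. The `src` directory and the `write_requirements` helper are hypothetical. The `source_dir` and `requirements_file` arguments reflect the 1.x SageMaker Python SDK TensorFlow estimator as I understand it, so check them against your SDK version. The keras-contrib line uses the git URL from that project's install instructions, and installing from git inside the container is not confirmed here.

```python
import os
import tempfile


def write_requirements(source_dir, packages):
    """Write one requirement per line to <source_dir>/requirements.txt; return its path."""
    os.makedirs(source_dir, exist_ok=True)
    path = os.path.join(source_dir, "requirements.txt")
    with open(path, "w") as f:
        f.write("\n".join(packages) + "\n")
    return path


# Hypothetical source directory; in practice it would also hold sagemaker_predict_2.py.
source_dir = os.path.join(tempfile.mkdtemp(), "src")
packages = [
    "keras",
    # Assumption: the container can pip-install straight from a git URL.
    "git+https://www.github.com/keras-team/keras-contrib.git",
]
req_path = write_requirements(source_dir, packages)

# Keyword arguments for the estimator. requirements_file is assumed to be a path
# relative to source_dir, as in the 1.x SDK; verify against your installed version.
estimator_kwargs = dict(
    entry_point="sagemaker_predict_2.py",
    source_dir=source_dir,
    requirements_file="requirements.txt",
    training_steps=10000,
    evaluation_steps=100,
    train_instance_count=1,
    train_instance_type="ml.p2.xlarge",
    framework_version="1.10.0",
)

# In the notebook these would then be passed on (needs the sagemaker package):
# from sagemaker.tensorflow import TensorFlow
# tf_estimator = TensorFlow(role=role, **estimator_kwargs)
```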

Because each iteration can take 8 to 10 minutes, I recommend using local mode to make sure your training job runs locally before you send it to SageMaker. You can do this by setting train_instance_type to 'local', or refer to this notebook: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/tensorflow_distributed_mnist/tensorflow_local_mode_mnist.ipynb

Essentially, local mode runs the Docker container on the localhost where your Python code is executing. That can be one of our SageMaker notebook instances or your own local machine.
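As a rough sketch of switching between a quick local check and the real job, the estimator arguments from the question can be built in one place. `build_estimator_kwargs` is a hypothetical helper, and the reduced step counts for the local run are arbitrary values chosen only to make a smoke test finish quickly.

```python
def build_estimator_kwargs(local=True):
    """Return TensorFlow estimator arguments for a local smoke test or a full run."""
    kwargs = dict(
        entry_point="sagemaker_predict_2.py",
        train_instance_count=1,
        framework_version="1.10.0",
    )
    if local:
        # 'local' runs the training container on this machine via Docker.
        kwargs.update(train_instance_type="local",
                      training_steps=10, evaluation_steps=1)  # assumed small values
    else:
        # Values taken from the question's original estimator.
        kwargs.update(train_instance_type="ml.p2.xlarge",
                      training_steps=10000, evaluation_steps=100)
    return kwargs


local_kwargs = build_estimator_kwargs(local=True)
remote_kwargs = build_estimator_kwargs(local=False)
print(local_kwargs["train_instance_type"], remote_kwargs["train_instance_type"])

# In the notebook (needs the sagemaker package and a working Docker install):
# from sagemaker.tensorflow import TensorFlow
# TensorFlow(role=role, **local_kwargs).fit('s3://XXX-sagemaker/XXX')
```

Once the local run gets past the imports, the same call with `remote_kwargs` submits the full job.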