How can I use a pretrained model stored in S3 to predict on some data?

Asked: 2019-05-22 10:50:48

Tags: python amazon-web-services amazon-s3 boto3 amazon-sagemaker

I have trained a semantic segmentation model with SageMaker, and the output has been saved to an S3 bucket. I want to load this model from S3 to run predictions on some images in SageMaker.

I know how to make predictions if I keep the notebook instance running after training, since that is just a simple deploy, but that does not help much if I want to use an older model.

I have looked through these sources and tried to put something together myself, but it did not work, so here I am:

https://course.fast.ai/deployment_amzn_sagemaker.html#deploy-to-sagemaker

https://aws.amazon.com/getting-started/tutorials/build-train-deploy-machine-learning-model-sagemaker/

https://sagemaker.readthedocs.io/en/stable/pipeline.html

https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/inference_pipeline_sparkml_xgboost_abalone/inference_pipeline_sparkml_xgboost_abalone.ipynb

My code looks like this:

from sagemaker.pipeline import PipelineModel
from sagemaker.model import Model

s3_model_bucket = 'bucket'
s3_model_key_prefix = 'prefix'
data = 's3://{}/{}/{}'.format(s3_model_bucket, s3_model_key_prefix, 'model.tar.gz')
models = ss_model.create_model()  # ss_model is my sagemaker Estimator

model = PipelineModel(name=data, role=role, models=[models])
ss_predictor = model.deploy(initial_instance_count=1, instance_type='ml.c4.xlarge')

2 Answers:

Answer 0 (score: 0)

You can actually instantiate a Python SDK `Model` object from existing artifacts and deploy it to an endpoint. This lets you deploy a model from trained artifacts without having to retrain in the notebook. For example, for the semantic segmentation model:

trainedmodel = sagemaker.model.Model(
    model_data='s3://...model path here../model.tar.gz',
    image='685385470294.dkr.ecr.eu-west-1.amazonaws.com/semantic-segmentation:latest',  # example path for the semantic segmentation in eu-west-1
    role=role)  # your role here; could be different name

trainedmodel.deploy(initial_instance_count=1, instance_type='ml.c4.xlarge')

Similarly, you can instantiate a predictor object for an already-deployed endpoint from any authenticated client that supports the SDK:

predictor = sagemaker.predictor.RealTimePredictor(
    endpoint='endpoint name here',
    content_type='image/jpeg',
    accept='image/png')
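For completeness, a deployed endpoint can also be invoked without the SageMaker SDK, through boto3's low-level `sagemaker-runtime` client. A minimal offline sketch of the request parameters `invoke_endpoint` expects; the endpoint name and payload bytes are placeholders, and the actual AWS call is commented out so the snippet runs without credentials:

```python
# Parameters for boto3's sagemaker-runtime invoke_endpoint call.
# EndpointName and Body are placeholders for illustration only.
params = {
    "EndpointName": "endpoint name here",      # your deployed endpoint
    "ContentType": "image/jpeg",               # payload format sent to the model
    "Accept": "image/png",                     # response format (segmentation mask)
    "Body": b"<jpeg bytes here>",              # e.g. open("img.jpg", "rb").read()
}

# import boto3
# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint(**params)
# mask_png = response["Body"].read()
```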

More information on these abstractions:

Answer 1 (score: 0)

`input_features_data` is a dataframe:

import numpy as np
import sagemaker
from sagemaker.predictor import csv_serializer, json_deserializer

predictor = sagemaker.predictor.RealTimePredictor(
    endpoint= PREDICTOR_ENDPOINT_NAME,
    sagemaker_session=sagemaker.Session(),
    serializer=csv_serializer,
    deserializer=json_deserializer,
    content_type='text/csv',
)

test_batch_size = 5
num_batches = -(-len(input_features_data) // test_batch_size)  # ceiling division
predicted_values = []
for i in range(num_batches):
    # iterate the rows of each DataFrame slice, one predict call per row
    batch = input_features_data.iloc[i * test_batch_size:(i + 1) * test_batch_size]
    predicted_values += [predictor.predict(x) for x in batch.values]

predicted_values = np.asarray(predicted_values)
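The `-(-len(...) // test_batch_size)` expression above is ceiling division. A self-contained sketch of the same batching loop, with a hypothetical stub in place of the real `predictor.predict` call, shows the shape of the result:

```python
# Stub standing in for predictor.predict; real code would call the endpoint.
def predict_stub(row):
    return sum(row)  # hypothetical: one scalar per input row

# 7 rows of 2 features each, standing in for input_features_data
input_features_data = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14]]

test_batch_size = 5
# -(-n // b) is ceiling division: 7 rows / 5 per batch -> 2 batches
num_batches = -(-len(input_features_data) // test_batch_size)

predicted_values = []
for i in range(num_batches):
    batch = input_features_data[i * test_batch_size:(i + 1) * test_batch_size]
    predicted_values += [predict_stub(x) for x in batch]

print(num_batches)       # 2
print(predicted_values)  # [3, 7, 11, 15, 19, 23, 27]
```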