SageMaker model deployment fails because of a custom endpoint name

Time: 2019-01-09 00:07:33

Tags: amazon-sagemaker

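AWS SageMaker model deployment fails when I pass the endpoint_name parameter to deploy(). Any ideas?

Without the endpoint_name parameter, the deployment succeeds. In both cases, training the model and saving it to the S3 location work fine.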

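import os

import boto3
import numpy as np  # needed for np.split below; missing from the original snippet
import sagemaker
from sagemaker import get_execution_role
from sagemaker.predictor import csv_serializer
from sagemaker.amazon.amazon_estimator import get_image_uri

# S3 bucket and key prefix (anonymized placeholders)
bucket = 'Y'
prefix = 'Z'

# IAM role attached to the notebook instance
role = get_execution_role()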

# NOTE: `df` (the source DataFrame) and `suffix` are defined elsewhere in the asker's code and are not shown here.

# Shuffle the rows, then split 50% / 30% / 20% into train / validation / test
train_data, validation_data, test_data = np.split(df.sample(frac=1, random_state=100), [int(0.5 * len(df)), int(0.8 * len(df))])

# The built-in XGBoost container expects CSV input without a header row
train_data.to_csv('train.csv', index=False, header=False)
validation_data.to_csv('validation.csv', index=False, header=False)
test_data.to_csv('test.csv', index=False)
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'train/X/train.csv')).upload_file('train.csv')
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'validation/X/validation.csv')).upload_file('validation.csv')

# Image URI of the built-in XGBoost algorithm for the current region
container = get_image_uri(boto3.Session().region_name, 'xgboost')
#print(container)

s3_input_train = sagemaker.s3_input(s3_data='s3://{}/{}/train/{}'.format(bucket, prefix, suffix), content_type='csv')
s3_input_validation = sagemaker.s3_input(s3_data='s3://{}/{}/validation/{}/'.format(bucket, prefix, suffix), content_type='csv')

sess = sagemaker.Session()

# Model artifacts are written under this S3 location
output_loc = 's3://{}/{}/output'.format(bucket, prefix)
xgb = sagemaker.estimator.Estimator(container,
                                    role,
                                    train_instance_count=1,
                                    train_instance_type='ml.m4.xlarge',
                                    output_path=output_loc,
                                    sagemaker_session=sess,
                                    base_job_name='X')
#print('Model output to: {}'.format(output_loc))

xgb.set_hyperparameters(eta=0.5,
                        objective='reg:linear',
                        eval_metric='rmse',
                        max_depth=3,
                        min_child_weight=1,
                        gamma=0,
                        early_stopping_rounds=10,
                        subsample=0.8,
                        colsample_bytree=0.8,
                        num_round=1000)

# Model fitting
xgb.fit({'train': s3_input_train, 'validation': s3_input_validation})

# Deploy the model to an endpoint with a custom name (this is the call that fails)
xgb_predictor_X = xgb.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge', endpoint_name='X')

# Send requests as CSV and return the raw response
xgb_predictor_X.content_type = 'text/csv'
xgb_predictor_X.serializer = csv_serializer
xgb_predictor_X.deserializer = None

INFO:sagemaker:Creating endpoint with name delaymins
ClientError: An error occurred (ValidationException) when calling the CreateEndpoint operation: Could not find model "arn:aws:sagemaker:us-west-2::model/X-2019-01-08-18-17-42-158"

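1 Answer:

Answer 0: (score: 0)

Figured it out! If an endpoint with a custom name is not deleted before you redeploy, that name gets blacklisted (not sure whether this is temporary). So once this error occurs, you have to use a different endpoint name. Moral of the story: always delete the endpoint before redeploying.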
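For anyone who wants to script the cleanup described above, here is a minimal sketch using the boto3 SageMaker client. The helper names `delete_endpoint_if_exists` and `unique_endpoint_name` are made up for illustration. It rests on two assumptions that may not hold in every setup: any `ClientError` from `describe_endpoint` is taken to mean "no such endpoint", and an endpoint configuration is assumed to exist under the same name as the endpoint, which may be what still points at the old model.

```python
import time


def delete_endpoint_if_exists(endpoint_name, client=None):
    """Delete the named endpoint if present; return True if one was deleted.

    Hypothetical helper, not part of boto3 or the SageMaker SDK.
    """
    if client is None:
        import boto3  # imported lazily so a stub client can be passed in
        client = boto3.client('sagemaker')

    deleted = False
    try:
        client.describe_endpoint(EndpointName=endpoint_name)
    except client.exceptions.ClientError:
        # Assumption: any ClientError here means the endpoint does not exist.
        pass
    else:
        client.delete_endpoint(EndpointName=endpoint_name)
        # Wait for the deletion to finish before the name is reused.
        client.get_waiter('endpoint_deleted').wait(EndpointName=endpoint_name)
        deleted = True

    try:
        # Assumption: a configuration with the same name as the endpoint exists.
        client.delete_endpoint_config(EndpointConfigName=endpoint_name)
    except client.exceptions.ClientError:
        pass  # nothing stored under that name
    return deleted


def unique_endpoint_name(base):
    """Fallback: append a UTC timestamp so the name has not been used before."""
    return '{}-{}'.format(base, time.strftime('%Y-%m-%d-%H-%M-%S', time.gmtime()))
```

Calling `delete_endpoint_if_exists('X')` right before `xgb.deploy(..., endpoint_name='X')` should clear out the old endpoint first. If the name still fails, passing `endpoint_name=unique_endpoint_name('X')` sidesteps the problem at the cost of a changing name.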