Question

I'm using a SageMaker training job to train an ML model, and I'm trying to write the model output to a specific location on S3.

Code:

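from sagemaker.sklearn.estimator import SKLearn

model_uri = "s3://***/model/"
script_path = 'entry_point.py'
sklearn = SKLearn(
    entry_point=script_path,
    train_instance_type="ml.m5.large",  # SDK v1 name; renamed to instance_type in SDK v2
    output_path=model_uri,  # where the model artifact should be written
    role='***',
    sagemaker_session=sagemaker_session)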

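The problem I'm running into is that the training job appears to save the model twice: once at the top level of the S3 bucket and once in the specified folder (/model).

Top level:

Model folder:

Is this the expected behaviour when specifying output_path on the estimator? Is there a way to stop it?

Any help would be appreciated!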

Answer 1

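If you look inside the top-level folder, you will see that it actually contains other information created by the job, whereas the job folder inside the model folder contains the .joblib model from the process (as a tar.gz file).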
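
To check what each location really holds, the keys under a prefix can be listed. The following is a minimal sketch, assuming a boto3 S3 client is passed in; the helper name list_keys and the bucket name in the usage comment are hypothetical and not part of the original question.

```python
from typing import Iterator


def list_keys(s3_client, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every object key under `prefix` in `bucket`.

    `s3_client` is expected to behave like boto3.client("s3").
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no matching objects.
        for obj in page.get("Contents", []):
            yield obj["Key"]


# Usage (needs boto3 and AWS credentials; "my-bucket" is a placeholder):
#   import boto3
#   for key in list_keys(boto3.client("s3"), "my-bucket", "model/"):
#       print(key)
```

Comparing the keys under the top-level job folder with those under model/ should show whether the two locations hold different files.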

Use the code_location parameter when creating the SKLearn object. For example:

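from sagemaker.sklearn.estimator import SKLearn

model_uri = "s3://***/model/"
training_output_uri = "s3://***/training-output"
script_path = 'entry_point.py'
sklearn = SKLearn(
    entry_point=script_path,
    train_instance_type="ml.m5.large",  # SDK v1 name; renamed to instance_type in SDK v2
    output_path=model_uri,  # where the model artifact is written
    code_location=training_output_uri,  # where the job's other files are uploaded
    role='***',
    sagemaker_session=sagemaker_session)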

Here, "training-output" is a folder created in the S3 bucket.

Reference: the code_location parameter comes from the Framework class, the parent class on which SKLearn is based.
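
The effect of code_location can be sketched as a small path calculation. This is an illustration only, based on my understanding of how the SageMaker Python SDK lays out keys (model artifact under <output_path>/<job name>/output/, packaged entry point under <job name>/source/); the function expected_locations, the bucket "my-bucket" and the job name are hypothetical, and the exact layout should be verified against your SDK version.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlparse


def _split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/prefix/' into ('bucket', 'some/prefix')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.strip("/")


def expected_locations(job_name: str, output_path: str,
                       code_location: Optional[str] = None) -> Dict[str, str]:
    """Return where the model and the packaged source are assumed to land."""
    out_bucket, out_prefix = _split_s3_uri(output_path)
    model_key = "/".join(
        filter(None, [out_prefix, job_name, "output", "model.tar.gz"]))

    if code_location is None:
        # Assumed default: same bucket as output_path, but at the top level.
        code_bucket, code_prefix = out_bucket, ""
    else:
        code_bucket, code_prefix = _split_s3_uri(code_location)
    source_key = "/".join(
        filter(None, [code_prefix, job_name, "source", "sourcedir.tar.gz"]))

    return {
        "model": f"s3://{out_bucket}/{model_key}",
        "source": f"s3://{code_bucket}/{source_key}",
    }


# Without code_location the source lands at the top level of the bucket;
# with it, the source moves under the given prefix.
print(expected_locations("job-1", "s3://my-bucket/model/"))
print(expected_locations("job-1", "s3://my-bucket/model/",
                         "s3://my-bucket/training-output"))
```

Under these assumptions the model artifact is written only once, under output_path; the extra top-level folder holds the uploaded source, and code_location simply moves it.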

Specify S3 location for model output in a SageMaker training job: duplication issue
