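In SageMaker, I can load and deploy a model from S3, but when the prediction response is deserialized I get "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd7 in position 2: invalid continuation byte" on the line "results = predictor.predict(test_X)"
I followed the SageMaker example https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_applying_machine_learning/linear_time_series_forecast/linear_time_series_forecast.ipynb. I was able to train, validate, and deploy the model, and to store the model in S3.
After that, I wanted to import the model from S3 into SageMaker and test with the imported model. The model loads and deploys, but when I predict on the test values I get the UnicodeDecodeError.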
import numpy as np
import pandas as pd
import sagemaker
from sagemaker import get_execution_role
from sagemaker.predictor import csv_serializer, json_deserializer
role = get_execution_role()
sagemaker_session = sagemaker.Session()
model_data = sagemaker.session.s3_input( model_file_location_in_s3, distribution='FullyReplicated', content_type='application/x-sagemaker-model', s3_data_type='S3Prefix')
sagemaker_model = sagemaker.LinearLearnerModel(model_data=model_file,
                                               role=role,
                                               sagemaker_session=sagemaker_session)
predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type='ml.t2.medium')
#loading test data
gas = pd.read_csv('gasoline.csv', header=None, names=['thousands_barrels'],encoding='utf-8')
gas['thousands_barrels_lag1'] = gas['thousands_barrels'].shift(1)
gas['thousands_barrels_lag2'] = gas['thousands_barrels'].shift(2)
gas['thousands_barrels_lag3'] = gas['thousands_barrels'].shift(3)
gas['thousands_barrels_lag4'] = gas['thousands_barrels'].shift(4)
gas['trend'] = np.arange(len(gas))
gas['log_trend'] = np.log1p(np.arange(len(gas)))
gas['sq_trend'] = np.arange(len(gas)) ** 2
weeks = pd.get_dummies(np.array(list(range(52)) * 15)[:len(gas)], prefix='week')
gas = pd.concat([gas, weeks], axis=1)
gas = gas.iloc[4:, ]
split_train = int(len(gas) * 0.6)
split_test = int(len(gas) * 0.3)
test_y = gas['thousands_barrels'][split_test:]
test_X = gas.drop('thousands_barrels', axis=1).iloc[split_test:, ].as_matrix()
predictor.content_type = 'text/csv'
predictor.serializer = csv_serializer
predictor.deserializer = json_deserializer
results = predictor.predict(test_X)
one_step = np.array([r['score'] for r in results['predictions']])
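The program works fine when I train and deploy the model directly, but it throws this error when the model is loaded from S3. The test data is a numpy ndarray.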
Answer 0 (score: 0)
The deserializer does not seem to match the content of the response.
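One plausible reading of the error (an assumption, since the question does not show the raw response) is that a JSON deserializer first decodes the response bytes as UTF-8, and a binary payload is usually not valid UTF-8. The sketch below reproduces the same kind of message with plain Python. The byte string is made up for illustration and is not a captured endpoint response, and `naive_json_deserialize` is a hypothetical stand-in, not the SDK's `json_deserializer`.

```python
import json

# Made-up binary payload for illustration only (not a real endpoint response).
# 0xd7 announces a two-byte UTF-8 sequence, but the next byte (0x41) is not a
# valid continuation byte, so UTF-8 decoding fails at position 2.
binary_payload = b"\x0a\x04\xd7\x41\x00\x00"


def naive_json_deserialize(raw_bytes):
    """Hypothetical stand-in: decode as UTF-8 text, then parse as JSON."""
    return json.loads(raw_bytes.decode("utf-8"))


try:
    naive_json_deserialize(binary_payload)
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```

If this reading is right, the failure happens before any JSON parsing starts, which is why the traceback mentions the codec and not the JSON content.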
To investigate, write a custom deserializer that only prints some details:
def debug_deserializer(data, content_type):
    print(content_type)
    print(data)
and apply it as:
predictor.deserializer = debug_deserializer
For example, this might produce something like:
application/x-recordio-protobuf
<botocore.response.StreamingBody object at 0x7fd3544883c8>
None
The first printed line tells you that the content type is application/x-recordio-protobuf. Then write a custom deserializer, for example:
from sagemaker.amazon.common import record_deserializer
def recordio_protobuf_deserialize(data, content_type):
    # record_deserializer parses the recordio-protobuf stream into records
    deserializer = record_deserializer()
    return deserializer(data, 'not used')
and apply it as follows:
predictor.deserializer = recordio_protobuf_deserialize
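As a possible variation on the same idea, a deserializer can check the content type before decoding, so that a mismatch fails with a readable message instead of a UnicodeDecodeError. This is only a sketch: `content_type_aware_deserializer` is a hypothetical helper, it assumes the same `(data, content_type)` calling convention used above, and `io.BytesIO` stands in for the response stream so the example runs without an endpoint.

```python
import io
import json


def content_type_aware_deserializer(stream, content_type):
    """Hypothetical helper: parse JSON responses, reject anything else clearly."""
    try:
        raw = stream.read()
    finally:
        stream.close()
    if content_type.startswith("application/json"):
        return json.loads(raw.decode("utf-8"))
    # Report the actual content type and a short preview of the raw bytes.
    raise ValueError(
        "unexpected content type %r; first bytes: %r" % (content_type, raw[:16])
    )


# Stand-ins for endpoint responses (made-up payloads, for illustration only).
json_result = content_type_aware_deserializer(
    io.BytesIO(b'{"predictions": [{"score": 1.5}]}'), "application/json"
)

try:
    content_type_aware_deserializer(
        io.BytesIO(b"\x0a\x04\xd7\x41"), "application/x-recordio-protobuf"
    )
    mismatch_message = None
except ValueError as exc:
    mismatch_message = str(exc)

print(json_result)
print(mismatch_message)
```

The non-JSON branch could instead hand the stream to a recordio-protobuf deserializer like the one above; raising keeps the sketch free of SDK dependencies.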