Question

Google Cloud ML-engine supports deploying scikit-learn Pipeline objects. For example, a text classification Pipeline could look like the following,

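from sklearn import naive_bayes
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

classifier = Pipeline([
    ('vect', CountVectorizer()),
    ('clf', naive_bayes.MultinomialNB())])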

The classifier can be trained,

classifier.fit(train_x, train_y)

The classifier can then be uploaded to Google Cloud Storage,

model = 'model.joblib'
joblib.dump(classifier, model)
model_remote_path = os.path.join('gs://', bucket_name, datetime.datetime.now().strftime('model_%Y%m%d_%H%M%S'), model)
subprocess.check_call(['gsutil', 'cp', model, model_remote_path], stderr=sys.stdout)

A Model and Version can then be created, either through the Google Cloud Console or programmatically, linking the 'model.joblib' file to the Version.

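This classifier can then be used to predict new data by calling the deployed model Version endpoint,

from googleapiclient import discovery

ml = discovery.build('ml', 'v1')
project_id = 'projects/{}/models/{}'.format(project_name, model_name)
if version_name is not None:
    project_id += '/versions/{}'.format(version_name)
request_dict = {'instances': ['Test data']}
ml_request = ml.projects().predict(name=project_id, body=request_dict).execute()

The Google Cloud ML engine calls the predict function of the classifier and returns the predicted class. However, I would like to be able to return the confidence score. Normally this could be achieved by calling the predict_proba function of the classifier, but there doesn't seem to be an option to change which function is called. My question is: is it possible to return the confidence score for a scikit-learn classifier when using the Google Cloud ML engine? If not, would you have any recommendations for how else to achieve this result?

Update: I've found a hacky solution. It involves overwriting the predict function of the classifier with its own predict_proba function.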
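A minimal sketch of that kind of override, assuming it is applied to the MultinomialNB step before the Pipeline is fitted. The toy training data and the variable name nb are hypothetical and only there to make the sketch runnable locally:

```python
from sklearn import naive_bayes
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

# Hypothetical toy data, only to make the sketch runnable.
train_x = ['good great excellent', 'bad awful terrible',
           'great good', 'terrible bad']
train_y = ['pos', 'neg', 'pos', 'neg']

nb = naive_bayes.MultinomialNB()
# Shadow the instance's predict with its own predict_proba, so that anything
# calling predict() on this object receives class probabilities instead.
nb.predict = nb.predict_proba

classifier = Pipeline([
    ('vect', CountVectorizer()),
    ('clf', nb)])
classifier.fit(train_x, train_y)

# One row per input, one column per class, ordered as in classifier.classes_.
probabilities = classifier.predict(['good excellent', 'awful bad'])
```

Because the Pipeline's predict delegates to the final step's predict, the call returns probabilities rather than labels. Note that anything else relying on predict returning labels (scoring helpers, for instance) will no longer behave as usual on this object.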

Surprisingly, this works. If anyone knows of a neater solution, please let me know.

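Update: Google has released a new feature (currently in beta) called Custom prediction routines. This allows you to define what code is run when a prediction request comes in. It adds more code to the solution, but it is certainly less hacky.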
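A rough sketch of what such a routine could look like, assuming the predictor interface described in the custom prediction routines documentation (a class exposing predict(instances, **kwargs) and a from_path(model_dir) classmethod). The class name ProbabilityPredictor is hypothetical, the model file name is taken from the question, and the packaging and deployment steps are left out:

```python
import os

import joblib


class ProbabilityPredictor(object):
    """Sketch of a custom prediction routine that returns class probabilities."""

    def __init__(self, model):
        self._model = model

    def predict(self, instances, **kwargs):
        # Called for each prediction request with the request's 'instances' list.
        probabilities = self._model.predict_proba(instances)
        # Convert to plain lists of floats so the result is JSON-serializable.
        return [[float(p) for p in row] for row in probabilities]

    @classmethod
    def from_path(cls, model_dir):
        # model_dir is the directory holding the deployed model files.
        model = joblib.load(os.path.join(model_dir, 'model.joblib'))
        return cls(model)
```

With this approach the trained classifier itself stays unmodified; the choice to call predict_proba lives in the predictor class instead.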

Answer 1

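The ML Engine API you are using only has the predict method, as you can see in the documentation, so it will only do prediction (unless you force it to do something else with the hack you mentioned).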

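If you want to do something else with your trained model, you'll have to load it and use it normally. If you want to use the model stored in Cloud Storage, you can do something like: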

from google.cloud import storage
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23

bucket_name = "<BUCKET_NAME>"
gs_model = "path/to/model.joblib"  # path in your Cloud Storage bucket
local_model = "/path/to/model.joblib"  # path in your local machine

client = storage.Client()
bucket = client.get_bucket(bucket_name)
blob = bucket.blob(gs_model)
blob.download_to_filename(local_model)

model = joblib.load(local_model)
model.predict_proba(test_data)
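predict_proba returns one row per input and one column per class, in the order given by model.classes_. To turn that into a single label plus confidence score per input, something along these lines should work. The helper name label_with_confidence and the sample values are hypothetical stand-ins for model.classes_ and the predict_proba output:

```python
def label_with_confidence(classes, probabilities):
    """Return a (label, confidence) pair for each row of predict_proba output."""
    results = []
    for row in probabilities:
        # Index of the most probable class in this row.
        best = max(range(len(row)), key=lambda i: row[i])
        results.append((classes[best], float(row[best])))
    return results


# Hypothetical values standing in for model.classes_ and
# model.predict_proba(test_data).
classes = ['neg', 'pos']
probabilities = [[0.9, 0.1], [0.2, 0.8]]
predictions = label_with_confidence(classes, probabilities)
```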
