Deploying a trained Google AutoML text classification model on 500 rows of data

Time: 2019-11-08 20:15:29

Tags: google-cloud-automl google-cloud-automl-nl

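I have a trained Google AutoML text classification model that I would like to run on 500 rows of data stored in a csv file. The csv file is in a Google Cloud Storage bucket, and the model should label each row as "true" or "false", depending on what it returns. At the moment, the code only seems to support predicting a single row / one piece of text at a time. How can I use the model I created to classify the rows in batch?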

1 answer:

Answer 0: (score: 0)

See below for the solution that worked for me.

import pandas as pd
from google.cloud import automl_v1beta1 as automl
from google.oauth2 import service_account

# Load the csv
# In my case, I am predicting either the 'Include' or the 'Exclude' class
data = pd.read_csv('../df_pred.csv', encoding='utf-8')

# Assign the project id, compute region and model id
project_id = 'xxxxxx'
compute_region = 'us-central1'
model_id = 'xxxxx'

# Create the AutoML client and the prediction service client.
credentials = service_account.Credentials.from_service_account_file("xxxxx.json")
automl_client = automl.AutoMlClient(credentials=credentials)
prediction_client = automl.PredictionServiceClient(credentials=credentials)


# Get the full path of the model.
model_full_id = automl_client.model_path(
    project_id, compute_region, model_id
)

# Loop over the csv rows and predict each sentence in turn.
# Collect one dict of scores per row; the dataframe is built once after the loop.
rows = []

# 'sentence' is the column of interest
for sentence in data.sentence.values:
    # Set the payload by giving the content and MIME type of the text.
    payload = {"text_snippet": {"content": sentence, "mime_type": "text/plain"}}

    # params holds additional domain-specific parameters.
    # Currently no additional parameters are supported.
    params = {}
    response = prediction_client.predict(name=model_full_id, payload=payload, params=params)

    # Look each score up by its label rather than by position,
    # because the order of response.payload may not be the same for every row.
    scores = {r.display_name: r.classification.score for r in response.payload}
    rows.append({'p_exclude': scores.get('Exclude'), 'p_include': scores.get('Include')})

# Temp dataframe holding the prediction scores, aligned with the input rows
df = pd.DataFrame(rows, index=data.index)

# Add the predicted scores to the original dataframe
df_automl = pd.concat([data, df], axis=1)
# Export the new dataframe
df_automl.to_csv("df_automl.csv", index=False)
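
The script above calls the online prediction endpoint once per row. As a rough sketch of that same per-row loop, here it is separated from the client so it can be tried without credentials. The names `scores_by_label`, `predict_rows` and `fake_predict`, and the retry settings, are my own hypothetical helpers, not part of the AutoML library. `fake_predict` only imitates the shape of the response (`payload`, `display_name`, `classification.score`) and stands in for a call to `prediction_client.predict`. The sketch assumes the labels are 'Include' and 'Exclude', as in my case. It reads each score by label name and retries a failed call a few times, which may help if some of the 500 requests fail intermittently.

```python
import time
from types import SimpleNamespace


def scores_by_label(response):
    """Map each returned label (display_name) to its classification score."""
    return {r.display_name: r.classification.score for r in response.payload}


def predict_rows(texts, predict_fn, labels=("Exclude", "Include"),
                 retries=3, pause=1.0):
    """Call predict_fn once per text and return one dict of scores per text.

    predict_fn is any callable that takes a string and returns a response
    shaped like the AutoML one. A failed call is retried up to `retries`
    times with a fixed pause (an assumption; tune it for your quota).
    """
    rows = []
    for text in texts:
        for attempt in range(retries):
            try:
                scores = scores_by_label(predict_fn(text))
                break
            except Exception:
                # Give up after the last attempt, otherwise wait and retry.
                if attempt == retries - 1:
                    raise
                time.sleep(pause)
        rows.append({"p_" + label.lower(): scores.get(label) for label in labels})
    return rows


def fake_predict(text):
    """Hypothetical stand-in for prediction_client.predict (no network)."""
    p_include = 0.9 if "keep" in text else 0.2
    results = [
        SimpleNamespace(display_name="Include",
                        classification=SimpleNamespace(score=p_include)),
        SimpleNamespace(display_name="Exclude",
                        classification=SimpleNamespace(score=1 - p_include)),
    ]
    # Put the highest score first, so the label order changes between rows.
    results.sort(key=lambda r: r.classification.score, reverse=True)
    return SimpleNamespace(payload=results)


rows = predict_rows(["keep this", "drop this"], fake_predict)
```

With the real client, `predict_fn` could be a small wrapper that builds the `text_snippet` payload and calls `prediction_client.predict` with the model path, and `pd.DataFrame(rows)` would give the score columns to join back onto the original data.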