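I have a trained Google AutoML text classification model that I want to run on 500 rows of data stored in a CSV file. The CSV file is stored in a Google Cloud Storage bucket, and each row should be labelled "True" or "False" depending on what the model returns. Currently, the code appears to support only single-row (single text) prediction. How can I use the trained model for batch classification?
Answer 0 (score: 0)
See below for the solution that worked for me. It loops over the CSV rows and calls the single-text prediction endpoint once per row.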
import pandas as pd
import numpy as np
from google.cloud import automl_v1beta1 as automl
from google.oauth2 import service_account
# Load the csv
# For my case, I am predicting either 'Include' or 'Exclude' classes
data = pd.read_csv('../df_pred.csv', encoding='utf-8')
# assign project id and model id
project_id = 'xxxxxx'
compute_region = 'us-central1'
model_id = 'xxxxx'
# Create client for prediction service.
credentials = service_account.Credentials.from_service_account_file("xxxxx.json")
automl_client = automl.AutoMlClient(credentials=credentials)
prediction_client = automl.PredictionServiceClient(credentials=credentials)
# Get the full path of the model.
model_full_id = automl_client.model_path(
    project_id, compute_region, model_id
)
# Loop over the csv lines for the sentences you want to predict
# Collect one row of prediction scores per sentence
rows = []
# sentence = column of interest
for sentence in data.sentence.values:
    # Set the payload by giving the content and type of the file.
    payload = {"text_snippet": {"content": sentence, "mime_type": "text/plain"}}
    # params holds additional domain-specific parameters;
    # currently no additional parameters are supported.
    params = {}
    response = prediction_client.predict(model_full_id, payload, params)
    # Look scores up by label name rather than by position, since the
    # order of response.payload is not guaranteed to be fixed.
    scores = {r.display_name: r.classification.score for r in response.payload}
    rows.append({'p_exclude': scores.get('Exclude'),
                 'p_include': scores.get('Include')})
# Temp dataframe to store the prediction scores
df = pd.DataFrame(rows)
# Add the predicted scores to the original Dataframe
df_automl = pd.concat([data, df], axis=1)
# Export the new Dataframe
df_automl.to_csv("df_automl.csv", index=False)
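Since the question asks for a "True or False" label per row rather than raw scores, here is a minimal sketch of how the per-row scores could be turned into a boolean. It is not part of the AutoML client: `classify_rows`, `predict_fn`, `positive_label`, and `threshold` are hypothetical names I made up for illustration, and `fake_predict` is a stand-in for the real prediction call so the snippet runs offline. In practice `predict_fn` would wrap `prediction_client.predict` and return a label-to-score mapping.

```python
from typing import Callable, Dict, Iterable, List


def classify_rows(
    sentences: Iterable[str],
    predict_fn: Callable[[str], Dict[str, float]],
    positive_label: str = "Include",  # assumption: the class treated as "True"
    threshold: float = 0.5,           # assumption: cut-off for a "True" label
) -> List[dict]:
    """Return one dict per sentence with the positive-class score and a True/False flag."""
    results = []
    for sentence in sentences:
        # predict_fn is expected to return e.g. {"Include": 0.91, "Exclude": 0.09}
        scores = predict_fn(sentence)
        score = scores.get(positive_label, 0.0)
        results.append(
            {"sentence": sentence, "score": score, "prediction": score >= threshold}
        )
    return results


# Stand-in for the real AutoML call, used only so the sketch runs without credentials.
def fake_predict(sentence: str) -> Dict[str, float]:
    p = 0.9 if "good" in sentence else 0.2
    return {"Include": p, "Exclude": 1.0 - p}


results = classify_rows(["a good row", "another row"], fake_predict)
print([r["prediction"] for r in results])  # prints [True, False]
```

Passing the prediction call in as a function keeps the thresholding logic separate from the network call, which also makes it easy to check on a few rows before running all 500.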