How to export a tf.estimator.DNNClassifier estimator

Date: 2018-07-26 11:25:24

Tags: python tensorflow tensorflow-serving tensorflow-estimator

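Hi everyone, I am still a beginner with TensorFlow. Below is my code for a text-classification DNN, and everything has worked fine so far. I want to save my model and load it again so that I can use it to predict new values, but I don't know how to do that.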

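To give you a rough idea of what I am doing: I have 2 folders (train and test), and each folder contains 4 subfolders (the classification categories).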

import tensorflow as tf
import tensorflow_hub as hub
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import re
import seaborn as sns
import logging



print("Loading all files from directory ...")
# Load all files from a directory in a DataFrame.
def load_directory_data(directory):
  data = {}
  data["sentence"] = []
  data["tnemitnes"] = []
  print("getting in a loop")
  for file_path in os.listdir(directory):
    with tf.gfile.GFile(os.path.join(directory, file_path), "r") as f:
      print("directory : ",directory)
      print("file path : ",file_path)
      data["sentence"].append(f.read())
      data["tnemitnes"].append(re.match(r"(\d+)\.txt", file_path).group(1))
  return pd.DataFrame.from_dict(data)

print("merging all files in the training set ...")
# Merge all types of email examples, add a polarity column and shuffle.
# `split` is the top-level folder to read from: "train" or "test".
def load_dataset(split):
  pos_df = load_directory_data(os.path.join(split, "br"))
  neg_df = load_directory_data(os.path.join(split, "mi"))
  dos_df = load_directory_data(os.path.join(split, "Brouillons"))
  nos_df = load_directory_data(os.path.join(split, "favoris"))
  pos_df["polarity"] = 3
  neg_df["polarity"] = 2
  dos_df["polarity"] = 1
  nos_df["polarity"] = 0
  return pd.concat([pos_df, neg_df, dos_df, nos_df]).sample(frac=1).reset_index(drop=True)

print("Getting the data from files ...")
# Load and process the dataset files.
def download_and_load_datasets():
  # Read the train/ and test/ folders, each holding the four category subfolders.
  train_df = load_dataset("train")
  test_df = load_dataset("test")

  return train_df, test_df


print("Configuring all logging output ...")
# Reduce logging output. ERROR
#logging.set_verbosity(tf.logging.INFO)
logging.getLogger().setLevel(logging.INFO)



print("Setting Up the data for the training ...")
train_df, test_df = download_and_load_datasets()
train_df.head()


print("Setting Up a Training input on the whole training set with no limit on training epochs ...")
# Training input on the whole training set with no limit on training epochs.
train_input_fn = tf.estimator.inputs.pandas_input_fn(train_df, train_df["polarity"], num_epochs=None, shuffle=True)

print("Setting Up a Prediction on the whole training set ...")
# Prediction on the whole training set.
predict_train_input_fn = tf.estimator.inputs.pandas_input_fn(train_df, train_df["polarity"], shuffle=False)

print("Setting Up a Prediction on the test set ...")
# Prediction on the test set.
predict_test_input_fn = tf.estimator.inputs.pandas_input_fn(test_df, test_df["polarity"], shuffle=False)


print("Removal of punctuation and splitting on spaces from the data ...")
#The module is responsible for preprocessing of sentences (e.g. removal of punctuation and splitting on spaces).
embedded_text_feature_column = hub.text_embedding_column(key="sentence", module_spec="https://tfhub.dev/google/nnlm-en-dim128/1")

print("Setting Up The Classifier ...")
#Estimator : For classification I did use a DNN Classifier
estimator = tf.estimator.DNNClassifier(
    hidden_units=[10, 20],
    feature_columns=[embedded_text_feature_column],
    n_classes=4,
    optimizer=tf.train.AdagradOptimizer(learning_rate=0.003))



print("Starting the Training ...")
# Training for 20 steps means 2,560 training examples with the default
# batch size of 128.
estimator.train(input_fn=train_input_fn, steps=20)

print("the Training had ended...")

print("setting Up the results ...")
train_eval_result = estimator.evaluate(input_fn=predict_train_input_fn)
test_eval_result = estimator.evaluate(input_fn=predict_test_input_fn)


print("Showing the results ...")
print("Training set accuracy: {accuracy}".format(**train_eval_result))
print("Test set accuracy: {accuracy}".format(**test_eval_result))

# This is where I am having trouble  <====
tf.estimator.export(
    os.path.dirname("Model"),
    serving_input_fn,
    default_output_alternative_key=None,
    assets_extra=None,
    as_text=False,
    checkpoint_path=None,
    graph_rewrite_specs=(GraphRewriteSpec((tag_constants.SERVING,), ()),),
    strip_default_attrs=False
)

Now that I have added the estimator export call, it asks me for a serving_input_fn, and honestly I am having a hard time understanding how to create one.

If there is a simpler way, that would be even better.

3 Answers:

Answer 0: (score: 1)

You can easily get a serving_input_fn from tf.estimator.export.build_parsing_serving_input_receiver_fn.

In your case, do the following:

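# The function expects a feature spec (a dict), so build one from the columns.
serving_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(
        tf.feature_column.make_parse_example_spec([embedded_text_feature_column]))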

If you would rather pass tensors directly, the same package also provides build_raw_serving_input_receiver_fn.

Answer 1: (score: 1)

You may have read this before: Tensorflow: how to save/restore a model?

You need to define a serving_input_receiver_fn.

https://www.tensorflow.org/api_docs/python/tf/estimator/export/build_parsing_serving_input_receiver_fn

That documentation describes a useful way to build a serving_input_receiver_fn.

Here is an example:

# First, prepare feature_spec. It contains the parsing specification for your feature columns.

feature_spec = tf.feature_column.make_parse_example_spec(my_feature_columns)
print(feature_spec)
serving_input_receiver_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)

export_model = classifier.export_savedmodel('./iris/', serving_input_receiver_fn)
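
As far as I know, export_savedmodel does not write straight into the directory you pass. It creates a timestamped subdirectory under it and returns that path. Below is a minimal sketch, assuming that layout, of how you could find the newest export later using only the standard library. The helper name latest_export_dir and the timestamp values are hypothetical and not part of TensorFlow.

```python
import os
import tempfile


def latest_export_dir(export_dir_base):
    """Return the newest export under export_dir_base, or None if there is none.

    Assumes every export lives in a subdirectory whose name is an integer
    timestamp, for example ./iris/1532604324/ (illustrative value).
    """
    candidates = [
        name for name in os.listdir(export_dir_base)
        if name.isdigit() and os.path.isdir(os.path.join(export_dir_base, name))
    ]
    if not candidates:
        return None
    # Compare numerically so that the largest timestamp wins.
    newest = max(candidates, key=int)
    return os.path.join(export_dir_base, newest)


# Demonstration on a throwaway directory that mimics two finished exports.
demo_base = tempfile.mkdtemp()
for stamp in ("1532600000", "1532604324"):
    os.mkdir(os.path.join(demo_base, stamp))
os.mkdir(os.path.join(demo_base, "temp-export"))  # non-numeric name, ignored

found = latest_export_dir(demo_base)
print(os.path.basename(found))
```

The directory returned this way is the one a loader or TensorFlow Serving would be pointed at, rather than the base directory itself.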

Answer 2: (score: 0)

All I had to do was pass model_dir to the estimator:

model_dir=os.path.join(os.getcwd(), 'Model')

Here is the new code. I created a new folder and named it Model.

estimator = tf.estimator.DNNClassifier(
    hidden_units=[10, 20],
    feature_columns=[embedded_text_feature_column],
    n_classes=4,
    optimizer=tf.train.AdagradOptimizer(learning_rate=0.003),
    # os.path.join uses the path separator of the current platform.
    model_dir=os.path.join(os.getcwd(), 'Model'))
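
One thing to keep in mind: as far as I understand, model_dir holds the training checkpoints, so an estimator created later with the same model_dir resumes from the latest checkpoint. It is not the same thing as an exported SavedModel. For the path itself, here is a minimal sketch using only the standard library, where os.path.join picks the separator for the current platform (the variable name is illustrative):

```python
import os

# Build <current working directory>/Model with the platform's own separator.
model_dir = os.path.join(os.getcwd(), "Model")

# Show only the folder name, without the parent path.
print(os.path.basename(model_dir))
```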