Question

我正在尝试保存Estimator，然后将其加载以根据需要进行预测。我在其中训练模型的部分：

classifier = tf.estimator.Estimator(model_fn=bag_of_words_model)

# Train
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"words": x_train},  # x_train is 2D numpy array of shape (26, 5)
    y=y_train,                   # y_train is 1D panda series of length 26
    batch_size=1000,
    num_epochs=None,
    shuffle=True)

classifier.train(input_fn=train_input_fn, steps=300)

然后我将模型保存如下：

def serving_input_receiver_fn():
    serialized_tf_example = tf.placeholder(dtype=tf.int64, shape=(None, 5), name='words')
    receiver_tensors = {"predictor_inputs": serialized_tf_example}
    features = {"words": tf.tile(serialized_tf_example, multiples=[1, 1])}
    return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)

full_model_dir = classifier.export_savedmodel(export_dir_base="E:/models/",
                                              serving_input_receiver_fn=serving_input_receiver_fn)

我现在加载模型并提供测试集以进行预测：

from tensorflow.contrib import predictor

classifier = predictor.from_saved_model("E:\\models\\1547122667")
predictions = classifier({'predictor_inputs': x_test})
print(predictions)

这给了我以下预测：

{'class': array([ 0,  0,  0,  0,  0,  5,  0,  0,  0,  0,  0,  0,  0,  0,  0, 15,  0,
        0,  5,  0, 20,  0,  5,  0,  0,  0], dtype=int64),
'prob': array([[9.9397606e-01, 6.5355714e-05, 2.2225287e-05, ..., 1.4510043e-07,
            1.6920333e-07, 1.4865007e-07],
           [9.9886864e-01, 1.4976941e-06, 7.0847680e-05, ..., 9.4182191e-08,
            1.1828639e-07, 9.5683227e-08],
           [9.9884748e-01, 2.1105163e-06, 1.1994909e-05, ..., 8.3957858e-08,
            1.0476184e-07, 8.5592234e-08],
           ...,
           [9.6145850e-01, 6.9048328e-05, 1.1446012e-04, ..., 7.3761731e-07,
            8.8173107e-07, 7.3824998e-07],
           [9.7115618e-01, 2.9716679e-05, 5.9592247e-05, ..., 2.8933655e-07,
            3.4183532e-07, 2.9737942e-07],
           [9.7387028e-01, 6.9163914e-05, 1.5800977e-04, ..., 1.6116818e-06,
            1.9025001e-06, 1.5990496e-06]], dtype=float32)}

class和prob是我预测的两件事。现在，如果我在不保存和加载模型的情况下使用相同的测试集预测了输出，则：

# Predict.
test_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"words": x_test}, y=y_test, num_epochs=1, shuffle=False)
predictions = classifier.predict(input_fn=test_input_fn)
print(predictions)

然后我得到的输出如下：

{'class': 0, 'prob': array([9.9023646e-01, 2.6038184e-05, 3.9950578e-06, ..., 1.3950405e-08,
       1.5713249e-08, 1.3064114e-08], dtype=float32)}
{'class': 1, 'prob': array([2.0078469e-05, 9.9907070e-01, 8.9245419e-05, ..., 6.6533559e-08,
       7.1365662e-08, 6.8764685e-08], dtype=float32)}
{'class': 2, 'prob': array([3.0828053e-06, 9.6484597e-05, 9.9906868e-01, ..., 5.9190391e-08,
       6.0995028e-08, 6.2322023e-08], dtype=float32)}
{'class': 3, 'prob': array([7.4923842e-06, 1.1112734e-06, 1.1697492e-06, ..., 4.4295877e-08,
       4.4563325e-08, 4.0475427e-08], dtype=float32)}
{'class': 4, 'prob': array([4.6085161e-03, 2.8403942e-05, 2.0638861e-05, ..., 7.6083229e-09,
       8.5255349e-09, 6.7836012e-09], dtype=float32)}
{'class': 5, 'prob': array([6.2119620e-06, 7.2357750e-07, 2.6231232e-06, ..., 7.4999367e-09,
       9.0847436e-09, 7.5630142e-09], dtype=float32)}
{'class': 6, 'prob': array([4.4882968e-06, 2.2007227e-06, 8.3352124e-06, ..., 2.3130213e-09,
       2.3657243e-09, 2.0045692e-09], dtype=float32)}
{'class': 7, 'prob': array([1.88617545e-04, 9.01482690e-06, 1.47353385e-05, ...,
       3.38567552e-09, 3.97709154e-09, 3.37017392e-09], dtype=float32)}
{'class': 8, 'prob': array([1.9843496e-06, 4.5909755e-06, 4.8804057e-05, ..., 2.2636470e-08,
       2.0094852e-08, 2.0215294e-08], dtype=float32)}
{'class': 9, 'prob': array([2.5907659e-04, 4.4661370e-05, 6.9490757e-06, ..., 1.6249915e-08,
       1.7579131e-08, 1.5439820e-08], dtype=float32)}
{'class': 10, 'prob': array([3.6456138e-05, 7.5861579e-05, 3.0208937e-05, ..., 2.7859956e-08,
       2.5423596e-08, 2.8662368e-08], dtype=float32)}
{'class': 11, 'prob': array([1.1723863e-05, 9.1407037e-06, 4.8835855e-04, ..., 2.3693143e-08,
       2.0524153e-08, 2.3223269e-08], dtype=float32)}
{'class': 12, 'prob': array([1.2886175e-06, 2.6652628e-05, 2.7812246e-06, ..., 4.8295210e-08,
       4.4282604e-08, 4.7342766e-08], dtype=float32)}
{'class': 13, 'prob': array([3.3486103e-05, 1.3361238e-05, 3.6493871e-05, ..., 2.2195401e-09,
       2.4768412e-09, 2.0150714e-09], dtype=float32)}
{'class': 14, 'prob': array([4.6108948e-05, 3.0377207e-05, 2.0945006e-06, ..., 4.2276231e-08,
       5.2376720e-08, 4.4969173e-08], dtype=float32)}
{'class': 15, 'prob': array([1.7165689e-04, 2.9350400e-05, 3.2283624e-05, ..., 7.1849078e-09,
       7.6871531e-09, 6.6224697e-09], dtype=float32)}
{'class': 16, 'prob': array([5.9876328e-07, 3.0931276e-06, 1.5760432e-05, ..., 4.0450086e-08,
       4.2720632e-08, 4.6017195e-08], dtype=float32)}
{'class': 17, 'prob': array([2.6658317e-04, 9.9656281e-05, 4.0355867e-06, ..., 1.2873563e-08,
       1.4808875e-08, 1.2155732e-08], dtype=float32)}
{'class': 18, 'prob': array([1.4914459e-04, 2.1025437e-06, 1.2505146e-05, ..., 9.8899635e-09,
       1.1115599e-08, 8.9312255e-09], dtype=float32)}
{'class': 19, 'prob': array([2.5615416e-04, 2.3750392e-05, 2.2886352e-04, ..., 3.9635733e-08,
       4.5139984e-08, 3.8605780e-08], dtype=float32)}
{'class': 20, 'prob': array([6.3949975e-04, 2.3652929e-05, 7.8577641e-06, ..., 2.0959168e-09,
       2.5495863e-09, 2.0428985e-09], dtype=float32)}
{'class': 21, 'prob': array([8.2179489e-05, 8.4409467e-06, 5.4756888e-06, ..., 2.2360982e-09,
       2.4820561e-09, 2.1206517e-09], dtype=float32)}
{'class': 22, 'prob': array([3.9681905e-05, 2.4394642e-06, 8.9102805e-06, ..., 2.0282410e-08,
       2.1132811e-08, 1.8368105e-08], dtype=float32)}
{'class': 23, 'prob': array([3.0794261e-05, 6.5104805e-06, 3.3528936e-06, ..., 2.0360846e-09,
       1.9360573e-09, 1.7195430e-09], dtype=float32)}
{'class': 24, 'prob': array([3.4596618e-05, 2.2907707e-06, 2.5318438e-06, ..., 1.1038886e-08,
       1.2148775e-08, 9.9556408e-09], dtype=float32)}
{'class': 25, 'prob': array([1.4846727e-03, 1.9189476e-06, 5.3232620e-06, ..., 3.1966723e-09,
       3.5612517e-09, 3.0947123e-09], dtype=float32)}

这是正确的。请注意，两个输出之间的区别在于，第二个输出中的class以1递增1，而第一种情况中的class在大多数情况下显示为0。

为什么预测结果有所不同？我是否以错误的方式保存了模型？

编辑1：

从this question开始，我知道如果给定Estimator参数，model_dir很容易保存检查点。并在引用相同的model_dir时加载相同的图形。因此，我在保存模型时就这样做了：

classifier = tf.estimator.Estimator(model_fn=bag_of_words_model, model_dir="E:/models/")

我检查了一下，发现检查点已存储在E:/models/中。现在，我要还原模型的部分写道：

# Added model_dir args
classifier = tf.estimator.Estimator(model_fn=bag_of_words_model, model_dir="E:/models/")
# Predict.
test_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={WORDS_FEATURE: x_test}, y=y_test, num_epochs=1, shuffle=False)
predictions = classifier.predict(input_fn=test_input_fn)

日志给了我

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'E:/models/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x0000028240FAB518>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
WARNING:tensorflow:From E:\ml_classif\venv\lib\site-packages\tensorflow\python\estimator\inputs\queues\feeding_queue_runner.py:62: QueueRunner.__init__ (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
WARNING:tensorflow:From E:\ml_classif\venv\lib\site-packages\tensorflow\python\estimator\inputs\queues\feeding_functions.py:500: add_queue_runner (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
2019-01-14 19:17:51.157091: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
INFO:tensorflow:Restoring parameters from E:/models/model.ckpt-100
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
WARNING:tensorflow:From E:\ml_classif\venv\lib\site-packages\tensorflow\python\training\monitored_session.py:804: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.

表示已成功根据给定的model_dir重建了模型。然后，我尝试预测测试数据的输出，但只能得到与上一个相同的输出：

{'class': 0, 'prob': array([9.8720157e-01, 1.9098983e-04, 8.6194178e-04, ..., 9.8885458e-08,
       1.0560690e-07, 1.1116919e-07], dtype=float32)}
{'class': 0, 'prob': array([9.9646854e-01, 7.3993037e-06, 1.6678940e-03, ..., 3.3662158e-08,
       3.7401023e-08, 3.9902886e-08], dtype=float32)}
{'class': 0, 'prob': array([9.9418157e-01, 2.2869966e-05, 7.2757481e-04, ..., 7.2877960e-08,
       8.5308180e-08, 8.7949694e-08], dtype=float32)}
{'class': 0, 'prob': array([9.8990846e-01, 2.0035572e-05, 5.0557905e-04, ..., 4.2098847e-08,
       4.6305473e-08, 4.8882491e-08], dtype=float32)}
{'class': 0, 'prob': array([9.3541616e-01, 1.6300696e-03, 2.8230180e-03, ..., 3.4934112e-07,
       3.5947951e-07, 3.8610020e-07], dtype=float32)}
{'class': 5, 'prob': array([4.5955207e-04, 3.9533910e-04, 2.9366053e-04, ..., 6.4991447e-08,
       6.5079021e-08, 6.8886770e-08], dtype=float32)}
{'class': 0, 'prob': array([9.2468429e-01, 4.9159536e-04, 9.2872838e-03, ..., 1.0636869e-06,
       1.1284576e-06, 1.1437518e-06], dtype=float32)}
{'class': 0, 'prob': array([9.5501184e-01, 2.6409564e-04, 3.8474586e-03, ..., 1.4077391e-06,
       1.4964197e-06, 1.4892942e-06], dtype=float32)}
{'class': 0, 'prob': array([9.4813752e-01, 2.7400412e-04, 2.2020808e-03, ..., 2.9592795e-06,
       3.0286824e-06, 3.0610188e-06], dtype=float32)}
{'class': 0, 'prob': array([9.6341538e-01, 3.4047980e-04, 2.0810752e-03, ..., 6.5900326e-07,
       6.7539651e-07, 7.0834898e-07], dtype=float32)}
{'class': 0, 'prob': array([9.9541759e-01, 7.4490390e-06, 3.9836011e-04, ..., 5.1197322e-08,
       5.6648332e-08, 5.9212919e-08], dtype=float32)}
{'class': 0, 'prob': array([9.9666804e-01, 1.2600798e-05, 3.1346193e-04, ..., 3.9119975e-08,
       4.3912351e-08, 4.7076494e-08], dtype=float32)}
{'class': 0, 'prob': array([9.9582565e-01, 2.3773579e-05, 5.5219355e-04, ..., 8.2924736e-08,
       9.1671566e-08, 9.3954029e-08], dtype=float32)}
{'class': 0, 'prob': array([9.4328243e-01, 1.5643415e-04, 3.1944674e-03, ..., 3.9115656e-07,
       4.2140312e-07, 4.4074648e-07], dtype=float32)}
{'class': 0, 'prob': array([9.9599898e-01, 1.3793178e-05, 6.0236652e-04, ..., 1.1893864e-07,
       1.3845128e-07, 1.4301372e-07], dtype=float32)}
{'class': 15, 'prob': array([1.8115035e-03, 1.0454544e-03, 2.0831774e-03, ..., 4.5647434e-06,
       5.0818121e-06, 4.9641203e-06], dtype=float32)}
{'class': 0, 'prob': array([9.9594927e-01, 9.6870117e-06, 3.7690319e-04, ..., 1.1332005e-07,
       1.2312253e-07, 1.3208249e-07], dtype=float32)}
{'class': 0, 'prob': array([9.4268161e-01, 7.6396938e-04, 3.4147443e-03, ..., 5.8237259e-07,
       5.8584078e-07, 5.9859877e-07], dtype=float32)}
{'class': 18, 'prob': array([1.2369211e-03, 7.1954611e-03, 3.4218519e-03, ..., 1.6767866e-06,
       1.5141470e-06, 1.5795833e-06], dtype=float32)}
{'class': 0, 'prob': array([9.9327940e-01, 2.4744159e-05, 8.3286857e-04, ..., 8.1387967e-08,
       9.2638246e-08, 9.4754824e-08], dtype=float32)}
{'class': 18, 'prob': array([4.3461438e-02, 7.7443835e-03, 1.0502382e-02, ..., 6.1044288e-06,
       6.4804617e-06, 6.6003668e-06], dtype=float32)}
{'class': 0, 'prob': array([9.1440409e-01, 2.1251327e-04, 1.9904026e-03, ..., 9.9065488e-08,
       1.0103827e-07, 1.0984956e-07], dtype=float32)}
{'class': 5, 'prob': array([4.2783137e-02, 1.3115143e-02, 1.6208552e-02, ..., 3.9897031e-06,
       3.9228212e-06, 4.1420644e-06], dtype=float32)}
{'class': 0, 'prob': array([9.0668356e-01, 6.9979503e-04, 4.9138898e-03, ..., 4.2717656e-07,
       4.3982755e-07, 4.7387920e-07], dtype=float32)}
{'class': 0, 'prob': array([9.3811822e-01, 1.6991694e-04, 2.0085643e-03, ..., 3.8740203e-07,
       4.0521365e-07, 4.3191656e-07], dtype=float32)}
{'class': 0, 'prob': array([9.5434970e-01, 2.1576983e-04, 2.3911290e-03, ..., 7.2219399e-07,
       7.4783770e-07, 7.9287622e-07], dtype=float32)}

大多数类还是0。为什么会这样呢？有什么替代方法可以帮助我吗？

Answer 1

最后，我得到了答案。模型已保存并正确加载。问题是我在保存/加载而不进行保存/加载的情况下传递给预测的x_test是不同的（我知道，对此错误我感到非常抱歉）。不保存/加载模型的x_test的值比不保存/加载x_test的值+1。一位tensorflow开发者在github上向我提出了这个建议，我已经在其中打开了issue。

Answer 2

我将从Edit 1开始。根据TF文档：

检查点频率   默认情况下，Estimator根据以下时间表将检查点保存在model_dir中：

每10分钟（600秒）写一个检查点。   当train方法开始（第一次迭代）并完成（最终迭代）时，编写一个检查点。   在目录中仅保留5个最近的检查点。   您可以按照以下步骤更改默认时间表：

创建一个tf.estimator.RunConfig对象，该对象定义所需的时间表。   实例化Estimator时，请将该RunConfig对象传递给Estimator的config参数。

您让火车正确完成了吗？根据日志，情况似乎并非如此（因为您还原了model-ckpt100而不是model-ckpt300）。

如果实验用时不长，建议您删除保存的模型的内容，或者让训练按照您在调用classifier.train时定义的步骤数完成，或者文档建议创建一个tf.estimator.RunConfig，您将其作为参数传递给estimator并决定何时保存。

希望这对您有帮助！

Answer 3

另一个错误预测的可能原因是忘记像训练/验证数据一样对输入进行标准化。

例如，如果您通过减去均值并除以方差来标准化训练集，而对要预测的数据没有做同样的事情，那么预测将是无稽之谈。发生这种情况的一个很好的线索是预测将是非常高或非常低的值。

加载保存的模型后得到错误的预测

3 个答案: