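I'm trying to get to grips with TensorFlow's Estimator framework. I finally have code that trains a model, and I'm testing it with a simple MNIST autoencoder. I have two questions. First, why does the number of steps reported during training differ from the number I pass to the estimator's train() method? Second, how do I use training hooks to do things like periodic evaluation or printing the loss every X steps? The documentation seems to say to use training hooks, but I can't find any actual examples of how to use them.
Here is my code: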
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import time
import shutil
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from IPython import display
from tensorflow.examples.tutorials.mnist import input_data
data = input_data.read_data_sets('.')
display.clear_output()
def _model_fn(features, labels, mode=None, params=None):
    # define inputs
    image = tf.feature_column.numeric_column('images', shape=(784, ))
    inputs = tf.feature_column.input_layer(features, [image, ])
    # encoder
    e1 = tf.layers.dense(inputs, 512, activation=tf.nn.relu)
    e2 = tf.layers.dense(e1, 256, activation=tf.nn.relu)
    # decoder
    d1 = tf.layers.dense(e2, 512, activation=tf.nn.relu)
    model = tf.layers.dense(d1, 784, activation=tf.nn.relu)
    # training ops
    loss = tf.losses.mean_squared_error(labels, model)
    train = tf.train.AdamOptimizer().minimize(loss, global_step=tf.train.get_global_step())
    if mode == tf.estimator.ModeKeys.TRAIN:
        return tf.estimator.EstimatorSpec(mode=mode,
                                          loss=loss,
                                          train_op=train)
_train_input_fn = tf.estimator.inputs.numpy_input_fn({'images': data.train.images},
                                                     y=np.array(data.train.images),
                                                     batch_size=100,
                                                     shuffle=True)
shutil.rmtree("logs", ignore_errors=True)
tf.logging.set_verbosity(tf.logging.INFO)
estimator = tf.estimator.Estimator(_model_fn,
                                   model_dir="logs",
                                   config=tf.contrib.learn.RunConfig(save_checkpoints_steps=1000),
                                   params={})
estimator.train(_train_input_fn, steps=1000)
This is the output I get (note that training stops at step 550, even though the code explicitly asks for 1000):
INFO:tensorflow:Using config: {'_task_type': None, '_task_id': 0, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x12b9fa630>, '_master': '', '_num_ps_replicas': 0, '_num_worker_replicas': 0, '_environment': 'local', '_is_chief': True, '_evaluation_master': '', '_tf_config': gpu_options {
per_process_gpu_memory_fraction: 1
}
, '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_secs': None, '_session_config': None, '_save_checkpoints_steps': 1000, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_model_dir': 'logs'}
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into logs/model.ckpt.
INFO:tensorflow:loss = 0.102862, step = 1
INFO:tensorflow:global_step/sec: 41.8119
INFO:tensorflow:loss = 0.0191228, step = 101 (2.393 sec)
INFO:tensorflow:global_step/sec: 39.9923
INFO:tensorflow:loss = 0.0141014, step = 201 (2.500 sec)
INFO:tensorflow:global_step/sec: 40.9806
INFO:tensorflow:loss = 0.0116138, step = 301 (2.440 sec)
INFO:tensorflow:global_step/sec: 40.0043
INFO:tensorflow:loss = 0.00998991, step = 401 (2.500 sec)
INFO:tensorflow:global_step/sec: 39.2571
INFO:tensorflow:loss = 0.0124132, step = 501 (2.548 sec)
INFO:tensorflow:Saving checkpoints for 550 into logs/model.ckpt.
INFO:tensorflow:Loss for final step: 0.00940801.
<tensorflow.python.estimator.estimator.Estimator at 0x12b9fa780>
Update #1: I found the answer to the first question. Training stops at step 550 because numpy_input_fn() defaults to num_epochs=1. I'm still looking for help with training hooks.
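Answer 0 (score: 0)
It looks like you ran through all of your data in 550 steps. The num_epochs argument of numpy_input_fn defaults to 1, so the data is iterated over only once: https://www.tensorflow.org/api_docs/python/tf/estimator/inputs/numpy_input_fn
So you should set num_epochs=None; the input then repeats indefinitely and training runs for the number of steps you asked for.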
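The arithmetic behind the 550 can be sketched without TensorFlow. This is only an illustration, under the assumption that the training split holds 55,000 images (which is what 550 steps at a batch size of 100 implies) and that a final partial batch still counts as a step. The helper name steps_actually_run is hypothetical and not part of any TensorFlow API.

```python
import math


def steps_actually_run(num_examples, batch_size, requested_steps, num_epochs=1):
    """Estimate how many training steps run before one of the two limits is hit.

    Hypothetical helper for illustration only; it is not a TensorFlow function.
    """
    if num_epochs is None:
        # The input repeats indefinitely, so only the `steps` argument stops training.
        return requested_steps
    # Otherwise the input runs dry after num_epochs passes over the data.
    steps_available = math.ceil(num_examples * num_epochs / batch_size)
    return min(requested_steps, steps_available)


# Assumed values: 55,000 training images, batch_size=100, steps=1000.
print(steps_actually_run(55000, 100, 1000))                   # 550
print(steps_actually_run(55000, 100, 1000, num_epochs=None))  # 1000
```

With num_epochs=2 the same sketch gives 1000, because two passes over the data supply 1100 batches and the step limit is reached first.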
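Answer 1 (score: 0)
An Estimator can run in three modes: training, evaluation, and prediction (tf.estimator.ModeKeys.TRAIN, EVAL, and PREDICT).
Your current code is only set up to run in training mode. If you want to include an evaluation step, you first have to make some changes to the model function: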
def _model_fn(features, labels, mode=None, params=None):
    # define inputs
    image = tf.feature_column.numeric_column('images', shape=(784, ))
    inputs = tf.feature_column.input_layer(features, [image, ])
    # encoder
    e1 = tf.layers.dense(inputs, 512, activation=tf.nn.relu)
    e2 = tf.layers.dense(e1, 256, activation=tf.nn.relu)
    # decoder
    d1 = tf.layers.dense(e2, 512, activation=tf.nn.relu)
    model = tf.layers.dense(d1, 784, activation=tf.nn.relu)
    # training ops
    loss = tf.losses.mean_squared_error(labels, model)
    train = tf.train.AdamOptimizer().minimize(loss, global_step=tf.train.get_global_step())
    if mode == tf.estimator.ModeKeys.TRAIN:
        return tf.estimator.EstimatorSpec(mode=mode,
                                          loss=loss,
                                          train_op=train)
    # evaluation metrics: each tf.metrics function returns (value, update_op)
    # note: precision and recall cast labels and predictions to bool
    prec, prec_update_op = tf.metrics.precision(labels=labels, predictions=model, name='precision_op')
    recall, recall_update_op = tf.metrics.recall(labels=labels, predictions=model, name='recall_op')
    metrics = {'recall': (recall, recall_update_op),
               'precision': (prec, prec_update_op)}
    if mode == tf.estimator.ModeKeys.EVAL:
        return tf.estimator.EstimatorSpec(mode, loss=loss, eval_metric_ops=metrics)
Now, to run evaluation alongside training and log the loss every 10 steps:
configuration = tf.estimator.RunConfig(
    model_dir='logs',
    keep_checkpoint_max=5,
    save_checkpoints_steps=1500,
    log_step_count_steps=10)  # set the frequency of logging steps for loss function
estimator = tf.estimator.Estimator(model_fn=_model_fn, params={}, config=configuration)
# TrainSpec takes max_steps (it has no steps argument)
train_spec = tf.estimator.TrainSpec(input_fn=_train_input_fn, max_steps=5000)
eval_spec = tf.estimator.EvalSpec(input_fn=_train_input_fn, steps=100, throttle_secs=600)
# pass the estimator created above
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
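Note:
The code above trains and evaluates on the same dataset. To evaluate on a different dataset, pass a suitable input function for that dataset as the input_fn argument of tf.estimator.EvalSpec.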
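On the training-hook part of the question: in TensorFlow 1.x, hooks are objects derived from tf.train.SessionRunHook whose callbacks (such as after_run) are invoked around every training step, and they can be passed through the hooks argument of Estimator.train(), TrainSpec, or EvalSpec. The framework-free sketch below only mimics that callback pattern to show how an "every N steps" hook behaves. The class EveryNStepsLossLogger, the function run_training, and the synthetic loss are all hypothetical stand-ins, not TensorFlow APIs.

```python
class EveryNStepsLossLogger:
    """Hypothetical stand-in for a training hook; not a TensorFlow class."""

    def __init__(self, every_n_steps):
        self.every_n_steps = every_n_steps
        self.logged = []  # (step, loss) pairs that were reported

    def after_run(self, step, loss):
        # Called once per training step, after the step has executed.
        if step % self.every_n_steps == 0:
            self.logged.append((step, loss))
            print("step = %d, loss = %.4f" % (step, loss))


def run_training(num_steps, hooks):
    """Toy loop standing in for Estimator.train(); the loss is synthetic."""
    for step in range(1, num_steps + 1):
        loss = 1.0 / step  # placeholder for the value a real train_op would produce
        for hook in hooks:
            hook.after_run(step, loss)


logger = EveryNStepsLossLogger(every_n_steps=10)
run_training(num_steps=50, hooks=[logger])
```

In real TensorFlow 1.x code the built-in tf.train.LoggingTensorHook plays a similar role for printing tensors at a fixed step interval, so writing a custom hook is usually only needed for behavior the built-in hooks do not cover.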