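Unable to do incremental training with DNNRegressor

Time: 2019-09-26 02:53:49

Tags: tensorflow2.0

I am trying to write a learning example based on Google's course, which uses DNNRegressor to set up a neural network (intro_to_neural_nets). But when I run the script I get this error: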

...
File "/~/.tf-env/lib/python3.7/site-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py", line 662, in iterations
    raise RuntimeError("Cannot set `iterations` to a new Variable after "
RuntimeError: Cannot set `iterations` to a new Variable after the Optimizer weights have been created

Following the example, my code splits the training steps into several periods. Here is the code:

def training(learning_rate, steps, batch_size, hidden_units, samples, targets, test_samples, test_targets, periods = 10):
  steps_per_period = steps / periods

  #create DNNRegressor Object
  my_optimizer = tf.optimizers.SGD(learning_rate=learning_rate, momentum=0.9, clipnorm=5.0)
  dnn_regressor = tf.estimator.DNNRegressor(
    feature_columns = construct_feature_columns(samples),
    hidden_units = hidden_units,
    optimizer = my_optimizer
  )

  # Create input functions.
  training_input_fn = lambda: input_fn(samples, 
                                          targets, 
                                          batch_size=batch_size)
  predict_training_input_fn = lambda: input_fn(samples, 
                                                  targets, 
                                                  num_epochs=1, 
                                                  shuffle=False)
  predict_validation_input_fn = lambda: input_fn(test_samples, 
                                                    test_targets, 
                                                    num_epochs=1, 
                                                    shuffle=False)
  # Train the model, but do so inside a loop so that we can periodically assess
  # loss metrics.
  print("Training model...")
  print("RMSE (on training data):")
  training_rmse = []
  validation_rmse = []
  for period in range (0, periods):
    # Train the model, starting from the prior state.
    print("Period[%s]" % (period+1))
    dnn_regressor.train(
        input_fn=training_input_fn,
        steps=steps_per_period
    )
...

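The first period runs successfully, but the second iteration fails with a long error and aborts.

To check whether some other step was causing the problem, I added a second training call immediately after the first one. It shows that the problem is right here, in calling train again: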

#changed code
    print("Period[%s]" % (period+1))
    dnn_regressor.train(
        input_fn=training_input_fn,
        steps=steps_per_period
    )
    print("--- again")
    dnn_regressor.train(
        input_fn=training_input_fn
    )

This produces the following output:

Training model...
RMSE (on training data):
Period[1]
WARNING:tensorflow:From /~/.tf-env/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From /~/.tf-env/lib/python3.7/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
WARNING:tensorflow:From /~/.tf-env/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/head/base_head.py:550: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
WARNING:tensorflow:From /~/.tf-env/lib/python3.7/site-packages/tensorflow_core/python/ops/clip_ops.py:172: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /~/.tf-env/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/model_fn.py:337: scalar (from tensorflow.python.framework.tensor_shape) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.TensorShape([]).
2019-09-26 10:27:41.728179: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-09-26 10:27:41.742511: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fe4f6546af0 executing computations on platform Host. Devices:
2019-09-26 10:27:41.742564: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
--- again
Traceback (most recent call last):
  File "/~/Documents/workspace/tensorflow/intro_to_neural_nets.py", line 174, in <module>
    test_targets=test_Y)
  File "/~/Documents/workspace/tensorflow/intro_to_neural_nets.py", line 123, in training
    input_fn=training_input_fn
  File "/~/.tf-env/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/~/.tf-env/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/~/.tf-env/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1188, in _train_model_default
    features, labels, ModeKeys.TRAIN, self.config)
  File "/~/.tf-env/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1146, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/~/.tf-env/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/canned/dnn.py", line 1166, in _model_fn
    batch_norm=batch_norm)
  File "/~/.tf-env/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/canned/dnn.py", line 580, in dnn_model_fn_v2
    optimizer.iterations = training_util.get_or_create_global_step()
  File "/~/.tf-env/lib/python3.7/site-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py", line 561, in __setattr__
    super(OptimizerV2, self).__setattr__(name, value)
  File "/~/.tf-env/lib/python3.7/site-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py", line 662, in iterations
    raise RuntimeError("Cannot set `iterations` to a new Variable after "
RuntimeError: Cannot set `iterations` to a new Variable after the Optimizer weights have been created

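I don't know why this error occurs or how to fix it. Thanks for any help. By the way, if anyone can tell me how to avoid or eliminate these warnings, I would appreciate that too.

2 answers:

Answer 0: (score: 2)

I don't think you are doing anything wrong here. I edited the example in the TensorFlow documentation that uses a canned estimator, and I could not train with multiple estimator.train(...) calls using any tf.keras.optimizers optimizer. Your code will probably run if you don't specify an optimizer, but I'm not sure which learning rate or optimizer is used in that case...

I just opened this as an issue on the TF GitHub. See here for updates: https://github.com/tensorflow/tensorflow/issues/33358

If you want to get going right away, you can downgrade your code to TF 1.x so that it roughly matches the version used in the Google Machine Learning Crash Course.

If you are more ambitious, the TF team recommends starting to learn TensorFlow with Keras. From the documentation page on premade estimators: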

  

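Note that in TensorFlow 2.0, the Keras API can accomplish many of these same tasks, and is believed to be an easier API to learn. If you are starting fresh, we would recommend you start with Keras. For more information about the available high level APIs in TensorFlow 2.0, see Standardizing on Keras.

Edit: A possibly low-effort way to monitor training is to use TensorBoard. The changes to your code would be:

  1. Remove the loop.

  2. Add the model_dir argument so you know where to find the logs.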

dnn_regressor = tf.estimator.DNNRegressor(
    feature_columns = construct_feature_columns(samples),
    hidden_units = hidden_units,
    optimizer = my_optimizer,
    model_dir = '/tmp/log_dir'
  )
  3. Open TensorBoard with the following commands (the reload_multifile option may not be needed):
%load_ext tensorboard
%tensorboard --logdir '/tmp/log_dir' --reload_multifile=true

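TensorBoard updates every 30 seconds by default, but you can make it update faster if you want to monitor training more closely. This tool is also very cool if you want to explore what your model looks like in more detail!

Edit 2: A simple workaround was suggested to me on GitHub. It works by passing a callable to the Estimator instead of an optimizer instance. With a callable, a new optimizer instance is created each time Estimator.train() is called, which avoids the problem of trying to set iterations on an existing Optimizer.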

from functools import partial

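# Pass a callable rather than an instance, so each train() call builds a fresh optimizer.
my_optimizer = partial(tf.optimizers.SGD, learning_rate=learning_rate, momentum=0.9, clipnorm=5.0)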
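
To illustrate why the callable helps, here is a minimal sketch in plain Python. ToyOptimizer, ToyEstimator and trains_twice are hypothetical stand-ins written only for this illustration. They are not TensorFlow classes, and they merely imitate the behavior the traceback suggests: the estimator assigns a new step counter to optimizer.iterations on every train() call, and the optimizer refuses that assignment once its weights exist.

```python
from functools import partial


class ToyOptimizer:
    """Hypothetical stand-in for a Keras-style optimizer (not TensorFlow code)."""

    def __init__(self, learning_rate):
        self.learning_rate = learning_rate
        self._iterations = None
        self._weights_created = False

    @property
    def iterations(self):
        return self._iterations

    @iterations.setter
    def iterations(self, value):
        # Imitates the guard seen in the traceback: once weights exist,
        # the step counter can no longer be replaced.
        if self._weights_created:
            raise RuntimeError(
                "Cannot set `iterations` to a new Variable after "
                "the Optimizer weights have been created")
        self._iterations = value

    def apply_gradients(self):
        # The first update creates the optimizer's weights.
        self._weights_created = True


class ToyEstimator:
    """Hypothetical stand-in that sets up training anew on every train() call."""

    def __init__(self, optimizer):
        self._optimizer = optimizer

    def train(self):
        # Accept either an optimizer instance or a zero-argument factory.
        opt = self._optimizer() if callable(self._optimizer) else self._optimizer
        opt.iterations = object()  # a fresh "global step" for this call
        opt.apply_gradients()
        return opt


def trains_twice(optimizer):
    """Return True if two consecutive train() calls succeed."""
    estimator = ToyEstimator(optimizer)
    try:
        estimator.train()
        estimator.train()
    except RuntimeError:
        return False
    return True


# A shared instance fails on the second call; a factory does not.
print(trains_twice(ToyOptimizer(learning_rate=0.01)))
print(trains_twice(partial(ToyOptimizer, learning_rate=0.01)))
```

In this toy model the shared instance fails on the second train() call, while the partial(...) factory succeeds because each call gets its own optimizer. One consequence worth keeping in mind, assuming the real Estimator behaves similarly: any state held inside the optimizer object itself starts fresh on each call.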

Answer 1: (score: 0)

I tried:

from functools import partial

my_optimizer = partial(SGD, learning_rate=leraning_rate, momentum=0.9, clipnorm=5.0)

It worked for me this way:

from functools import partial

my_optimizer = partial(optimizers.SGD, learning_rate=learning_rate, momentum=0.9, clipnorm=5.0)