Mixture of Experts - train only the best model at each iteration

Date: 2017-02-08 09:44:28

Tags: python tensorflow

I am trying to implement a rough approach to the Mixture-of-Experts paper in TensorFlow: https://arxiv.org/abs/1701.06538

n models will be defined:

    model_1:
        var_11
        var_12
        loss_1
        optimizer_1

    model_2:
        var_21
        var_22
        loss_2
        optimizer_2

    model_3:
        var_31
        var_32
        loss_3
        optimizer_3

At each iteration I would like to train only the model with the smallest loss, keeping the variables of the other models fixed. Is it possible to place a switch so that only one of the optimizers is executed?

P.S.: The background of this question is similar to a question I asked earlier: http://stackoverflow.com/questions/42073239/tf-get-collection-to-extract-variables-of-one-scope/42074009?noredirect=1#comment71359330_42074009

Since the suggestion there did not work, I am trying to approach the problem differently.

Thanks in advance!

1 Answer:

Answer 0 (score: 4):

This seems to work with tf.cond:

import tensorflow as tf

def make_conditional_train_op(
    should_update, optimizers, variable_lists, losses):
  """Conditionally trains variables.

  Each argument is a Python list of Tensors, and each list must have the same
  length. Variables are updated based on their optimizer only if the
  corresponding `should_update` boolean Tensor is True at a given step.

  Returns a single train op which performs the conditional updates.
  """
  assert len(optimizers) == len(variable_lists)
  assert len(variable_lists) == len(losses)
  assert len(should_update) == len(variable_lists)
  conditional_updates = []
  for model_number, (update_boolean, optimizer, variables, loss) in enumerate(
      zip(should_update, optimizers, variable_lists, losses)):
    conditional_updates.append(
        tf.cond(update_boolean,
                lambda: tf.group(
                    optimizer.minimize(loss, var_list=variables),
                    tf.Print(0, ["Model {} updating".format(model_number), loss])),
                lambda: tf.no_op()))
  return tf.group(*conditional_updates)

The basic strategy is to make sure the optimizer's variable updates are defined inside one of the lambda branches of the cond. In that case there is true conditional execution: the assignment to the variables (and to the optimizer's accumulators) only happens when that branch of the cond is triggered.
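As a minimal, self-contained sketch of that behaviour (hypothetical variable and op names, TF 1.x graph mode, not part of the original answer), an assignment created inside a cond branch only executes on steps where the predicate is true:

import tensorflow as tf

v = tf.get_variable("v", shape=[], initializer=tf.zeros_initializer())
pred = tf.placeholder(tf.bool, shape=[])

# The assign op lives inside the True branch, so it only executes
# on runs where `pred` is fed as True.
gated_update = tf.cond(pred,
                       lambda: tf.group(tf.assign_add(v, 1.0)),
                       lambda: tf.no_op())

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  sess.run(gated_update, feed_dict={pred: False})
  print(sess.run(v))  # 0.0 -- the assignment did not run
  sess.run(gated_update, feed_dict={pred: True})
  print(sess.run(v))  # 1.0 -- the assignment ran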

As an example, we can construct a few models:

def make_model_and_optimizer():
  scalar_variable = tf.get_variable("scalar", shape=[])
  vector_variable = tf.get_variable("vector", shape=[3])
  loss = tf.reduce_sum(scalar_variable * vector_variable)
  optimizer = tf.train.AdamOptimizer(0.1)
  return optimizer, [scalar_variable, vector_variable], loss

# Construct each model
optimizers = []
variable_lists = []
losses = []
for i in range(10):
  with tf.variable_scope("model_{}".format(i)):
    optimizer, variables, loss = make_model_and_optimizer()
  optimizers.append(optimizer)
  variable_lists.append(variables)
  losses.append(loss)

Then we determine the conditional update strategy, in this case training only the model with the maximum loss (just because that leads to more switching; if only a single model ever updates, the output is rather boring). A minimum-loss variant, matching the question, is sketched right after this snippet:

# Determine which model should be updated (in this case, the one with the
# maximum loss)
integer_one_hot = tf.one_hot(
    tf.argmax(tf.stack(losses),
              axis=0),
    depth=len(losses))
is_max = tf.equal(
    integer_one_hot,
    tf.ones_like(integer_one_hot))
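Since the question actually asks for the model with the minimum loss, a minimal variant of this selection (assuming the same losses list defined above; not part of the original answer) simply swaps tf.argmax for tf.argmin; the resulting is_min can then be passed to make_conditional_train_op exactly as is_max is passed below:

# Hypothetical variant: pick the model with the *minimum* loss instead.
integer_one_hot = tf.one_hot(
    tf.argmin(tf.stack(losses), axis=0),
    depth=len(losses))
is_min = tf.equal(
    integer_one_hot,
    tf.ones_like(integer_one_hot))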

Finally, we can call the make_conditional_train_op function to create the train op, then run a few training iterations:

train_op = make_conditional_train_op(
    tf.unstack(is_max), optimizers, variable_lists, losses)

# Repeatedly call the conditional train op
with tf.Session():
  tf.global_variables_initializer().run()
  for i in range(20):
    print("Iteration {}".format(i))
    train_op.run()

This prints the index of the model being updated and its loss at each iteration, confirming the conditional execution:

Iteration 0
I tensorflow/core/kernels/logging_ops.cc:79] [Model 6 updating][2.7271919]
Iteration 1
I tensorflow/core/kernels/logging_ops.cc:79] [Model 6 updating][2.1755948]
Iteration 2
I tensorflow/core/kernels/logging_ops.cc:79] [Model 2 updating][1.9858969]
Iteration 3
I tensorflow/core/kernels/logging_ops.cc:79] [Model 6 updating][1.6859927]