Nested GradientTape inside a function (TF2.0)

Date: 2019-11-14 12:28:01

Tags: python tensorflow machine-learning deep-learning tensorflow2.0

I'm trying to implement MAML. For that, a copy of my model (model_copy) has to be trained for one step, and then my meta_model has to be trained on the loss of that model_copy.

I would like to do the training of model_copy inside a function. But if I move the code into a function, I no longer get proper gradients_meta (they are all None).

It seems as if the graphs are disconnected - how can I connect them?

Any idea what I'm doing wrong? I already watch a lot of variables, but that doesn't seem to make a difference.

Here is the code to reproduce the issue:

import tensorflow as tf
import tensorflow.keras as keras
import tensorflow.keras.backend as keras_backend


def copy_model(model):
    copied_model = keras.Sequential()
    copied_model.add(keras.layers.Dense(5, input_shape=(1,)))
    copied_model.add(keras.layers.Dense(1))
    copied_model.set_weights(model.get_weights())
    return copied_model


def compute_loss(model, x, y):
    logits = model(x)  # prediction of my model
    mse = keras_backend.mean(keras.losses.mean_squared_error(y, logits))  # compute loss between prediction and label/truth
    return mse, logits


# meta_model to learn in outer gradient tape
meta_model = keras.Sequential()
meta_model.add(keras.layers.Dense(5, input_shape=(1,)))
meta_model.add(keras.layers.Dense(1))

# optimizer for training
optimizer = keras.optimizers.Adam()


# function to calculate model_copy's params
def do_calc(x, y, meta_model):
    with tf.GradientTape() as gg:
        model_copy = copy_model(meta_model)
        gg.watch(x)
        gg.watch(meta_model.trainable_variables)
        gg.watch(model_copy.trainable_variables)
        loss, _ = compute_loss(model_copy, x, y)
        gradient = gg.gradient(loss, model_copy.trainable_variables)
        optimizer.apply_gradients(zip(gradient, model_copy.trainable_variables))
        return model_copy


# inputs for training
x = tf.constant(3.0, shape=(1, 1, 1))
y = tf.constant(3.0, shape=(1, 1, 1))

with tf.GradientTape() as g:

    g.watch(x)
    g.watch(y)

    model_copy = do_calc(x, y, meta_model)
    g.watch(model_copy.trainable_variables)
    # calculate loss of model_copy
    test_loss, _ = compute_loss(model_copy, x, y)
    # build gradients for meta_model update
    gradients_meta = g.gradient(test_loss, meta_model.trainable_variables)
    # gradients are always None !?
    optimizer.apply_gradients(zip(gradients_meta, meta_model.trainable_variables))

Thanks in advance for your help.
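For context, the likely cause: optimizer.apply_gradients writes the updated values back into the variables through assign ops, which GradientTape cannot differentiate through, so the outer tape loses the path from test_loss back to meta_model's variables. A minimal sketch, assuming standard TF2 eager semantics, showing how an assignment severs the gradient path:

import tensorflow as tf

v = tf.Variable(1.0)
w = tf.Variable(2.0)

with tf.GradientTape(persistent=True) as tape:
    w.assign(v * 3.0)  # in-place assignment: the tape cannot trace through it
    loss = w * w

# the path from loss back to v was cut by the assign op
print(tape.gradient(loss, v))  # -> None
print(tape.gradient(loss, w))  # -> tf.Tensor(6.0, shape=(), dtype=float32)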

1 Answer:

Answer 0 (score: 1)

I found a solution: I needed to somehow "connect" the meta_model and the model_copy.

Can anybody explain why this works, and how I could achieve the same thing using a "proper" optimizer?

import tensorflow as tf
import tensorflow.keras as keras
import tensorflow.keras.backend as keras_backend


def copy_model(model):
    copied_model = keras.Sequential()
    copied_model.add(keras.layers.Dense(5, input_shape=(1,)))
    copied_model.add(keras.layers.Dense(1))
    copied_model.set_weights(model.get_weights())
    return copied_model


def compute_loss(model, x, y):
    logits = model(x)  # prediction of my model
    mse = keras_backend.mean(keras.losses.mean_squared_error(y, logits))  # compute loss between prediction and label/truth
    return mse, logits


# meta_model to learn in outer gradient tape
meta_model = keras.Sequential()
meta_model.add(keras.layers.Dense(5, input_shape=(1,)))
meta_model.add(keras.layers.Dense(1))

# optimizer for training
optimizer = keras.optimizers.Adam()


# function to calculate model_copy's adapted params
def do_calc(meta_model, x, y, gg, alpha=0.01):
    model_copy = copy_model(meta_model)
    loss, _ = compute_loss(model_copy, x, y)
    gradients = gg.gradient(loss, model_copy.trainable_variables)
    k = 0
    for layer in range(len(model_copy.layers)):
        # calculate adapted parameters w/ gradient descent
        # \theta_i' = \theta - \alpha * gradients
        model_copy.layers[layer].kernel = tf.subtract(meta_model.layers[layer].kernel,
                                                      tf.multiply(alpha, gradients[k]))
        model_copy.layers[layer].bias = tf.subtract(meta_model.layers[layer].bias,
                                                    tf.multiply(alpha, gradients[k + 1]))
        k += 2
    return model_copy
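# Note: the adapted kernel/bias above are plain tensors computed from
# meta_model's variables, so the outer tape can backpropagate test_loss
# through the inner update. optimizer.apply_gradients, by contrast, mutates
# the variables via non-differentiable assign ops, which is what cut the
# graph in the question's version.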


with tf.GradientTape() as g:
    # inputs for training
    x = tf.constant(3.0, shape=(1, 1, 1))
    y = tf.constant(3.0, shape=(1, 1, 1))
    adapted_models = []

    # model_copy = meta_model
    with tf.GradientTape() as gg:
        model_copy = do_calc(meta_model, x, y, gg)

    # calculate loss of model_copy
    test_loss, _ = compute_loss(model_copy, x, y)
    # build gradients for meta_model update
    gradients_meta = g.gradient(test_loss, meta_model.trainable_variables)
    # gradients work. Why???
    optimizer.apply_gradients(zip(gradients_meta, meta_model.trainable_variables))
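As for the "why": setting layer.kernel and layer.bias to tensors derived from meta_model's variables keeps the inner update on the outer tape. This is the usual "fast weights" trick in MAML implementations, and it also suggests why a "proper" optimizer cannot be used for the inner step: optimizers always write their results back into variables via assign ops. A hedged sketch of the same pattern written functionally - the helper name forward and the plain-SGD inner step are illustrative choices, not part of the original code:

import tensorflow as tf
import tensorflow.keras as keras

alpha = 0.01  # inner-loop learning rate

meta_model = keras.Sequential([
    keras.layers.Dense(5, input_shape=(1,)),
    keras.layers.Dense(1),
])
optimizer = keras.optimizers.Adam()


def forward(weights, x):
    # functional forward pass; weights = [kernel0, bias0, kernel1, bias1]
    h = tf.matmul(x, weights[0]) + weights[1]
    return tf.matmul(h, weights[2]) + weights[3]


x = tf.constant([[3.0]])
y = tf.constant([[3.0]])

with tf.GradientTape() as outer_tape:
    with tf.GradientTape() as inner_tape:
        inner_loss = tf.reduce_mean(
            tf.square(forward(meta_model.trainable_variables, x) - y))
    grads = inner_tape.gradient(inner_loss, meta_model.trainable_variables)
    # the fast weights are tensors, not variables, so the outer tape can
    # differentiate test_loss with respect to meta_model's variables
    fast_weights = [w - alpha * g
                    for w, g in zip(meta_model.trainable_variables, grads)]
    test_loss = tf.reduce_mean(tf.square(forward(fast_weights, x) - y))

# only the outer (meta) step uses a real optimizer, on real variables
gradients_meta = outer_tape.gradient(test_loss, meta_model.trainable_variables)
optimizer.apply_gradients(zip(gradients_meta, meta_model.trainable_variables))

In this form the only variable mutation is the final apply_gradients, outside both tapes, so nothing on the tapes breaks differentiability.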