使用Keras / Tensorflow或autograd计算验证误差w.r.t输入的梯度

时间:2020-10-06 16:27:35

标签: python-3.x keras tensorflow2.0 autograd

我需要计算验证误差的斜率w.r.t输入x。我试图查看扰动训练样本之一时验证错误的变化。

  • 验证错误(E)明确取决于模型权重(W)。
  • 模型权重明确取决于输入(x和y)。
  • 因此,验证错误隐含地取决于输入。

我正在尝试直接计算 E w.r.t x 的梯度。 另一种方法是计算 E wrt W 的梯度(可以轻松计算)和 W wrt x的梯度(目前无法执行),这样可以计算出 E wrt x 的梯度。

我附上了一个玩具的例子。预先感谢!

import numpy as np
import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical
import tensorflow as tf
from autograd import grad

train_images = mnist.train_images()
train_labels = mnist.train_labels()
test_images = mnist.test_images()
test_labels = mnist.test_labels()

# Normalize the images.
train_images = (train_images / 255) - 0.5
test_images = (test_images / 255) - 0.5

# Flatten the images.
train_images = train_images.reshape((-1, 784))
test_images = test_images.reshape((-1, 784))

# Build the model.
model = Sequential([
    Dense(64, activation='relu', input_shape=(784,)),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax'),
])

# Compile the model.
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)

# Train the model.
model.fit(
    train_images,
    to_categorical(train_labels),
    epochs=5,
    batch_size=32,
)
model.save_weights('model.h5')
# Load the model's saved weights.
# model.load_weights('model.h5')

calculate_mse = tf.keras.losses.MeanSquaredError()

test_x = test_images[:5]
test_y = to_categorical(test_labels)[:5]

train_x = train_images[:1]
train_y = to_categorical(train_labels)[:1]

train_y = tf.convert_to_tensor(train_y, np.float32)
train_x = tf.convert_to_tensor(train_x, np.float64)

with tf.GradientTape() as tape:
    tape.watch(train_x)
    model.fit(train_x, train_y, epochs=1, verbose=0)
    valid_y_hat = model(test_x, training=False)
    mse = calculate_mse(test_y, valid_y_hat)
de_dx = tape.gradient(mse, train_x)
print(de_dx)


# approach 2 - does not run
def calculate_validation_mse(x):
    model.fit(x, train_y, epochs=1, verbose=0)
    valid_y_hat = model(test_x, training=False)
    mse = calculate_mse(test_y, valid_y_hat)
    return mse


train_x = train_images[:1]
train_y = to_categorical(train_labels)[:1]

validation_gradient = grad(calculate_validation_mse)
de_dx = validation_gradient(train_x)
print(de_dx)


1 个答案:

答案 0 :(得分:0)

这是您可以执行的操作。推导如下。

enter image description here

没什么要注意的,

  • 由于colab的内存不足(代码中标记的行),我已将功能大小从784减小到256。可能必须进行一些内存分析,以找出原因
  • 仅计算第一层的grad。轻松扩展到其他层

免责声明:据我所知,此推导是正确的。请进行一些研究并验证是否确实如此。对于较大的输入和更大的层,您将遇到内存问题。

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical
import tensorflow as tf

f = 256

model = Sequential([
    Dense(64, activation='relu', input_shape=(f,)),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax'),
])

# Compile the model.
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)
w = model.weights[0]

# Inputs and labels
x_tr = tf.Variable(np.random.normal(size=(1,f)), shape=(1, f), dtype='float32')
y_tr = np.random.choice([0,1,2,3,4,5,6,7,8,9], size=(1,1))
y_tr_onehot = tf.keras.utils.to_categorical(y_tr, num_classes=10).astype('float32')
x_v = tf.Variable(np.random.normal(size=(1,f)), shape=(1, f), dtype='float32')
y_v = np.random.choice([0,1,2,3,4,5,6,7,8,9], size=(1,1))
y_v_onehot = tf.keras.utils.to_categorical(y_v, num_classes=10).astype('float32')

# In the context of GradientTape

with tf.GradientTape() as tape1:

  with tf.GradientTape() as tape2:
    y_tr_pred = model(x_tr)   
    tr_loss = tf.keras.losses.MeanSquaredError()(y_tr_onehot, y_tr_pred)

  tmp_g = tape2.gradient(tr_loss, w)
  print(tmp_g.shape)

# d(dE_tr/d(theta))/dx
# Warning this step consumes lot of memory for large layers
lr = 0.001
grads_1 = -lr * tape1.jacobian(tmp_g, x_tr)

with tf.GradientTape() as tape3:
  y_v_pred = model(x_v)   
  v_loss = tf.keras.losses.MeanSquaredError()(y_v_onehot, y_v_pred)

# dE_val/d(theta)
grads_2 = tape3.gradient(v_loss, w)[tf.newaxis, :]

# Just crunching the dimension to get the final desired shape of (1,256)
grad = tf.matmul(tf.reshape(grads_2,[1, -1]), tf.reshape(tf.transpose(grads_1,[2,1,0,3]),[1, -1, 256]))