In tensorflow-2.0 I am trying to create a keras.layers.Layer that outputs the Kullback-Leibler (KL) divergence between two tensorflow_probability.distributions. I would like to compute the gradient of the output (i.e. the KL divergence) with respect to the mean of one of the tensorflow_probability.distributions.
Unfortunately, in all of my attempts so far the resulting gradient has been 0.
I implemented the minimal example shown below. I wonder whether the problem might be related to the eager execution mode of tf 2, since a similar approach that I know works in tf 1 has eager execution disabled by default.
Here is the minimal example I tried:
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Layer, Input

# 1 Define Layer
class test_layer(Layer):

    def __init__(self, **kwargs):
        super(test_layer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.mean_W = self.add_weight('mean_W', trainable=True)
        self.kernel_dist = tfp.distributions.MultivariateNormalDiag(
            loc=self.mean_W,
            scale_diag=(1.,)
        )
        super(test_layer, self).build(input_shape)

    def call(self, x):
        return tfp.distributions.kl_divergence(
            self.kernel_dist,
            tfp.distributions.MultivariateNormalDiag(
                loc=self.mean_W*0.,
                scale_diag=(1.,)
            )
        )

# 2 Create model
x = Input(shape=(3,))
fx = test_layer()(x)
test_model = Model(name='test_random', inputs=[x], outputs=[fx])

# 3 Calculate gradient
print('\n\n\nCalculating gradients: ')

# example data, only used as a dummy
x_data = np.random.rand(99, 3).astype(np.float32)

for x_now in np.split(x_data, 3):
    # print(x_now.shape)
    with tf.GradientTape() as tape:
        fx_now = test_model(x_now)
    grads = tape.gradient(
        fx_now,
        test_model.trainable_variables,
    )
    print('\nKL-Divergence: ', fx_now, '\nGradient: ', grads, '\n')

print(test_model.summary())
The output of the code above is
Calculating gradients:
KL-Divergence: tf.Tensor(0.0029436834, shape=(), dtype=float32)
Gradient: [<tf.Tensor: id=237, shape=(), dtype=float32, numpy=0.0>]
KL-Divergence: tf.Tensor(0.0029436834, shape=(), dtype=float32)
Gradient: [<tf.Tensor: id=358, shape=(), dtype=float32, numpy=0.0>]
KL-Divergence: tf.Tensor(0.0029436834, shape=(), dtype=float32)
Gradient: [<tf.Tensor: id=479, shape=(), dtype=float32, numpy=0.0>]
Model: "test_random"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, 3)]               0
_________________________________________________________________
test_layer_3 (test_layer)    ()                        1
=================================================================
Total params: 1
Trainable params: 1
Non-trainable params: 0
_________________________________________________________________
None
The KL divergence is computed correctly, but the resulting gradient is 0. What is the correct way to obtain the gradient?
Answer 0 (score: 1)
In case anyone is interested, here is how I solved this:
The lines

self.kernel_dist = tfp.distributions.MultivariateNormalDiag(
    loc=self.mean_W,
    scale_diag=(1.,)
)

should not sit inside the build() method of the layer class definition, but inside the call() method. Here is the modified example:
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Layer, Input

# 1 Define Layer
class test_layer(Layer):

    def __init__(self, **kwargs):
        super(test_layer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.mean_W = self.add_weight('mean_W', trainable=True)
        super(test_layer, self).build(input_shape)

    def call(self, x):
        self.kernel_dist = tfp.distributions.MultivariateNormalDiag(
            loc=self.mean_W,
            scale_diag=(1.,)
        )
        return tfp.distributions.kl_divergence(
            self.kernel_dist,
            tfp.distributions.MultivariateNormalDiag(
                loc=self.mean_W*0.,
                scale_diag=(1.,)
            )
        )

# 2 Create model
x = Input(shape=(3,))
fx = test_layer()(x)
test_model = Model(name='test_random', inputs=[x], outputs=[fx])

# 3 Calculate gradient
print('\n\n\nCalculating gradients: ')

# example data, only used as a dummy
x_data = np.random.rand(99, 3).astype(np.float32)

for x_now in np.split(x_data, 3):
    # print(x_now.shape)
    with tf.GradientTape() as tape:
        fx_now = test_model(x_now)
    grads = tape.gradient(
        fx_now,
        test_model.trainable_variables,
    )
    print('\nKL-Divergence: ', fx_now, '\nGradient: ', grads, '\n')

print(test_model.summary())
The output is now
Calculating gradients:
KL-Divergence: tf.Tensor(0.024875917, shape=(), dtype=float32)
Gradient: [<tf.Tensor: id=742, shape=(), dtype=float32, numpy=0.22305119>]
KL-Divergence: tf.Tensor(0.024875917, shape=(), dtype=float32)
Gradient: [<tf.Tensor: id=901, shape=(), dtype=float32, numpy=0.22305119>]
KL-Divergence: tf.Tensor(0.024875917, shape=(), dtype=float32)
Gradient: [<tf.Tensor: id=1060, shape=(), dtype=float32, numpy=0.22305119>]
Model: "test_random"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         [(None, 3)]               0
_________________________________________________________________
test_layer_1 (test_layer)    ()                        1
=================================================================
Total params: 1
Trainable params: 1
Non-trainable params: 0
_________________________________________________________________
None
as expected.
Has this behaviour changed from tensorflow 1 to tensorflow 2?
Answer 1 (score: 1)
We are working on making distributions and bijectors able to close over variables passed to their constructors. (This is not done yet for MVN.) In the meantime, you can use tfd.Independent(tfd.Normal(loc=self.mean_W, scale=1), reinterpreted_batch_ndims=1), since we have already adapted Normal, so I think it will work inside your build method.
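A minimal sketch of what that drop-in might look like for the layer above (this is an adaptation, not part of the original answer: the weight is given an explicit shape of (1,) so that reinterpreted_batch_ndims=1 has a batch dimension to reinterpret, and the class name test_layer_independent is only illustrative):

import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow.keras.layers import Layer

tfd = tfp.distributions

class test_layer_independent(Layer):

    def build(self, input_shape):
        # explicit 1-D shape so reinterpreted_batch_ndims=1 applies
        # (the original scalar weight has no batch dimension to reinterpret)
        self.mean_W = self.add_weight('mean_W', shape=(1,), trainable=True)
        # Normal closes over the variable, so the distribution can stay in build()
        self.kernel_dist = tfd.Independent(
            tfd.Normal(loc=self.mean_W, scale=1.),
            reinterpreted_batch_ndims=1)
        super(test_layer_independent, self).build(input_shape)

    def call(self, x):
        prior = tfd.Independent(
            tfd.Normal(loc=tf.zeros_like(self.mean_W), scale=1.),
            reinterpreted_batch_ndims=1)
        return tfd.kl_divergence(self.kernel_dist, prior)

The model-building and GradientTape code from the question can be reused unchanged with this layer.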
Also: have you looked at the tfp.layers package? In particular, https://www.tensorflow.org/probability/api_docs/python/tfp/layers/KLDivergenceAddLoss may be interesting for you.
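For reference, a rough sketch of the pattern that layer supports, based on the linked documentation (event_size and the surrounding layers are only illustrative, and the exact constructor arguments may differ between tfp versions): the KL term against a fixed prior is added to model.losses instead of being returned as the layer output.

import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

event_size = 3  # hypothetical latent size
prior = tfd.Independent(tfd.Normal(loc=tf.zeros(event_size), scale=1.),
                        reinterpreted_batch_ndims=1)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(tfp.layers.IndependentNormal.params_size(event_size),
                          input_shape=(3,)),
    tfp.layers.IndependentNormal(event_size),  # layer output is a distribution
    tfp.layers.KLDivergenceAddLoss(prior),     # KL(q || prior) goes into model.losses
])

Because the KL divergence ends up in model.losses, it is picked up automatically during training alongside the data-fit loss.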