I'm running into a problem with tfp.layers.DistributionLambda. I'm new to TF and struggling to get my tensors flowing. Can someone offer some insight into how the parameters of the output distribution are set up?
The TFP team wrote a tutorial, Regression with Probabilistic Layers in TensorFlow Probability, which builds the following model:
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions
tfk = tf.keras

# Build model.
model = tfk.Sequential([
    tf.keras.layers.Dense(1 + 1),
    tfp.layers.DistributionLambda(
        lambda t: tfd.Normal(loc=t[..., :1],
                             scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:]))),
])
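(For context, the tutorial then trains this model by minimizing the negative log-likelihood of the data under the output distribution, roughly like this, with x and y being the training data:)

# The model outputs a distribution object, so the loss can call log_prob on it directly.
negloglik = lambda y, rv_y: -rv_y.log_prob(y)

model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=negloglik)
model.fit(x, y, epochs=1000, verbose=False)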
It uses tfp.layers.DistributionLambda to output a Normal distribution, but it's not clear to me how the parameters of tfd.Normal (mean/loc and standard deviation/scale) are being set, so I'm unable to change the Normal to a Gamma distribution. I tried the following without success (the predicted distribution parameters are nan).
def dist_output_layer(t, softplus_scale=0.05):
    """Create distribution with variable mean and variance."""
    mean = t[..., :1]
    std_dev = 1e-3 + tf.math.softplus(softplus_scale * mean)
    alpha = (mean / std_dev) ** 2
    beta = alpha / mean
    return tfd.Gamma(concentration=alpha,
                     rate=beta)
# Build model.
model = tf.keras.Sequential([
    # "By using a deeper neural network and introducing nonlinear activation
    # functions, however, we can learn more complicated functional dependencies!"
    tf.keras.layers.Dense(20, activation="relu"),
    # Two neurons here b/c of the output layer's distribution's mean and std. deviation.
    tf.keras.layers.Dense(1 + 1),
    tfp.layers.DistributionLambda(dist_output_layer)
])
Many thanks.
Answer 0 (score: 2)
Honestly, there is a lot to be said about the code snippet you pasted from Medium.
I hope, though, that you'll find my comments below somewhat useful.
# Build model.
model = tfk.Sequential([
    # The first layer is a Dense layer with 2 units, one for each of the parameters that will
    # be learnt (see next layer). Its implied output shape is (batch_size, 2).
    # Note that this Dense layer has no activation function, because we want unconstrained
    # real values that will parameterize the Normal distribution in the following layer.
    tf.keras.layers.Dense(1 + 1),
    # The next layer is a DistributionLambda that encapsulates a Normal distribution. The
    # DistributionLambda takes a function in its constructor, and this function should take the
    # output tensor from the previous layer as its input (that is the Dense layer commented
    # above). The goal is to learn the 2 parameters of the distribution: loc (the mean) and
    # scale (the standard deviation). For this, a lambda construct is used. The ellipsis you
    # can see in the loc and scale arguments (that is, the 3 dots) stands for the batch
    # dimension, so t[..., :1] selects the first unit's output and t[..., 1:] the second's.
    # Also note that scale (the standard deviation) cannot be negative; the softplus function
    # is used to make sure the learnt scale parameter never goes negative.
    tfp.layers.DistributionLambda(
        lambda t: tfd.Normal(loc=t[..., :1],
                             scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:]))),
])
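As for swapping in a Gamma: here is a minimal sketch (mine, not from the tutorial) of how the same pattern could parameterize it. Both concentration and rate must be strictly positive, so I apply the softplus-and-shift to each of the two Dense outputs independently, instead of deriving both parameters from the mean alone as your attempt does; that coupling is one plausible source of your nans.

def gamma_output_layer(t):
    # Both Gamma parameters must be positive, so constrain each Dense output
    # with the same softplus-and-shift used for scale in the Normal example.
    concentration = 1e-3 + tf.math.softplus(0.05 * t[..., :1])
    rate = 1e-3 + tf.math.softplus(0.05 * t[..., 1:])
    return tfd.Gamma(concentration=concentration, rate=rate)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(20, activation="relu"),
    tf.keras.layers.Dense(1 + 1),  # two units: one per Gamma parameter
    tfp.layers.DistributionLambda(gamma_output_layer),
])

Also bear in mind that a Gamma distribution only has support on positive values, so calling log_prob on zero or negative targets will itself produce inf/nan losses.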
Answer 1 (score: 1)
Regarding the .05 that gets added: it's a small offset that works around some gradient problems which can appear without it. Basically it says up front that we're confident the actual variability is not smaller than epsilon (here .05), so by adding it we make sure the std dev can never get smaller than that.
See https://github.com/tensorflow/probability/issues/751
The money quote:
"If infinitesimal scales end up being a practical problem for a given task, the workaround we usually use is a softplus-and-shift, e.g. scale = epsilon + tf.math.softplus(unconstrained_scale), where epsilon is some tiny value, like 1e-5, that we're confident a priori is much smaller than the true scale."
Edit: for the reason I described above, what actually gets added is the 1e-3. As for the multiplication by 0.05... probably, again, just scaling or a gradient adjustment, or a way to make the scale parameter start out at a certain size.
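A quick illustration of the softplus-and-shift (my own sketch, not from the issue): whatever real value the Dense layer emits, the resulting scale stays strictly above the 1e-3 floor, and the 0.05 factor flattens the curve so the scale changes gently with the unconstrained input.

import tensorflow as tf

for u in [-100.0, -10.0, 0.0, 10.0, 100.0]:
    # softplus is always positive, so scale >= 1e-3 for any real u.
    scale = 1e-3 + tf.math.softplus(0.05 * u)
    print(u, float(scale))
# Prints values from ~0.0077 (u = -100) up to ~5.0 (u = 100); never zero or negative.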