Question

I'm trying to build a next-word prediction model with an LSTM + Mixture Density Network, based on this implementation (https://www.katnoria.com/mdn/).

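Input: 300-dimensional word vectors * window size (5), plus a 21-dimensional array (c) representing the document's topic distribution, which is used to learn the initial hidden state.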

Output: mixing coefficients * num_gaussians, variances * num_gaussians, means * num_gaussians * 300 (vector size)
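Written out, and assuming each Gaussian k has a single variance \sigma_k^2 shared by all 300 dimensions (one variance per component, as listed above), these outputs parameterize the density below, where x is the window of word vectors, c the topic vector, K = 26 components, D = 300 dimensions, and N the number of samples in a batch. Training minimizes the average negative log-likelihood. The product over d is the step that the 1-D tutorial never needs:

```latex
p(\mathbf{y} \mid \mathbf{x}, \mathbf{c}) = \sum_{k=1}^{K} \pi_k \prod_{d=1}^{D} \frac{1}{\sqrt{2\pi\sigma_k^2}} \exp\!\left(-\frac{(y_d - \mu_{k,d})^2}{2\sigma_k^2}\right), \qquad K = 26,\; D = 300

\mathcal{L} = -\frac{1}{N} \sum_{n=1}^{N} \log p(\mathbf{y}_n \mid \mathbf{x}_n, \mathbf{c}_n)
```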

x.shape, y.shape and c.shape for an experimental set of 161 observations are:

(TensorShape([161, 5, 300]), TensorShape([161, 300]), TensorShape([161, 21]))

from tensorflow.keras.layers import Input, Dense, LSTM, Lambda
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.math import exp

# n_feat is size of word vector
n_feat = 300
window = 5
l = (window, n_feat)
hidden_state_dim = 21

# Number of gaussians to represent the multimodal distribution
k = 26

# Initial state: map the topic vector c to the first LSTM's initial h and c states
mlp_inp = Input(shape=(hidden_state_dim,))
mlp_dense_h = Dense(128, activation='relu', name="dense_h")(mlp_inp)
mlp_dense_c = Dense(128, activation='relu', name="dense_c")(mlp_inp)

# Network
input = Input(shape=l)
layer1 = LSTM(128, return_sequences=True, name='baselayer1')(input, initial_state=[mlp_dense_h, mlp_dense_c])
layer2 = LSTM(128, name='baselayer2')(layer1)

# Mean
mu = Dense((n_feat * k), activation=None, name='mean_layer')(layer2)
# variance (should be greater than 0 so we exponentiate it)
var_layer = Dense(k, activation=None, name='dense_var_layer')(layer2)
var = Lambda(lambda x: exp(x), output_shape=(k,), name='variance_layer')(var_layer)
# mixing coefficient should sum to 1.0
pi = Dense(k, activation='softmax', name='pi_layer')(layer2)

Here is my model's .summary():

Model: "model_12"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_7 (InputLayer)            [(None, 21)]         0                                            
__________________________________________________________________________________________________
input_8 (InputLayer)            [(None, 5, 300)]     0                                            
__________________________________________________________________________________________________
dense_h (Dense)                 (None, 128)          2816        input_7[0][0]                    
__________________________________________________________________________________________________
dense_c (Dense)                 (None, 128)          2816        input_7[0][0]                    
__________________________________________________________________________________________________
baselayer1 (LSTM)               (None, 5, 128)       219648      input_8[0][0]                    
                                                                 dense_h[0][0]                    
                                                                 dense_c[0][0]                    
__________________________________________________________________________________________________
baselayer2 (LSTM)               (None, 128)          131584      baselayer1[0][0]                 
__________________________________________________________________________________________________
dense_var_layer (Dense)         (None, 26)           3354        baselayer2[0][0]                 
__________________________________________________________________________________________________
pi_layer (Dense)                (None, 26)           3354        baselayer2[0][0]                 
__________________________________________________________________________________________________
mean_layer (Dense)              (None, 7800)         1006200     baselayer2[0][0]                 
__________________________________________________________________________________________________
variance_layer (Lambda)         (None, 26)           0           dense_var_layer[0][0]            
==================================================================================================
Total params: 1,369,772
Trainable params: 1,369,772
Non-trainable params: 0
__________________________________________________________________________________________________

However, when I try to run the training step, I get the following error:

ValueError: in user code:

    <ipython-input-70-084e2be19035>:7 train_step  *
        loss = mdn_loss(y, pi_, mu_, var_)
    <ipython-input-67-9a3cf3d4ccd2>:18 mdn_loss  *
        out = calc_pdf(y_true, mu, var)
    <ipython-input-67-9a3cf3d4ccd2>:6 calc_pdf  *
        value = tf.subtract(y, mu)**2
.....
ValueError: Dimensions must be equal, but are 300 and 7800 for '{{node Sub}} = Sub[T=DT_FLOAT](y, model_15/mean_layer/BiasAdd)' with input shapes: [161,300], [161,7800].

It says the dimensions of the tensors passed to tf.subtract() in calc_pdf() don't match:

import numpy as np
import tensorflow as tf

# Note how easy it is to write the loss function in
# TensorFlow's new eager mode (debugging the function becomes intuitive too)

def calc_pdf(y, mu, var):
    """Calculate component density"""
    value = tf.subtract(y, mu)**2
    value = (1/tf.math.sqrt(2 * np.pi * var)) * tf.math.exp((-1/(2*var)) * value)
    return value


def mdn_loss(y_true, pi, mu, var):
    """MDN Loss Function
    The eager mode in tensorflow 2.0 makes it extremely easy to write 
    functions like these. It feels a lot more pythonic to me.
    """
    out = calc_pdf(y_true, mu, var)
    # multiply with each pi and sum it
    out = tf.multiply(out, pi)
    out = tf.reduce_sum(out, 1, keepdims=True)
    out = -tf.math.log(out + 1e-10)
    return tf.reduce_mean(out)

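But I don't understand how to fix it. I checked the original implementation (at the link above) with 4000 observations, 1 feature and 26 distributions. There, the tensors passed to this function have shapes [4000, 1] and [4000, 26], and it works fine. I expected it to work for [161, 300] and [161, 7800] as well, but it doesn't.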
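The difference comes down to broadcasting: TensorFlow, like NumPy, aligns shapes from the right and lets an axis of size 1 stretch to match the other operand. So [4000, 1] - [4000, 26] works, while 300 and 7800 are different sizes and neither is 1. The sketch below checks the relevant shapes with NumPy's `np.broadcast_shapes` (NumPy 1.20 or later); `sub_shape` is only a hypothetical helper name for this illustration.

```python
import numpy as np

def sub_shape(a_shape, b_shape):
    """Shape of (a - b) under NumPy/TensorFlow broadcasting, or None if incompatible."""
    try:
        return np.broadcast_shapes(a_shape, b_shape)
    except ValueError:
        return None

# Tutorial case: y_true (4000, 1) against mu (4000, 26); the size-1 axis stretches to 26
print(sub_shape((4000, 1), (4000, 26)))           # (4000, 26)
# This question: 300 vs 7800 on the last axis, neither is 1, so it fails
print(sub_shape((161, 300), (161, 7800)))         # None
# Trailing size-1 axis on y, and mu split into (300, 26): compatible
print(sub_shape((161, 300, 1), (161, 300, 26)))   # (161, 300, 26)
# var (161, 26) against (161, 300, 26): 161 vs 300 clash, so var needs its own size-1 axis
print(sub_shape((161, 300, 26), (161, 26)))       # None
print(sub_shape((161, 300, 26), (161, 1, 26)))    # (161, 300, 26)
```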

How can I fix this?

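(I have already looked at similar questions about "Dimensions must be equal", but couldn't figure out how to make this particular implementation work.)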

If this isn't enough, I can post more information or code. Thanks a lot for your answers!

Answer 1

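For an MDN model, the likelihood of each sample has to be computed under every Gaussian pdf. To do that, I think you need to reshape the matrices (y_true and mu) and use the last dimension for broadcasting by adding a dimension of size 1. For example: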

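def calc_pdf(y, mu, var):
    """Calculate component density"""
    y = tf.reshape(y, (-1, 300, 1))      # (batch, 300) -> (batch, 300, 1)
    mu = tf.reshape(mu, (-1, 300, 26))   # (batch, 7800) -> (batch, 300, 26)
    var = tf.reshape(var, (-1, 1, 26))   # (batch, 26) -> (batch, 1, 26)
    value = tf.subtract(y, mu)**2        # broadcasts to (batch, 300, 26)
    value = (1/tf.math.sqrt(2 * np.pi * var)) * tf.math.exp((-1/(2*var)) * value)
    # (batch, 300, 26): mdn_loss must still combine the 300 dims of each component
    # (product over axis 1, or a sum of logs) before multiplying by pi
    return value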
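With y reshaped to (batch, 300, 1) and mu to (batch, 300, 26), the per-dimension densities have shape (batch, 300, 26), so mdn_loss still has to combine the 300 dimensions of each component before weighting by pi, and multiplying 300 densities directly can underflow. Below is a minimal sketch of the whole loss in log space using `tf.reduce_logsumexp`. It assumes one shared variance per component, as in the question's model, and reads the 7800 mean outputs as 26 blocks of 300. The (300, 26) layout above works just as well, since the Dense layer learns whichever layout you pick, as long as sampling at prediction time uses the same one.

```python
import numpy as np
import tensorflow as tf

# Sizes taken from the question: 300-dim word vectors, 26 mixture components
N_FEAT = 300
K = 26

def mdn_loss(y_true, pi, mu, var):
    """Negative log-likelihood of an isotropic Gaussian mixture (sketch).

    y_true: (batch, N_FEAT), pi: (batch, K), mu: (batch, N_FEAT * K), var: (batch, K)
    Assumes var is the variance shared by all N_FEAT dims of a component.
    """
    mu = tf.reshape(mu, (-1, K, N_FEAT))   # (batch, K, N_FEAT): assumed layout, one mean per component
    y = tf.expand_dims(y_true, axis=1)     # (batch, 1, N_FEAT): broadcasts over K
    var = tf.expand_dims(var, axis=-1)     # (batch, K, 1): broadcasts over N_FEAT
    # Per-dimension Gaussian log-density, shape (batch, K, N_FEAT)
    log_pdf = -0.5 * (tf.math.log(2.0 * np.pi * var) + tf.square(y - mu) / var)
    # Log of the product over dimensions = sum of logs, shape (batch, K)
    log_comp = tf.reduce_sum(log_pdf, axis=-1)
    # log sum_k pi_k * N_k, computed stably; 1e-10 guards against log(0)
    log_lik = tf.reduce_logsumexp(tf.math.log(pi + 1e-10) + log_comp, axis=-1)
    return -tf.reduce_mean(log_lik)
```

The signature (y_true, pi, mu, var) matches the question's mdn_loss, so a custom train_step can call it the same way.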

Tensorflow ValueError: Dimensions must be equal: LSTM+MDN
