TF2 Keras: loss after loading a trained checkpoint is the same as the initial loss

Asked: 2019-12-18 19:37:39

Tags: tensorflow keras speech-recognition transformer

Python 3.6.8
tf.version: "2.1.0-rc1"
tf.keras.version: "2.2.4-tf"

I am trying to modify the official Transformer model for a speech recognition task.
URL: https://github.com/tensorflow/models/blob/master/official/transformer/v2/transformer_main.py

As described in the Speech-Transformer paper (http://150.162.46.34:8080/icassp2018/ICASSP18_USB/pdfs/0005884.pdf), I added two Conv2D layers in front of the encoder stack and modified the input data pipeline to handle 2D inputs.
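
Roughly, the input pipeline now yields padded 2D feature tensors instead of token id sequences. The sketch below is simplified and illustrative only; parse_example and the argument names are placeholders for my actual feature-loading code:

import tensorflow as tf

def make_dataset(feature_files, targets, feature_dimension, batch_size):
  # Each element becomes (features, target_ids), where features has shape
  # (time, feature_dimension), e.g. log-mel filterbank frames.
  ds = tf.data.Dataset.from_tensor_slices((feature_files, targets))
  ds = ds.map(parse_example,  # placeholder: loads one utterance as a 2D float tensor
              num_parallel_calls=tf.data.experimental.AUTOTUNE)
  # Pad the time axis per batch so the Conv2D front end sees rectangular input.
  ds = ds.padded_batch(batch_size,
                       padded_shapes=([None, feature_dimension], [None]),
                       drop_remainder=True)
  return ds.prefetch(tf.data.experimental.AUTOTUNE)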

When I train the model, the loss starts from 8.XX and converges nicely to around 0.38. However, when I load the saved checkpoint and resume training, the loss starts from 8.XX again.
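
For context, this is roughly the save/resume cycle I expect to work. It is a simplified sketch rather than the exact official script; create_model and the checkpoint directory are placeholders:

import tensorflow as tf

model = create_model(params)   # placeholder: builds the transformer including the Embed layer below
optimizer = tf.keras.optimizers.Adam()
checkpoint = tf.train.Checkpoint(model=model, optimizer=optimizer)
manager = tf.train.CheckpointManager(checkpoint, "./model_dir", max_to_keep=3)

# manager.save() is called during / after the first training run.
# On the next run I restore the latest checkpoint before continuing training:
if manager.latest_checkpoint:
  checkpoint.restore(manager.latest_checkpoint)
# Expected: training resumes near the converged loss; observed: it starts from ~8.XX again.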

You can see the additional layers I added in front of the encoder stack below.

class Embed(tf.keras.layers.Layer):
  def __init__(self, params, cmvn_mean, cmvn_std):
    super(Embed, self).__init__()
    self.params = params

    self.linear_projection = None
    self.cmvn_mean = None
    self.cmvn_std = None
    self.mean = cmvn_mean
    self.std = cmvn_std
    self.conv1 = None
    self.conv2 = None
    self.bn1 = None
    self.bn2 = None
    self.reshape_to_conv = None
    self.flatten = None

  def build(self, input_shape):
    params = self.params
    # I added some layers for embedding
    self.linear_projection = \
    tf.keras.layers.TimeDistributed(
      tf.keras.layers.Dense(
        params["hidden_size"], activation='linear'))
    self.cmvn_mean = tf.keras.backend.constant(self.mean)
    self.cmvn_std = tf.keras.backend.constant(self.std)
    self.cmvn_layer = tf.keras.layers.Lambda(_normalize)  # global CMVN; _normalize is a module-level helper (shown below)

    # TODO: stride and filter number are hard coded.
    stride=2
    filter_number=64
    self.conv1 = tf.keras.layers.Conv2D(filters=filter_number, kernel_size=(3,3),
                                       activation='relu', padding='same',
                                       strides=(stride, stride))
    self.conv2 = tf.keras.layers.Conv2D(filters=filter_number, kernel_size=(3,3),
                                       activation='relu', padding='same',
                                       strides=(stride, stride))
    self.bn1 = tf.keras.layers.BatchNormalization()
    self.bn2 = tf.keras.layers.BatchNormalization()
    self.reshape_to_conv = tf.keras.layers.Reshape((-1,
                                                    params["feature_dimension"],
                                                    1))
    # NOTE: two stride-2 convs shrink the frequency axis by stride * stride;
    # `stride * 2` below only matches that because stride == 2.
    self.flatten = \
      tf.keras.layers.Reshape((-1, int(params["feature_dimension"] / (stride * 2))
                               * filter_number))

    super(Embed, self).build(input_shape)

  def get_config(self):
    # Include everything __init__ needs so the layer can be re-created from config.
    return {
        "params": self.params,
        "cmvn_mean": self.mean,
        "cmvn_std": self.std,
    }

  def call(self, inputs):
    """
    from the Speech-Transformer paper,
      We firstly stack two 3×3 CNN layers with stride 2 for both time and
      frequency dimensions to prevent the GPU memory overflow and produce the
      approximate hidden representation length with the character length.
    """

    norm_input = self.cmvn_layer([inputs, self.cmvn_mean, self.cmvn_std])
    conv_input = self.reshape_to_conv(norm_input)
    conv_input = self.conv1(conv_input)
    conv_input = self.bn1(conv_input)
    conv_input = self.conv2(conv_input)
    conv_input = self.bn2(conv_input)

    # TODO: Additional Module (opt) in section 4.2 ==> I'll build big model
    """
    Then, we can optionally stack M additional modules which are applied to
    extracting more expressive representations for our Speech Transformer,
    which will be detailed in section 4.2. ==> skipped
    """

    # Linear
    """
    Next, we perform a linear transformation on the flattened feature map
    outputs to obtain the vectors of dimension d_model, which is called input
    encoding here.
    """

    flattened_input = self.flatten(conv_input)
    return self.linear_projection(flattened_input)
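
For reference, _normalize used by the Lambda layer above is just a global CMVN helper; it looks roughly like this:

def _normalize(args):
  # args = [features, cmvn_mean, cmvn_std]; per-feature mean/variance normalization
  x, mean, std = args
  return (x - mean) / std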

0 Answers:

No answers yet.