向Keras模型添加CTC损失和CTC解码

时间:2020-08-05 04:24:39

标签: keras conv-neural-network lstm handwriting-recognition ctc

我正在尝试解决手写文本识别的用例。我已经使用CNN和LSTM创建了一个网络。其输出需要馈送到CTC层。我可以在本地tensorflow中找到一些代码来做到这一点。在Keras中,有没有更简单的选择。

model = Sequential()
model.add(Conv2D(64, kernel_size=(5,5),activation = 'relu', input_shape=(128,32,1), padding='same', data_format='channels_last'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(128, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(256, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(Conv2D(256, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(1,2),padding='same'))
model.add(Conv2D(512, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(BatchNormalization())
model.add(Conv2D(512, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2), strides=(1,2),padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(1,1)))
model.add(Conv2D(512, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(Lambda(lambda x: x[:, :, 0, :], output_shape=(None,31,512), mask=None, arguments=None))
#model.add(Bidirectional(LSTM(256, return_sequences=True), input_shape=(31, 256)))
model.add(Bidirectional(LSTM(128, return_sequences=True)))
model.add(Bidirectional(LSTM(128, return_sequences=True)))
model.add(Dense(75, activation = 'softmax'))

Network Architecture before CTC loss and decode

任何有关如何轻松添加CTC损失和解码层的帮助都将很棒

1 个答案:

答案 0 :(得分:0)

CTC损失函数需要四个参数来计算损失,预测输出,基本事实标签,LSTM的输入序列长度和基本事实标签长度。为此,我们需要创建一个自定义损失函数,然后将其传递给模型。为了使其与您定义的模型兼容,我们需要创建一个模型,该模型接受这四个输入并输出损失。此模型将用于培训和测试,可以使用您先前创建的模型。

让我们创建一个以不同方式使用的keras模型,以便我们可以创建两个不同版本的模型,以便在培训和测试时使用。

# input with shape of height=32 and width=128 
inputs = Input(shape=(32, 128, 1))

# convolution layer with kernel size (3,3)
conv_1 = Conv2D(64, (3, 3), activation='relu', padding='same')(inputs)
# poolig layer with kernel size (2,2)
pool_1 = MaxPool2D(pool_size=(2, 2), strides=2)(conv_1)

conv_2 = Conv2D(128, (3, 3), activation='relu', padding='same')(pool_1)
pool_2 = MaxPool2D(pool_size=(2, 2), strides=2)(conv_2)

conv_3 = Conv2D(256, (3, 3), activation='relu', padding='same')(pool_2)

conv_4 = Conv2D(256, (3, 3), activation='relu', padding='same')(conv_3)
# poolig layer with kernel size (2,1)
pool_4 = MaxPool2D(pool_size=(2, 1))(conv_4)

conv_5 = Conv2D(512, (3, 3), activation='relu', padding='same')(pool_4)
# Batch normalization layer
batch_norm_5 = BatchNormalization()(conv_5)

conv_6 = Conv2D(512, (3, 3), activation='relu', padding='same')(batch_norm_5)
batch_norm_6 = BatchNormalization()(conv_6)
pool_6 = MaxPool2D(pool_size=(2, 1))(batch_norm_6)

conv_7 = Conv2D(512, (2, 2), activation='relu')(pool_6)

squeezed = Lambda(lambda x: K.squeeze(x, 1))(conv_7)

# bidirectional LSTM layers with units=128
blstm_1 = Bidirectional(LSTM(128, return_sequences=True, dropout=0.2))(squeezed)
blstm_2 = Bidirectional(LSTM(128, return_sequences=True, dropout=0.2))(blstm_1)

outputs = Dense(len(char_list) + 1, activation='softmax')(blstm_2)

# model to be used at test time
test_model = Model(inputs, outputs)

我们将在培训期间使用ctc_loss_fuction。因此,让我们实现ctc_loss_function并使用ctc_loss_function创建训练模型:

labels = Input(name='the_labels', shape=[max_label_len], dtype='float32')
input_length = Input(name='input_length', shape=[1], dtype='int64')
label_length = Input(name='label_length', shape=[1], dtype='int64')


def ctc_lambda_func(args):
    y_pred, labels, input_length, label_length = args

    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)
  

loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([outputs, labels, 
input_length, label_length])

#model to be used at training time
training_model = Model(inputs=[inputs, labels, input_length, label_length], outputs=loss_out)

->训练该模型并将权重保存在.h5文件中

现在使用测试模型并通过使用参数by_name = True加载训练模型的权重,这样它将仅为匹配的图层加载权重。