I am trying to build a speaker-recognition Siamese neural network that takes two samples as input and decides whether they come from the same speaker. To do this I am using the contrastive loss function described in some sources I have looked at (here and here).
I have a toy dataset on which I trained a small model (9500 training samples and 500 test samples). Training accuracy reached 0.97 and validation accuracy reached 0.93. So far, so good. But when I try to apply the same configuration to a larger dataset, I get poor results: training accuracy improves, but validation accuracy never goes above 0.5, which for a problem like this is no better than random guessing. Here is my code:
```python
import numpy as np
import keras
import tensorflow as tf
from keras.models import Sequential, Model
from keras.layers import Dense, Activation, Flatten, Input, Concatenate, Lambda
from keras.layers import Dropout
from keras.layers import LSTM, BatchNormalization
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.callbacks import ModelCheckpoint
from keras import backend as K

K.set_image_dim_ordering('tf')

def Siamese_Contrastive_Loss():
    filepath = 'C:/Users/User/Documents/snet.h5'
    X_1, X_2, x1_val, x2_val, Y, val_y = data_preprocessing_load()
    input_shape = (sample_length, features, 1)
    left_input = Input(input_shape)
    right_input = Input(input_shape)
    baseNetwork = createBaseNetworkSmaller(sample_length, features, 1)
    encoded_l = baseNetwork(left_input)
    encoded_r = baseNetwork(right_input)
    distance = Lambda(euclidean_distance, output_shape=eucl_dist_output_shape)([encoded_l, encoded_r])
    model = Model([left_input, right_input], distance)
    checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
    callbacks_list = [checkpoint]
    model.compile(loss=contrastive_loss, optimizer='rmsprop', metrics=[acc])
    model.fit([X_1, X_2], Y, validation_data=([x1_val, x2_val], val_y),
              epochs=20, batch_size=32, verbose=2, callbacks=callbacks_list)

def data_preprocessing_load():
    ...
    return X_1, X_2, x1_val, x2_val, Y, val_y

def createBaseNetworkSmaller(sample_length, features, ii):
    input_shape = (sample_length, features, ii)
    baseNetwork = Sequential()
    baseNetwork.add(Conv2D(64, (10, 10), activation='relu', input_shape=input_shape))
    baseNetwork.add(MaxPooling2D(pool_size=3))
    baseNetwork.add(Conv2D(64, (5, 5), activation='relu'))
    baseNetwork.add(MaxPooling2D(pool_size=1))
    #baseNetwork.add(BatchNormalization())
    baseNetwork.add(Flatten())
    baseNetwork.add(Dense(32, activation='relu'))
    #baseNetwork.add(Dropout(0.2))
    baseNetwork.add(Dense(32, activation='relu'))
    return baseNetwork

def euclidean_distance(vects):
    x, y = vects
    return K.sqrt(K.maximum(K.sum(K.square(x - y), axis=1, keepdims=True), K.epsilon()))

def eucl_dist_output_shape(shapes):
    shape1, shape2 = shapes
    return (shape1[0], 1)

def contrastive_loss(y_true, y_pred):
    '''Contrastive loss from Hadsell-et-al.'06
    http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
    '''
    margin = 1
    square_pred = K.square(y_pred)
    margin_square = K.square(K.maximum(margin - y_pred, 0))
    #return K.mean(y_true * square_pred + (1 - y_true) * margin_square)
    return K.mean((1 - y_true) * K.square(y_pred) + y_true * K.square(K.maximum(margin - y_pred, 0)))

def acc(y_true, y_pred):
    ones = K.ones_like(y_pred)
    return K.mean(K.equal(y_true, ones - K.clip(K.round(y_pred), 0, 1)), axis=-1)
```
I think the problem is that I don't know exactly what the contrastive loss is supposed to be doing. I label the subset of positive pairs (samples from the same speaker) with 0, and the subset of negative pairs (samples from different speakers) with 1. As far as I understand, the idea is to maximize the distance between negative pairs and minimize it for positive pairs. I am not sure that is what is happening here. The function named `acc` computes the accuracy at each training step. The function named `contrastive_loss` is the main loss function; it contains two return statements, one of which is commented out. I read on a forum that depending on how one labels their positive and negative pairs (0/1 or 1/0 respectively), one should use the corresponding formula, and at this point I am confused. Which configuration should I use? Should positive pairs be 0 and negative pairs 1, or vice versa? And what should the contrastive loss look like?
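To make the two labeling conventions concrete to myself, I wrote a small NumPy sketch (plain scalars, not my Keras model; margin = 1 as in my code) comparing the two return statements. The commented-out form follows the Hadsell et al. '06 paper, where similar pairs are labeled 1; the form I am actually using expects the opposite labeling (similar = 0, dissimilar = 1), which matches how I label my pairs:

```python
import numpy as np

def contrastive_loss_hadsell(y_true, y_pred, margin=1.0):
    # Paper convention: y_true = 1 for SIMILAR pairs.
    # Penalizes distance for similar pairs, and (margin - distance)
    # for dissimilar pairs that are too close.
    return np.mean(y_true * y_pred ** 2
                   + (1 - y_true) * np.maximum(margin - y_pred, 0) ** 2)

def contrastive_loss_flipped(y_true, y_pred, margin=1.0):
    # Opposite convention: y_true = 0 for SIMILAR pairs (the uncommented
    # return in my code). Same formula with the two terms swapped.
    return np.mean((1 - y_true) * y_pred ** 2
                   + y_true * np.maximum(margin - y_pred, 0) ** 2)

d_small, d_large = np.array([0.1]), np.array([0.9])
same, diff = np.array([1.0]), np.array([0.0])  # paper convention labels

# A similar pair should be cheap when close and costly when far apart;
# a dissimilar pair should be the other way around.
assert contrastive_loss_hadsell(same, d_small) < contrastive_loss_hadsell(same, d_large)
assert contrastive_loss_hadsell(diff, d_large) < contrastive_loss_hadsell(diff, d_small)

# Flipping both the formula and the labels gives identical values.
assert np.isclose(contrastive_loss_hadsell(same, d_small),
                  contrastive_loss_flipped(1 - same, d_small))
```

So the two formulas are interchangeable as long as the labels match the chosen convention.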
Answer 0 (score: 1)
However you obtain your pairs of audio samples and labels (0 for different speakers, 1 for the same speaker), I suggest you use a single sigmoid neuron in the last layer. That way you can use the `binary_crossentropy` loss function. The network's output will be a value between 0 and 1, where 0 means the two audio samples are maximally different and 1 means they are maximally similar.

```python
def createBaseNetworkSmaller(sample_length, features, ii):
    input_shape = (sample_length, features, ii)
    baseNetwork = Sequential()
    baseNetwork.add(Conv2D(64, (10, 10), activation='relu', input_shape=input_shape))
    baseNetwork.add(MaxPooling2D(pool_size=3))
    baseNetwork.add(Conv2D(64, (5, 5), activation='relu'))
    baseNetwork.add(MaxPooling2D(pool_size=1))
    #baseNetwork.add(BatchNormalization())
    baseNetwork.add(Flatten())
    baseNetwork.add(Dense(32, activation='relu'))
    #baseNetwork.add(Dropout(0.2))
    baseNetwork.add(Dense(1, activation='sigmoid'))
    return baseNetwork
```
And compile with the matching loss:

```python
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
```
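Putting the whole suggestion together, here is a minimal end-to-end sketch (not your exact setup: the input shape is a made-up placeholder, and I merge the two encodings with an absolute difference before the sigmoid unit, which is one common choice for Siamese similarity heads, not the only one):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_siamese(input_shape=(64, 40, 1)):  # hypothetical (sample_length, features, 1)
    # Shared encoder applied to both inputs.
    base = tf.keras.Sequential([
        layers.Conv2D(64, (10, 10), activation='relu', input_shape=input_shape),
        layers.MaxPooling2D(pool_size=3),
        layers.Conv2D(64, (5, 5), activation='relu'),
        layers.Flatten(),
        layers.Dense(32, activation='relu'),
    ])
    left = layers.Input(input_shape)
    right = layers.Input(input_shape)
    # Element-wise absolute difference of the two encodings, scored by a
    # single sigmoid unit: 1 ~ same speaker, 0 ~ different speakers.
    diff = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([base(left), base(right)])
    out = layers.Dense(1, activation='sigmoid')(diff)
    model = Model([left, right], out)
    model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
    return model

model = build_siamese()
pred = model.predict([np.zeros((2, 64, 40, 1)), np.zeros((2, 64, 40, 1))], verbose=0)
print(pred.shape)  # one similarity score in [0, 1] per pair
```

With this head the labels plug straight into `binary_crossentropy`, so there is no contrastive-loss labeling convention to worry about at all.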