I am trying to build a Siamese neural network that takes two facial expressions and outputs the probability that the two images are similar. I have 5 people with 10 expressions each, so 50 images in total, but with the Siamese setup I can generate 2500 pairs (with repetition). I have already run dlib's facial landmark detection on each of the 50 images, so each of the two inputs to the Siamese network is a flattened 136x1 array of landmark coordinates. The Siamese structure is as follows:
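For reference, each 136-element vector comes from dlib's 68-point landmark detector, roughly along these lines (a minimal sketch; the predictor path and the single-face assumption are simplifications of my actual preprocessing):

import numpy as np
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def landmark_vector(image):
    # Detect the face and return its 68 (x, y) landmarks flattened to shape (136,)
    rect = detector(image, 1)[0]
    shape = predictor(image, rect)
    points = np.array([[shape.part(n).x, shape.part(n).y] for n in range(68)])
    return points.flatten().astype(np.float32)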
from keras.models import Model, Sequential
from keras.layers import Input, Dense, Lambda
from keras.optimizers import Adam
from keras import backend as K

input_shape = (136,)
left_input = Input(input_shape, name='left')
right_input = Input(input_shape, name='right')

# Shared encoder applied to both inputs
convnet = Sequential()
convnet.add(Dense(50, activation="relu"))

encoded_l = convnet(left_input)
encoded_r = convnet(right_input)

# Element-wise L1 distance between the two encodings
L1_layer = Lambda(lambda tensors: K.abs(tensors[0] - tensors[1]))
L1_distance = L1_layer([encoded_l, encoded_r])
prediction = Dense(1, activation='relu')(L1_distance)
siamese_net = Model(inputs=[left_input, right_input], outputs=prediction)

optimizer = Adam()
siamese_net.compile(loss="binary_crossentropy", optimizer=optimizer)
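As a quick sanity check that the wiring is what I think it is (a minimal sketch with random dummy inputs, not real data):

import numpy as np

dummy_left = np.random.rand(4, 136)
dummy_right = np.random.rand(4, 136)
preds = siamese_net.predict({'left': dummy_left, 'right': dummy_right})
print(preds.shape)  # (4, 1): one similarity score per pair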
I have an array called x_train holding 80% of all possible pair labels; its elements are lists of lists. The data is a 5x10x68x2 matrix: 5 people, 10 expressions per person, 68 facial landmarks, and 2 coordinates (x, y) per landmark.
x_train = [ [ [Person, Expression] , [Person, Expression] ], ...]
data = np.load('data.npy')
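For context, the pair labels in x_train are built from every (person, expression) combination and then split 80/20, roughly like this (a minimal sketch under my assumptions about the index ordering; my actual split code differs):

import numpy as np
from itertools import product
from random import shuffle

# All 50 (person, expression) identities -> 50 x 50 = 2500 ordered pairs (with repetition)
identities = list(product(range(5), range(10)))
pairs = [[list(a), list(b)] for a, b in product(identities, identities)]

shuffle(pairs)
split = int(0.8 * len(pairs))
x_train, x_test = pairs[:split], pairs[split:]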
My training loop looks like this:
from collections import defaultdict
from random import shuffle
import numpy as np

def train(data, labels, network, epochs, batch_size):
    track_loss = defaultdict(list)
    for i in range(0, epochs):
        iterations = len(labels) // batch_size
        remain = len(labels) % batch_size
        shuffle(labels)
        print('Epoch%s-----' % (i + 1))
        for j in range(0, iterations):
            # [start, end) indices of this mini-batch; the last batch absorbs the remainder
            batch = [j * batch_size, j * batch_size + batch_size]
            if j == iterations - 1:
                batch[1] += remain
            mini_batch = np.zeros(shape=(batch[1] - batch[0], 2, 136))
            for k in range(batch[0], batch[1]):
                # Look up the landmark array for each (person, expression) in the pair
                prepx = data[labels[k][0][0], labels[k][0][1], :, :]
                prepy = data[labels[k][1][0], labels[k][1][1], :, :]
                mini_batch[k - batch[0]][0] = prepx.flatten()
                mini_batch[k - batch[0]][1] = prepy.flatten()
            # Target is 1 if both samples have the same expression index, else 0
            targets = np.array([1 if labels[k][0][1] == labels[k][1][1] else 0
                                for k in range(batch[0], batch[1])])
            new_batch = mini_batch.reshape(batch[1] - batch[0], 2, 136, 1)  # currently unused
            new_targets = targets.reshape(batch[1] - batch[0], 1)           # currently unused
            #print(mini_batch.shape, targets.shape)
            loss = network.train_on_batch(
                {
                    'left': mini_batch[:, 0, :],
                    'right': mini_batch[:, 1, :]
                }, targets)
            track_loss['Epoch%s' % (i + 1)].append(loss)
    return network, track_loss
siamese_net, track_loss = train(data, x_train, siamese_net, 20, 30)
Each element of the targets array is either 0 or 1, depending on whether the two expressions fed into the network are different or the same.
Admittedly, the omniglot example I was following has many more training and test images, but even so my network's loss is not decreasing.
EDIT:
Here is the new loss after fixing the targets tensor:
Epoch1-----Loss: 1.979214
Epoch2-----Loss: 1.631347
Epoch3-----Loss: 1.628090
Epoch4-----Loss: 1.634603
Epoch5-----Loss: 1.621578
Epoch6-----Loss: 1.631347
Epoch7-----Loss: 1.631347
Epoch8-----Loss: 1.631347
Epoch9-----Loss: 1.621578
Epoch10-----Loss: 1.634603
Epoch11-----Loss: 1.634603
Epoch12-----Loss: 1.621578
Epoch13-----Loss: 1.628090
Epoch14-----Loss: 1.624834
Epoch15-----Loss: 1.631347
Epoch16-----Loss: 1.634603
Epoch17-----Loss: 1.628090
Epoch18-----Loss: 1.631347
Epoch19-----Loss: 1.624834
Epoch20-----Loss: 1.624834
I would like to know how to improve my architecture, my training procedure, or even my data preparation, in order to improve the performance of the network. I assumed that using dlib's facial landmark detection would simplify what the network has to learn, but I am starting to doubt that assumption.