Applying triplet loss to a recognition problem

Time: 2019-09-06 11:48:19

Tags: python keras neural-network deep-learning conv-neural-network

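I want to apply triplet loss to a recognition problem. I do this by building triplets and storing them in a 5D array, X = [total_samples, triplet_marking, channels, height, width]. Ground-truth labels are not really needed here, because the triplet markings (0: anchor, 1: positive, 2: negative) are enough, but Keras requires labels when fitting. So I define the ground-truth labels in a 2D array, Y = [total_samples, triplet_marking].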
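To make the layout concrete, here is a minimal NumPy sketch of the two arrays described above. The sizes are hypothetical and kept tiny for illustration (the post uses 3x384x384 images), and the zero-filled arrays are placeholders for real data.

```python
import numpy as np

# Hypothetical sizes, for illustration only.
total_samples, channels, height, width = 4, 3, 8, 8

# X[i, 0] is the anchor, X[i, 1] the positive, X[i, 2] the negative image.
x_train = np.zeros((total_samples, 3, channels, height, width), dtype=np.float32)

# Dummy labels, one per triplet slot; the triplet loss never reads them.
y_train = np.zeros((total_samples, 3), dtype=np.float32)

# Slicing on the triplet_marking axis gives one image batch per role.
img_a, img_p, img_n = x_train[:, 0], x_train[:, 1], x_train[:, 2]
```

Each slice has shape (total_samples, channels, height, width), which is what a single image input of the network expects.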

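First, I defined a base neural network in a function, ending in a 16-unit fully connected layer. Then I feed the anchor, positive and negative images of each triplet as inputs. Then I defined the triplet loss as Andrew Ng defines it in his Coursera course. Finally, I define, compile and train the model.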
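For reference, the triplet loss as it is usually written, with $f$ the embedding network, $A$, $P$, $N$ the anchor, positive and negative images, and $\alpha$ the margin (set to 1 in the code in this post):

```latex
\mathcal{L}(A, P, N) = \max\left( \lVert f(A) - f(P) \rVert_2^2 - \lVert f(A) - f(N) \rVert_2^2 + \alpha,\; 0 \right)
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, in squared distance.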

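from keras import models, layers

def network(input_shape):
    # Base embedding network shared by the anchor, positive and negative branches
    seq = models.Sequential()
    # The first layer needs input_shape so the Sequential model can be built
    seq.add(layers.Conv2D(8, (3,3), strides=(1,1), input_shape=input_shape, data_format="channels_first", activation="relu", kernel_initializer="glorot_uniform"))
    seq.add(layers.Conv2D(8, (3,3), strides=(1,1), data_format="channels_first", activation="relu", kernel_initializer="glorot_uniform"))
    seq.add(layers.MaxPooling2D(pool_size=(2, 2), data_format="channels_first"))
    seq.add(layers.Flatten())
    seq.add(layers.Dense(16, activation='relu'))
    seq.add(layers.BatchNormalization())
    return seq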

img_anc = layers.Input(shape=(3,384,384))
img_pos = layers.Input(shape=(3,384,384))
img_neg = layers.Input(shape=(3,384,384))

base_network = network((3,384,384))
feature_anc = base_network(img_anc)
feature_pos = base_network(img_pos)
feature_neg = base_network(img_neg)

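from keras import backend as K

def triplet_loss(y_true, y_pred):
    # y_true is ignored; it is only passed so that Keras accepts the loss
    a = y_pred[:,0]
    p = y_pred[:,1]
    n = y_pred[:,2]

    margin = 1.0

    # Backend ops instead of NumPy, so the loss works on symbolic tensors
    pos_dist = K.sum(K.square(a - p), axis=-1)
    neg_dist = K.sum(K.square(a - n), axis=-1)
    basic_loss = pos_dist - neg_dist + margin
    # Element-wise max with 0 (np.max(x, 0) would treat 0 as an axis)
    loss = K.maximum(basic_loss, 0.0)

    return loss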


model_train = models.Model(inputs=[img_anc, img_pos, img_neg], outputs=[feature_anc, feature_pos, feature_neg])

model_train.compile(loss=triplet_loss, optimizer='adam')

img_a = x_train[:, 0] #using triplet_marking
img_p = x_train[:, 1]
img_n = x_train[:, 2]

# [img_a, img_p, img_n] is my input
# [y_train[:,0], y_train[:,1], y_train[:,2]] are my ground-truth labels
# the loss function does not take the labels into account; I pass them only so that Keras works

history = model_train.fit([img_a, img_p, img_n], [y_train[:,0], y_train[:,1], y_train[:,2]], batch_size=16, epochs= 100, verbose=2, validation_split=.25, shuffle=True)

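Since I give 3 images as input, I expected y_pred to have shape (batch_size, 3, 16), where y_pred[:, 0] would be the anchor features, y_pred[:, 1] the positive features and y_pred[:, 2] the negative features, but its shape is (batch_size, 16). Maybe this is why I am not getting correct results. Please correct me if I am doing something wrong anywhere.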
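A possible explanation for the shape, offered as a sketch and not a verified fix: a Keras model with three outputs applies the loss to each output separately, so every call sees one (batch_size, 16) tensor. To get (batch_size, 3, 16), the three embeddings would have to be merged into a single output, for example with a `Lambda` layer around `K.stack(..., axis=1)`. That Keras wiring is an untested assumption. The NumPy code below only illustrates the intended shapes and the per-sample loss. `triplet_loss_np` and the dummy feature arrays are hypothetical names introduced here.

```python
import numpy as np

def triplet_loss_np(y_pred, margin=1.0):
    """Per-sample triplet loss for y_pred of shape (batch, 3, dim)."""
    a, p, n = y_pred[:, 0], y_pred[:, 1], y_pred[:, 2]
    pos_dist = np.sum(np.square(a - p), axis=-1)  # squared distance anchor-positive
    neg_dist = np.sum(np.square(a - n), axis=-1)  # squared distance anchor-negative
    return np.maximum(pos_dist - neg_dist + margin, 0.0)

batch, dim = 2, 16
feat_anc = np.zeros((batch, dim))  # dummy anchor embeddings
feat_pos = np.zeros((batch, dim))  # identical to the anchors
feat_neg = np.ones((batch, dim))   # far from the anchors

# Stack on a new axis 1 so that index 0/1/2 selects anchor/positive/negative.
stacked = np.stack([feat_anc, feat_pos, feat_neg], axis=1)
loss = triplet_loss_np(stacked)
```

With this layout, `stacked[:, 0]`, `stacked[:, 1]` and `stacked[:, 2]` are the three (batch, 16) embeddings, which matches the indexing that `triplet_loss` in the question assumes.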

0 answers:

No answers yet