Using the example from Lipton et al. (2016), target replication essentially means computing the loss at every time step of the LSTM (or GRU) except the final one, averaging those losses, and adding the result to the main loss during training. Mathematically, it is given by

$$\mathcal{L} = \alpha \cdot \frac{1}{T-1} \sum_{t=1}^{T-1} \ell\big(\hat{y}^{(t)}, y\big) + (1 - \alpha) \cdot \ell\big(\hat{y}^{(T)}, y\big)$$

where $\ell$ is the per-step loss (binary cross-entropy here), $y$ is the sequence label replicated at every step, $T$ is the sequence length, and $\alpha \in [0, 1]$ weights the auxiliary term. Graphically, it is represented as:

(figure: the sequence label is attached as a training target at every LSTM time step, not only at the last one)
So how exactly do I implement this in Keras? Say I have a binary classification task. Suppose my model is the simple one given below:
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.optimizers import Adam

model = Sequential()
model.add(LSTM(50))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer=Adam(), metrics=['accuracy'])
model.fit(x_train, y_train, class_weight={0: 0.5, 1: 4})
My questions:

Does y_train need to be reshaped from (batch_size, 1) to (batch_size, time_step)?
After setting return_sequences=True on the LSTM, does the Dense layer need to be wrapped in TimeDistributed to be applied correctly?
Does class_weight need to be modified?
My sequences are padded to a length of 15, while their average length is 7. Since the target replication loss is averaged over all steps, how do I make sure the padding words are not used when computing the loss? Basically, dynamically assign T the actual sequence length.

Answer (score: 3):
Question 1:

So, for the targets, you need to shape them as (batch_size, time_steps, 1). Just use:
y_train = np.stack([y_train]*time_steps, axis=1)
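For instance, a quick shape check (a toy example, assuming time_steps = 5 and a three-sample y_train):

import numpy as np

time_steps = 5
y_train = np.array([[0], [1], [1]])                 # shape (batch_size, 1)
y_train = np.stack([y_train] * time_steps, axis=1)  # shape (3, 5, 1)
print(y_train.shape)                                # the label is replicated at every step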
Question 2:

You are right, but TimeDistributed is optional in Keras 2.
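That is because in Keras 2 a Dense layer applied to a 3D tensor operates on the last axis independently at each time step, so the two variants below should be equivalent (a minimal sketch with hypothetical time_steps=15 and features=10):

from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape=(15, 10)))
model.add(Dense(1, activation='sigmoid'))                    # applied per time step in Keras 2
# explicit Keras 1 style equivalent:
# model.add(TimeDistributed(Dense(1, activation='sigmoid')))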
Question 3:

I don't know how class weights will behave here, but a regular loss function would look like this:
from keras import backend as K
from keras.losses import binary_crossentropy

def target_replication_loss(alpha):
    def inner_loss(true, pred):
        losses = binary_crossentropy(true, pred)
        return (alpha * K.mean(losses[:, :-1], axis=-1)) + ((1 - alpha) * losses[:, -1])
    return inner_loss
model.compile(......, loss = target_replication_loss(alpha), ...)
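Note that keras.losses.binary_crossentropy already averages over the last axis, so losses has shape (batch_size, time_steps): losses[:, :-1] selects the intermediate steps and losses[:, -1] the final one, matching the formula above. Keras then averages whatever inner_loss returns over the batch.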
Question 3a:

Since the above doesn't play well with class weights, I created an alternative that puts the weights into the loss:
def target_replication_loss(alpha, class_weights):
    def get_weights(x):
        b = class_weights[0]
        a = class_weights[1] - b
        return (a * x) + b

    def inner_loss(true, pred):
        # this will only work for classification with only one class 0 or 1
        # and only if the target is the same for all classes
        true_classes = true[:, -1, 0]
        weights = get_weights(true_classes)
        losses = binary_crossentropy(true, pred)
        return weights * ((alpha * K.mean(losses[:, :-1], axis=-1)) + ((1 - alpha) * losses[:, -1]))
    return inner_loss
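The get_weights helper is just a linear map between the two class weights: get_weights(0) returns class_weights[0] and get_weights(1) returns class_weights[1], so each sample's entire sequence loss is scaled by the weight of its (replicated) class.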
Question 4:

To avoid the extra complexity, I suppose you should use an additional metric for validation:
def last_step_BC(true, pred):
    return binary_crossentropy(true[:, -1], pred[:, -1])
model.compile(....,
              loss=target_replication_loss(alpha),
              metrics=[last_step_BC])
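Along the same lines, if you also want to monitor accuracy on the final prediction only, a hypothetical companion metric (not part of the original answer) could look like this:

from keras import backend as K

def last_step_acc(true, pred):
    # binary accuracy computed on the final time step only
    return K.mean(K.equal(true[:, -1], K.round(pred[:, -1])))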
Question 5:

This is a hard one; I would need to research it a little....

As an initial workaround, you can set up the model with an input shape of (None, features) and train each sequence individually, as sketched below.
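A minimal sketch of that workaround (assuming the target_replication_loss defined above and some hypothetical unpadded sequences): leaving the time dimension as None and feeding one sequence per batch means T is always the true length, so no padding ever enters the loss:

import numpy as np
from keras.models import Model
from keras.layers import Input, LSTM

inp = Input((None, 2))                       # time dimension left undefined
out = LSTM(1, activation='sigmoid', return_sequences=True)(inp)
model = Model(inp, out)
model.compile(optimizer='adam', loss=target_replication_loss(0.6))

# hypothetical unpadded sequences of different lengths, one per batch
sequences = [np.random.rand(1, 7, 2), np.random.rand(1, 12, 2)]
targets = [np.ones((1, 7, 1)), np.zeros((1, 12, 1))]
for x, y in zip(sequences, targets):
    model.train_on_batch(x, y)               # each call sees only real time steps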
Working example:

import numpy as np
from keras import backend as K
from keras.models import Model
from keras.layers import Input, LSTM
from keras.losses import binary_crossentropy

def target_replication_loss(alpha):
    def inner_loss(true, pred):
        losses = binary_crossentropy(true, pred)
        return (alpha * K.mean(losses[:, :-1], axis=-1)) + ((1 - alpha) * losses[:, -1])
    return inner_loss

alpha = 0.6

i1 = Input((5, 2))
out = LSTM(1, activation='sigmoid', return_sequences=True)(i1)
model = Model(i1, out)
model.compile(optimizer='adam', loss=target_replication_loss(alpha))
# binary targets of shape (batch_size, time_steps, 1)
model.fit(np.arange(30).reshape((3, 5, 2)), np.random.randint(0, 2, (3, 5, 1)), epochs=200)
Working example with class weights:

def target_replication_loss(alpha, class_weights):
    def get_weights(x):
        b = class_weights[0]
        a = class_weights[1] - b
        return (a * x) + b

    def inner_loss(true, pred):
        # this will only work for classification with only one class 0 or 1
        # and only if the target is the same for all classes
        true_classes = true[:, -1, 0]
        weights = get_weights(true_classes)
        losses = binary_crossentropy(true, pred)
        print(K.int_shape(losses))
        print(K.int_shape(losses[:, :-1]))
        print(K.int_shape(K.mean(losses[:, :-1], axis=-1)))
        print(K.int_shape(losses[:, -1]))
        print(K.int_shape(weights))
        return weights * ((alpha * K.mean(losses[:, :-1], axis=-1)) + ((1 - alpha) * losses[:, -1]))
    return inner_loss

alpha = 0.6
class_weights = {0: 0.5, 1: 4.}

i1 = Input(batch_shape=(3, 5, 2))
out = LSTM(1, activation='sigmoid', return_sequences=True)(i1)
model = Model(i1, out)
model.compile(optimizer='adam', loss=target_replication_loss(alpha, class_weights))
# binary targets of shape (batch_size, time_steps, 1)
model.fit(np.arange(30).reshape((3, 5, 2)), np.random.randint(0, 2, (3, 5, 1)), epochs=200)
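With batch_shape=(3, 5, 2), those prints should show (3, 5) for losses, (3, 4) for the intermediate slice, and (3,) for the step average, the last-step loss, and the weights, confirming that the loss collapses to a single value per sample.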