I'm trying to implement a custom loss in Keras but can't get it to work.
I've implemented it with numpy and keras.backend:
import numpy as np
from keras import backend as K

def log_rmse_np(y_true, y_pred):
    d_i = np.log(y_pred) - np.log(y_true)
    loss1 = (np.sum(np.square(d_i))/np.size(d_i))
    loss2 = ((np.square(np.sum(d_i)))/(2 * np.square(np.size(d_i))))
    loss = loss1 - loss2
    print('np_loss = %s - %s = %s'%(loss1, loss2, loss))
    return loss

def log_rmse(y_true, y_pred):
    d_i = (K.log(y_pred) - K.log(y_true))
    loss1 = K.mean(K.square(d_i))
    loss2 = K.square(K.sum(K.flatten(d_i),axis=-1))/(K.cast_to_floatx(2) * K.square(K.cast_to_floatx(K.int_shape(K.flatten(d_i))[0])))
    loss = loss1 - loss2
    return loss
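For reference, both functions are meant to compute the following, where ŷ is y_pred, y is y_true, and n is the total number of elements of d (np.size(d_i) in the numpy version, i.e. every element of the batch, not only the batch size); loss1 is the first term and loss2 the second:

$$
L \;=\; \underbrace{\frac{1}{n}\sum_{i=1}^{n} d_i^{2}}_{\texttt{loss1}} \;-\; \underbrace{\frac{1}{2n^{2}}\Big(\sum_{i=1}^{n} d_i\Big)^{2}}_{\texttt{loss2}},
\qquad d_i = \log \hat{y}_i - \log y_i
$$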
When I test both versions and compare their losses with the following function, everything seems to work.
def check_loss(_shape):
    if _shape == '2d':
        shape = (6, 7)
    elif _shape == '3d':
        shape = (5, 6, 7)
    elif _shape == '4d':
        shape = (8, 5, 6, 7)
    elif _shape == '5d':
        shape = (9, 8, 5, 6, 7)

    y_a = np.random.random(shape)
    y_b = np.random.random(shape)

    out1 = K.eval(log_rmse(K.variable(y_a), K.variable(y_b)))
    out2 = log_rmse_np(y_a, y_b)

    print('shapes:', str(out1.shape), str(out2.shape))
    print('types: ', type(out1), type(out2))
    print('log_rmse: ', np.linalg.norm(out1))
    print('log_rmse_np: ', np.linalg.norm(out2))
    print('difference: ', np.linalg.norm(out1-out2))
    assert out1.shape == out2.shape
    #assert out1.shape == shape[-1]

def test_loss():
    shape_list = ['2d', '3d', '4d', '5d']
    for _shape in shape_list:
        check_loss(_shape)
        print('======================')

test_loss()
The code above prints:
np_loss = 1.34490449177 - 0.000229461787517 = 1.34467502998
shapes: () ()
types: <class 'numpy.float32'> <class 'numpy.float64'>
log_rmse: 1.34468
log_rmse_np: 1.34467502998
difference: 3.41081509703e-08
======================
np_loss = 1.68258448859 - 7.67580654591e-05 = 1.68250773052
shapes: () ()
types: <class 'numpy.float32'> <class 'numpy.float64'>
log_rmse: 1.68251
log_rmse_np: 1.68250773052
difference: 1.42057615005e-07
======================
np_loss = 1.99736933814 - 0.00386228512295 = 1.99350705302
shapes: () ()
types: <class 'numpy.float32'> <class 'numpy.float64'>
log_rmse: 1.99351
log_rmse_np: 1.99350705302
difference: 2.53924863358e-08
======================
np_loss = 1.95178217182 - 1.60006871892e-05 = 1.95176617114
shapes: () ()
types: <class 'numpy.float32'> <class 'numpy.float64'>
log_rmse: 1.95177
log_rmse_np: 1.95176617114
difference: 3.78277884572e-08
======================
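Since n only enters the loss through (1/n)Σd_i² and (Σd_i)²/(2n²), the same quantity can be written purely with means: mean(d_i²) − ½·mean(d_i)². This rewrite is a suggestion, not part of the original code. The numpy sketch below checks the equivalence on the same shapes check_loss uses; log_rmse_mean_form is a hypothetical name, and the log_rmse_np here is a trimmed copy of the question's function with the print removed.

```python
import numpy as np

def log_rmse_np(y_true, y_pred):
    # trimmed copy of the question's log_rmse_np (print removed); n = np.size(d_i)
    d_i = np.log(y_pred) - np.log(y_true)
    n = np.size(d_i)
    return np.sum(np.square(d_i)) / n - np.square(np.sum(d_i)) / (2 * n ** 2)

def log_rmse_mean_form(y_true, y_pred):
    # same loss written with means only, so no element count is needed
    d_i = np.log(y_pred) - np.log(y_true)
    return np.mean(np.square(d_i)) - 0.5 * np.square(np.mean(d_i))

rng = np.random.default_rng(0)
for shape in [(6, 7), (5, 6, 7), (8, 5, 6, 7), (9, 8, 5, 6, 7)]:
    # small offset keeps every value strictly positive so both logs are finite
    y_a = rng.random(shape) + 1e-3
    y_b = rng.random(shape) + 1e-3
    explicit = log_rmse_np(y_a, y_b)
    mean_form = log_rmse_mean_form(y_a, y_b)
    print(shape, explicit, mean_form, abs(explicit - mean_form))
```

If the equivalence holds, the Keras loss could use loss2 = 0.5 * K.square(K.mean(d_i)), which never needs the element count and therefore does not depend on any static shape.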
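When I compile and fit my model with this loss, I never get an exception, and when I run the model with the 'adam' loss, everything works fine. With this loss, however, Keras keeps reporting a nan loss: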
Epoch 1/10000
17/256 [>.............................] - ETA: 124s - loss: nan
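I'm a bit stuck here... am I doing something wrong?
Using TensorFlow 1.4 on Ubuntu 16.04.
Update
Following Marcin Możejko's suggestion I updated the code, but unfortunately the training loss is still nan: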
def get_log_rmse(normalization_constant):
    def log_rmse(y_true, y_pred):
        d_i = (K.log(y_pred) - K.log(y_true))
        loss1 = K.mean(K.square(d_i))
        loss2 = K.square(K.sum(K.flatten(d_i),axis=-1))/K.cast_to_floatx(2 * normalization_constant ** 2)
        loss = loss1 - loss2
        return loss
    return log_rmse
Then I compile the model with:
model.compile(optimizer='adam', loss=get_log_rmse(batch_size))
Update 2:
The model summary looks like this:
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) (None, 160, 256, 3) 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 160, 256, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 160, 256, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 80, 128, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 80, 128, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 80, 128, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 40, 64, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 40, 64, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 40, 64, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 40, 64, 256) 590080
_________________________________________________________________
block3_conv4 (Conv2D) (None, 40, 64, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 20, 32, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 20, 32, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 20, 32, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 20, 32, 512) 2359808
_________________________________________________________________
block4_conv4 (Conv2D) (None, 20, 32, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 10, 16, 512) 0
_________________________________________________________________
conv2d_transpose_5 (Conv2DTr (None, 10, 16, 128) 1048704
_________________________________________________________________
up_sampling2d_5 (UpSampling2 (None, 20, 32, 128) 0
_________________________________________________________________
conv2d_transpose_6 (Conv2DTr (None, 20, 32, 64) 131136
_________________________________________________________________
up_sampling2d_6 (UpSampling2 (None, 40, 64, 64) 0
_________________________________________________________________
conv2d_transpose_7 (Conv2DTr (None, 40, 64, 32) 32800
_________________________________________________________________
up_sampling2d_7 (UpSampling2 (None, 80, 128, 32) 0
_________________________________________________________________
conv2d_transpose_8 (Conv2DTr (None, 80, 128, 16) 8208
_________________________________________________________________
up_sampling2d_8 (UpSampling2 (None, 160, 256, 16) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 160, 256, 1) 401
=================================================================
Total params: 11,806,401
Trainable params: 11,806,401
Non-trainable params: 0
Update 3:
Sample y_true:
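Answer 0 (score: 2)
The problem is in this part:
K.cast_to_floatx(K.int_shape(K.flatten(d_i))[0])
Because the loss function is compiled before any shape is provided, this expression evaluates to None, and that is where your error comes from: K.cast_to_floatx(None) becomes nan, so loss2, and with it the whole loss, is nan. (In your test the inputs are K.variable objects with fully known shapes, which is why the comparison passed.) I tried setting batch_input_shape instead of input_shape, but that didn't work either (probably because of the way keras compiles the model). I suggest setting this number as a constant in the following way: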
def get_log_rmse(normalization_constant):
    def log_rmse(y_true, y_pred):
        d_i = (K.log(y_pred) - K.log(y_true))
        loss1 = K.mean(K.square(d_i))
        loss2 = K.square(K.sum(K.flatten(d_i), axis=-1)) / K.cast_to_floatx(
            2 * normalization_constant ** 2)
        loss = loss1 - loss2
        return loss
    return log_rmse
Then compile:
model.compile(..., loss=get_log_rmse(normalization_constant))
My guess is that normalization_constant is equal to batch_size, but I'm not sure, so I've made it generic.
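A minimal numpy sketch of both points, assuming (as in the Keras 2.x backend) that K.cast_to_floatx(x) is np.asarray(x, dtype=floatx). At compile time the static length of K.flatten(d_i) is unknown, so the count is None, and converting None to float32 yields nan. Because K.flatten collapses the whole batch tensor, the count that matches np.size(d_i) for this model's output shape (None, 160, 256, 1) is batch_size × 160 × 256 rather than batch_size alone. loss2_term is a hypothetical helper and batch_size = 4 an arbitrary example value.

```python
import numpy as np

def loss2_term(d_i, n):
    # second term of the loss: (sum d_i)^2 / (2 n^2)
    n = np.asarray(n, dtype=np.float32)  # mimics K.cast_to_floatx (assumption)
    return np.square(np.sum(d_i)) / (np.float32(2) * np.square(n))

static_n = None  # what K.int_shape(K.flatten(d_i))[0] gives when the batch size is unknown
batch_size = 4   # hypothetical example value
d_i = np.random.default_rng(0).normal(size=(batch_size, 160, 256, 1)).astype(np.float32)

print(loss2_term(d_i, static_n))    # nan: None was converted to nan
print(d_i.size)                     # 163840 = 4 * 160 * 256 * 1
print(loss2_term(d_i, d_i.size))    # finite, normalised like log_rmse_np
print(loss2_term(d_i, batch_size))  # finite, but normalised by 4 instead of 163840
```

Note that a fixed constant also assumes every batch has exactly batch_size samples, which may not hold for the last batch of an epoch.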
Update 2:
The model definition looks like this:
import keras
from keras.models import Model
from keras.layers import Conv2D, UpSampling2D
# assumed: Deconv is an alias for Conv2DTranspose (the summary lists Conv2DTranspose layers)
from keras.layers import Conv2DTranspose as Deconv

input_shape = (160, 256, 3)
print('Input_shape: %s'%str(input_shape))
base_model = keras.applications.vgg19.VGG19(include_top=False, weights='imagenet',
                                            input_tensor=None, input_shape=input_shape,
                                            pooling=None, # None, 'avg', 'max'
                                            classes=1000)
for i in range(5):
    base_model.layers.pop()
base_model = Model(inputs=base_model.input, outputs=base_model.get_layer('block4_pool').output)
print('VGG19 output_shape: ' + str(base_model.output_shape))

# decoder: four transposed convolutions, each followed by 2x upsampling
x = Deconv(128, kernel_size=(4, 4), strides=1, padding='same', activation='relu')(base_model.output)
x = UpSampling2D((2, 2))(x)
x = Deconv(64, kernel_size=(4, 4), strides=1, padding='same', activation='relu')(x)
x = UpSampling2D((2, 2))(x)
x = Deconv(32, kernel_size=(4, 4), strides=1, padding='same', activation='relu')(x)
x = UpSampling2D((2, 2))(x)
x = Deconv(16, kernel_size=(4, 4), strides=1, padding='same', activation='relu')(x)
x = UpSampling2D((2, 2))(x)
# single-channel output with no activation (linear)
x = Conv2D(1, kernel_size=(5, 5), strides=1, padding='same')(x)
model = Model(inputs=base_model.input, outputs=x)
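One further possibility, offered as an assumption rather than a confirmed diagnosis of the remaining nan: the last layer, Conv2D(1, kernel_size=(5, 5), ...), has no activation, so its predictions can be zero or negative, and if y_true contains zeros (for example, missing values) the target side has the same problem. The log of such values is -inf or nan, and either one turns the loss into nan. The numpy sketch below uses hypothetical values to show the effect and a simple floor at a small eps before taking the log.

```python
import numpy as np

eps = 1e-7  # assumed floor; matches the default K.epsilon() in Keras 2.x

y_true = np.array([0.5, 1.0, 0.0])   # hypothetical targets; 0.0 stands in for a missing value
y_pred = np.array([0.4, -0.2, 0.0])  # a linear Conv2D output can be zero or negative

with np.errstate(divide="ignore", invalid="ignore"):
    d_raw = np.log(y_pred) - np.log(y_true)
print(d_raw)  # first entry finite, the other two nan

# floor both arrays at eps before taking the log
d_safe = np.log(np.maximum(y_pred, eps)) - np.log(np.maximum(y_true, eps))
print(d_safe)  # all entries finite
```

In the Keras loss, the corresponding guard would be to apply K.maximum(y_pred, K.epsilon()) and K.maximum(y_true, K.epsilon()) before K.log; an output activation with strictly positive values, such as softplus, is another option.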
Answer 1 (score: 0)
Try fitting the model with a built-in loss for a few epochs, then recompile the model with your own loss. That might help.