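I am building a custom U-Net for a semantic segmentation problem, but I am seeing strange behavior: the loss and the metric computed during training differ by a large amount.
I have read this one (1), this one (2), another one (3) and yet another one (4), but could not find a suitable answer.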
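When training the model, I use the same function for both the loss and the metric, and the reported values are very different.
The first example uses categorical_crossentropy (I am using a very small toy dataset to show it):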
from tensorflow.python.keras import losses

model.compile(optimizer='adam', loss=losses.categorical_crossentropy,
              metrics=[losses.categorical_crossentropy])
The output I get is:
4/4 [===] - 3s 677ms/step - loss: 4.1023 - categorical_crossentropy: 1.0256
- val_loss: 1.3864 - val_categorical_crossentropy: 1.3864
As you can see, loss and categorical_crossentropy differ by a factor of about 4.
If I use a custom metric, the difference is an order of magnitude:
from tensorflow.python.keras import backend as K
from tensorflow.python.keras.losses import categorical_crossentropy

def dice_cross_loss(y_true, y_pred, epsilon=1e-6, smooth=1):
    # per-pixel cross-entropy over the class axis
    ce_loss = categorical_crossentropy(y_true, y_pred)
    # Dice coefficient computed over the whole flattened batch
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    dice_coef = (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + epsilon)
    # cross-entropy plus the negative log of the Dice coefficient
    return ce_loss - K.log(dice_coef + epsilon)

model.compile(optimizer='adam', loss=dice_cross_loss,
              metrics=[dice_cross_loss])
When I run it, things are even worse:
4/4 [===] - 3s 682ms/step - loss: 20.9706 - dice_cross_loss: 5.2428
- val_loss: 4.3681 - val_dice_cross_loss: 4.3681
With larger examples, the difference between the loss and the metric can be more than tenfold.
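One way to tell which of the two logged numbers is closer to the "real" value is to recompute the loss by hand, outside the training loop. The following is a minimal NumPy sketch of dice_cross_loss under some assumptions: the helper names ending in `_np` are hypothetical, the cross-entropy is assumed to renormalise and clip the predictions the way the Keras backend does for probabilities, and the final value is assumed to be the plain mean over all pixels.

```python
import numpy as np

def categorical_crossentropy_np(y_true, y_pred, eps=1e-7):
    """Per-pixel categorical cross-entropy over the last (class) axis."""
    y_pred = y_pred / np.sum(y_pred, axis=-1, keepdims=True)  # renormalise (assumed)
    y_pred = np.clip(y_pred, eps, 1.0 - eps)                  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

def dice_cross_loss_np(y_true, y_pred, epsilon=1e-6, smooth=1):
    """NumPy mirror of dice_cross_loss, reduced with a plain mean."""
    ce = categorical_crossentropy_np(y_true, y_pred)
    y_true_f = y_true.ravel()
    y_pred_f = y_pred.ravel()
    intersection = np.sum(y_true_f * y_pred_f)
    dice_coef = (2.0 * intersection + smooth) / (
        np.sum(y_true_f) + np.sum(y_pred_f) + epsilon)
    # the scalar Dice term broadcasts over the per-pixel cross-entropy
    return float(np.mean(ce - np.log(dice_coef + epsilon)))

# Hypothetical toy batch: 4 images of 20x30 pixels, 2 one-hot classes.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=(4, 20, 30))
y_true = np.eye(2)[labels]               # shape (4, 20, 30, 2)
y_uniform = np.full(y_true.shape, 0.5)   # a model that predicts 50/50 everywhere

loss_uniform = dice_cross_loss_np(y_true, y_uniform)
loss_perfect = dice_cross_loss_np(y_true, y_true)
print(loss_uniform, loss_perfect)
```

Feeding the same arrays through `model.predict` and this function gives a reference value to compare against both the logged loss and the logged metric.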
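After reading (1), I removed from the model every regularization layer that could behave differently at evaluation time. There is no dropout and no batchnorm. There is pooling, but that should not be the cause.
The fitting code is nothing out of the ordinary: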
model.fit(x=data_x, y=data_y, batch_size=batch_size, epochs=epochs,
          verbose=1, validation_split=0.2, shuffle=True, workers=4)
This is the code of the network:
from tensorflow.python.keras.layers import (Input, Conv2D, Conv2DTranspose, MaxPooling2D,
                                            Activation, ReLU, concatenate)
from tensorflow.python.keras.models import Model


class CustomUnet(object):
    def __init__(self, image_shape=(20, 30, 3), n_class=2, **params):
        # read parameters
        initial_filters = params.get("initial_filters", 64)
        conv_activations = params.get("conv_activations", ReLU())
        final_activation = params.get("final_activation", "softmax")

        self.name = "CustomUnet"
        input_layer = Input(shape=image_shape, name='image_input')

        # contracting path
        conv1 = self.conv_block(input_layer, nfilters=initial_filters, activation=conv_activations, name="con1")
        conv1_out = MaxPooling2D(pool_size=(2, 2))(conv1)
        conv2 = self.conv_block(conv1_out, nfilters=initial_filters*2, activation=conv_activations, name="con2")
        conv2_out = MaxPooling2D(pool_size=(2, 2))(conv2)
        conv3 = self.conv_block(conv2_out, nfilters=initial_filters*4, activation=conv_activations, name="con3")
        conv3_out = MaxPooling2D(pool_size=(2, 2))(conv3)
        conv4 = self.conv_block(conv3_out, nfilters=initial_filters*8, activation=conv_activations, name="con4")

        # expanding path
        # number jumps from 4 to 7 because it used to have an extra layer and haven't got to refactor properly.
        deconv7 = self.deconv_block(conv4, residual=conv3, nfilters=initial_filters*4, name="decon7",
                                    conv_activations=conv_activations)
        deconv8 = self.deconv_block(deconv7, residual=conv2, nfilters=initial_filters*2, name="decon8",
                                    conv_activations=conv_activations)
        deconv9 = self.deconv_block(deconv8, residual=conv1, nfilters=initial_filters, name="decon9",
                                    conv_activations=conv_activations)

        output_layer = Conv2D(filters=n_class, kernel_size=(1, 1))(deconv9)
        # Assumption: the posted code passed an undefined `output_layer4` to Model and never
        # used final_activation; applying the activation here is the presumed intent.
        output_layer = Activation(final_activation)(output_layer)

        model = Model(inputs=input_layer, outputs=output_layer, name='Unet')
        self.model = model

    def conv_block(self, input_layer, nfilters, size=3, padding='same', initializer="he_normal", name="none",
                   activation=ReLU()):
        # two conv + activation stages with the same number of filters
        x = Conv2D(filters=nfilters, kernel_size=(size, size), padding=padding, kernel_initializer=initializer)(input_layer)
        x = Activation(activation)(x)
        x = Conv2D(filters=nfilters, kernel_size=(size, size), padding=padding, kernel_initializer=initializer)(x)
        x = Activation(activation)(x)
        return x

    def deconv_block(self, input_layer, residual, nfilters, size=3, padding='same', strides=(2, 2), name="none",
                     conv_activations=ReLU()):
        # upsample, concatenate with the skip connection, then convolve
        y = Conv2DTranspose(nfilters, kernel_size=(size, size), strides=strides, padding=padding)(input_layer)
        y = concatenate([y, residual])  # , axis=3)
        y = self.conv_block(y, nfilters, activation=conv_activations)
        return y
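Is this expected behavior? What am I not understanding about the difference between how the loss and the metric are computed? Did I mess something up in the code?
Thanks!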
Here is a minimal, self-contained example:
from tensorflow.python.keras.layers import Input, Conv2D, Activation
from tensorflow.python.keras.models import Model
import numpy as np

input_data = np.random.rand(100, 300, 300, 3)  # 300x300 images
out_data = np.random.randint(0, 2, size=(100, 300, 300, 4))  # 4 classes

def simple_model(image_shape, n_class):
    input_layer = Input(shape=image_shape, name='image_input')
    x = Conv2D(filters=3, kernel_size=(3, 3), padding="same", kernel_initializer="he_normal")(input_layer)
    x = Activation("relu")(x)
    x = Conv2D(filters=3, kernel_size=(3, 3), padding="same", kernel_initializer="he_normal")(x)
    x = Activation("relu")(x)
    x = Conv2D(filters=n_class, kernel_size=(1, 1))(x)
    output_layer = Activation("softmax")(x)
    model = Model(inputs=input_layer, outputs=output_layer, name='Sample')
    return model

sample_model = simple_model(input_data[0].shape, out_data.shape[-1])
sample_model.compile(optimizer='adam', loss="categorical_crossentropy", metrics=["categorical_crossentropy"])

batch_size = 5
steps = input_data.shape[0] // batch_size
epochs = 20
history = sample_model.fit(x=input_data, y=out_data, batch_size=batch_size, epochs=epochs,  # , callbacks=callbacks,
                           verbose=1, validation_split=0.2, workers=1)
The results I get are still strange:
80/80 [===] - 9s 108ms/step - loss: 14.0259 - categorical_crossentropy: 2.8051 - val_loss: 13.9439 - val_categorical_crossentropy: 2.7885
So loss: 14.0259 - categorical_crossentropy: 2.8051. Now I am lost...
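A quick arithmetic check on the logged numbers may help narrow things down. The sketch below only divides each logged loss by its metric; the values are copied from the logs above and the dictionary keys are labels made up for this illustration.

```python
# (loss, metric) pairs copied from the training logs quoted above.
logged = {
    "toy / categorical_crossentropy": (4.1023, 1.0256),
    "toy / dice_cross_loss": (20.9706, 5.2428),
    "sample / train": (14.0259, 2.8051),
    "sample / val": (13.9439, 2.7885),
}

# Ratio between the reported loss and the reported metric for each run.
ratios = {name: loss / metric for name, (loss, metric) in logged.items()}

for name, ratio in ratios.items():
    print(f"{name}: loss / metric = {ratio:.3f}")
```

In the minimal example the ratio is very close to 5, which equals batch_size, and in the toy runs it is very close to 4, where the progress bar reports 4/4. That pattern would be consistent with the loss being summed rather than averaged over a batch, but this is only an observation about the logs, not a confirmed diagnosis.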
Answer 0 (score: 1)
This seems to be a problem with importing Keras through TF.
If I do this:
from tensorflow.python.keras.layers import Input, Conv2D, Activation
from tensorflow.python.keras.models import Model
I get the strange behavior described above.
If I replace it with:
from keras.layers import Input, Conv2D, Activation
from keras.models import Model
I get more consistent numbers.
There are still some differences, but they seem much more reasonable. Still, if you know why this happens, please let me know!
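Answer 1 (score: 0)
Keras takes its source of randomness from the NumPy random number generator, so that generator must be seeded regardless of whether you use the Theano or the TensorFlow backend.
Call the seed() function at the top of the file, before any other imports or code:
from numpy.random import seed
seed(1)
In addition, TensorFlow has its own random number generator, which must also be seeded by calling set_random_seed() immediately after seeding the NumPy generator, like this:
from tensorflow import set_random_seed
set_random_seed(2)
Thanks,
Rajeswari Ponnuru.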
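As a small illustration of the NumPy half of this advice, the sketch below shows that reseeding makes randomly generated toy data repeatable. It uses NumPy and the standard library only; `make_toy_data` and its parameters are hypothetical names chosen for this example, and seeding TensorFlow itself is not covered here.

```python
import random
import numpy as np

def make_toy_data(seed_value, n=4, shape=(8, 8, 3), n_class=4):
    """Generate random images and labels after seeding both generators."""
    random.seed(seed_value)      # Python's own generator
    np.random.seed(seed_value)   # NumPy's global generator
    x = np.random.rand(n, *shape)
    y = np.random.randint(0, 2, size=(n, shape[0], shape[1], n_class))
    return x, y

x1, y1 = make_toy_data(1)
x2, y2 = make_toy_data(1)   # same seed: identical arrays
x3, _ = make_toy_data(2)    # different seed: different arrays

print(np.array_equal(x1, x2), np.array_equal(y1, y2), np.array_equal(x1, x3))
```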