Batch size problem when using a custom loss function in Keras

Asked: 2018-11-10 00:47:11

Tags: python tensorflow keras

I am making some modifications to a standard neural network by defining a custom loss function. The custom loss function depends not only on y_true and y_pred, but also on the training data. I implemented it using the wrapping solution described here.

Specifically, I want to define a custom loss function that is the standard MSE plus the MSE between the input and the square of y_pred:

def custom_loss(x_true):
    def loss(y_true, y_pred):
        return K.mean(K.square(y_pred - y_true) + K.square(y_true - x_true))
    return loss

Then I compile the model using

model_custom.compile(loss = custom_loss( x_true=training_data ), optimizer='adam')

and fit the model using

model_custom.fit(training_data, training_label, epochs=100, batch_size = training_data.shape[0])

All of the above works fine, as long as the batch size is actually the number of all training samples.

But when I set a different batch_size (for example 10) while having 1000 training samples, I get the error

  Incompatible shapes: [1000] vs. [10].

It seems that Keras is able to automatically adjust the size of the inputs to its own loss functions according to the batch size, but cannot do so for a custom loss function.
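
To make the mismatch concrete, here is a small NumPy analogue of what I think happens inside the loss (just an illustration, with the shapes assumed from my setup of 1000 training samples, 2 input features and batch_size = 10):

import numpy as np

y_true_batch = np.zeros((10, 1))    # what Keras feeds the loss for one batch
x_true_full  = np.zeros((1000, 2))  # what the closure captured at compile time

try:
    _ = y_true_batch - x_true_full  # analogue of K.square(y_true - x_true)
except ValueError as e:
    print(e)                        # operands could not be broadcast together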

Do you know how to solve this problem?

Thanks!

=========================================================================

Update: the batch size problem is solved, but another problem occurred

Thanks to Ori for the suggestion of concatenating the input and output layers! The code "works" in the sense that it can run with any batch size. However, the result of training the new model seems to be wrong... Below is a simplified version of the code to demonstrate the problem:

import numpy as np
import scipy.io
import keras
from keras import backend as K
from keras.models import Model
from keras.layers import Input, Dense, Activation
from numpy.random import seed
from tensorflow import set_random_seed

def custom_loss(y_true, y_pred): # this is essentially the mean_square_error
    mse = K.mean( K.square( y_pred[:,2] - y_true ) )
    return mse

# set the seeds so that we get the same initialization across different trials
seed_numpy = 0
seed_tensorflow = 0

# generate data of x = [ y^3 y^2 ]
y = np.random.rand(5000+1000,1) * 2 # generate 5000 training and 1000 testing samples
x = np.concatenate( ( np.power(y, 3) , np.power(y, 2) ) , axis=1 )

training_data  = x[0:5000:1,:]
training_label = y[0:5000:1]
testing_data   = x[5000:6000:1,:]
testing_label  = y[5000:6000:1]

# build the standard neural network with one hidden layer
seed(seed_numpy)
set_random_seed(seed_tensorflow)

input_standard = Input(shape=(2,))                                               # input
hidden_standard = Dense(10, activation='relu', input_shape=(2,))(input_standard) # hidden layer
output_standard = Dense(1, activation='linear')(hidden_standard)                 # output layer

model_standard = Model(inputs=[input_standard], outputs=[output_standard])     # build the model
model_standard.compile(loss='mean_squared_error', optimizer='adam')            # compile the model
model_standard.fit(training_data, training_label, epochs=50, batch_size = 500) # train the model
testing_label_pred_standard = model_standard.predict(testing_data)             # make prediction

# get the mean squared error
mse_standard = np.sum( np.power( testing_label_pred_standard - testing_label , 2 ) ) / 1000

# build the neural network with the custom loss
seed(seed_numpy)
set_random_seed(seed_tensorflow)

input_custom = Input(shape=(2,))                                             # input
hidden_custom = Dense(10, activation='relu', input_shape=(2,))(input_custom) # hidden layer
output_custom_temp = Dense(1, activation='linear')(hidden_custom)            # output layer
output_custom = keras.layers.concatenate([input_custom, output_custom_temp])

model_custom = Model(inputs=[input_custom], outputs=[output_custom])         # build the model
model_custom.compile(loss = custom_loss, optimizer='adam')                   # compile the model
model_custom.fit(training_data, training_label, epochs=50, batch_size = 500) # train the model
testing_label_pred_custom = model_custom.predict(testing_data)               # make prediction

# get the mean squared error
mse_custom = np.sum( np.power( testing_label_pred_custom[:,2:3:1] - testing_label , 2 ) ) / 1000

# compare the result
print( [ mse_standard , mse_custom ] )

Basically, I have a standard one-hidden-layer neural network, and a custom one-hidden-layer neural network whose output layer is concatenated with the input layer. For testing purposes, I did not use the concatenated input layer in the custom loss function, because I wanted to see whether the custom network can reproduce the standard neural network. Since the custom loss function is equivalent to the standard 'mean_squared_error' loss, both networks should have the same training results (I also reset the random seeds to make sure they have the same initialization).

However, the training results are very different. It seems that the concatenation somehow changes the training process? Any ideas?

Thanks again for your help!

Final update: Ori's method of concatenating the input and output layers works, and it has been verified using a generator. Thanks!!

1 Answer:

Answer 0: (score: 0)

The problem is that when compiling the model, you set x_true to be a static tensor, in the size of all the samples, while the inputs to a Keras loss function, y_true and y_pred, each have size [batch_size, :].

As I see it, there are two ways you can solve this. The first is using a generator to create the batches, so that you control which indices are evaluated each time, and in the loss function you can slice the x_true tensor to fit the samples being evaluated:

def custom_loss(x_true):
    def loss(y_true, y_pred):
        x_true_samples = relevant_samples(x_true)
        return K.mean(K.square(y_pred - y_true) + K.square(y_true - x_true_samples))
    return loss
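
For concreteness, here is one possible way to realize this idea (only a sketch: the index_generator and the extra index column are assumptions on my part, standing in for the relevant_samples placeholder above). It assumes your Keras version does not check the target width against the model output when a custom loss is used; if it does complain, the concatenation approach below avoids the issue entirely.

import numpy as np
from keras import backend as K

def index_generator(x, y, batch_size):
    # yields (x_batch, y_batch_plus_indices); the last column of the target
    # carries the row indices of the samples in the current batch
    n = x.shape[0]
    while True:
        idx = np.random.randint(0, n, batch_size)
        yield x[idx], np.concatenate([y[idx], idx[:, None].astype(float)], axis=1)

def custom_loss(x_true):
    x_true_t = K.constant(x_true)                  # full training inputs as a tensor
    def loss(y_true, y_pred):
        y_only = y_true[:, :1]                     # the real labels
        idx = K.cast(y_true[:, -1], 'int32')       # indices passed by the generator
        x_true_samples = K.gather(x_true_t, idx)   # rows of x_true for this batch
        return K.mean(K.square(y_pred - y_only) + K.square(y_only - x_true_samples))
    return loss

# model.compile(loss=custom_loss(training_data), optimizer='adam')
# model.fit_generator(index_generator(training_data, training_label, 10),
#                     steps_per_epoch=100, epochs=100)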

This solution can be complicated, so what I would suggest is a simpler workaround:
concatenate the input layer with the output layer, so that your new output is of the form original_output , input.

Now you can use a new, modified loss function:

def loss(y_true, y_pred):
    return K.mean(K.square(y_pred[:,:output_shape] - y_true[:,:output_shape]) +
                  K.square(y_true[:,:output_shape] - y_pred[:,output_shape:]))

Now your new loss function will take into account both the input data and the prediction.
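
For completeness, here is a minimal sketch of the wiring this loss assumes: a 2-feature input, output_shape = 1, and the prediction placed before the input in the concatenation, so that y_pred[:, :output_shape] is the original output (note that the full example further down concatenates in the opposite order, input first, and therefore slices y_pred[:, 2] instead).

from keras.layers import Input, Dense, concatenate
from keras.models import Model
from keras import backend as K

output_shape = 1                                  # width of the original output

inp = Input(shape=(2,))
hidden = Dense(10, activation='relu')(inp)
pred = Dense(output_shape, activation='linear')(hidden)
new_output = concatenate([pred, inp])             # [original_output, input]

def loss(y_true, y_pred):
    return K.mean(K.square(y_pred[:, :output_shape] - y_true[:, :output_shape]) +
                  K.square(y_true[:, :output_shape] - y_pred[:, output_shape:]))

model = Model(inputs=inp, outputs=new_output)
model.compile(loss=loss, optimizer='adam')
# model.fit(training_data, training_label, epochs=100, batch_size=10)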

Edit:
Note that even though you set the seeds, your models are not exactly the same: since you do not use a generator, you let Keras pick the batches, and for different models it may pick different samples.
Since your model does not converge, different samples can lead to different results.

I added a generator to the code, to verify the samples we pick for training; now you can see that both results are the same:

def custom_loss(y_true, y_pred): # this is essentially the mean_square_error
    mse = keras.losses.mean_squared_error(y_true, y_pred[:,2])
    return mse


def generator(x, y, batch_size):
    curIndex = 0
    batch_x = np.zeros((batch_size,2))
    batch_y = np.zeros((batch_size,1))
    while True:
        for i in range(batch_size):
            batch_x[i] = x[curIndex,:]
            batch_y[i] = y[curIndex,:]
            curIndex += 1              # advance through the dataset, not the loop variable
            if curIndex == 5000:       # wrap around after the last training sample
                curIndex = 0
        yield batch_x, batch_y

# set the seeds so that we get the same initialization across different trials
seed_numpy = 0
seed_tensorflow = 0

# generate data of x = [ y^3 y^2 ]
y = np.random.rand(5000+1000,1) * 2 # generate 5000 training and 1000 testing samples
x = np.concatenate( ( np.power(y, 3) , np.power(y, 2) ) , axis=1 )

training_data  = x[0:5000:1,:]
training_label = y[0:5000:1]
testing_data   = x[5000:6000:1,:]
testing_label  = y[5000:6000:1]

batch_size = 32



# build the standard neural network with one hidden layer
seed(seed_numpy)
set_random_seed(seed_tensorflow)

input_standard = Input(shape=(2,))                                               # input
hidden_standard = Dense(10, activation='relu', input_shape=(2,))(input_standard) # hidden layer
output_standard = Dense(1, activation='linear')(hidden_standard)                 # output layer

model_standard = Model(inputs=[input_standard], outputs=[output_standard])     # build the model
model_standard.compile(loss='mse', optimizer='adam')            # compile the model
#model_standard.fit(training_data, training_label, epochs=50, batch_size = 10) # train the model
model_standard.fit_generator(generator(training_data,training_label,batch_size),  steps_per_epoch= 32, epochs= 100)
testing_label_pred_standard = model_standard.predict(testing_data)             # make prediction

# get the mean squared error
mse_standard = np.sum( np.power( testing_label_pred_standard - testing_label , 2 ) ) / 1000

# build the neural network with the custom loss
seed(seed_numpy)
set_random_seed(seed_tensorflow)


input_custom = Input(shape=(2,))                                               # input
hidden_custom = Dense(10, activation='relu', input_shape=(2,))(input_custom) # hidden layer
output_custom_temp = Dense(1, activation='linear')(hidden_custom)            # output layer
output_custom = keras.layers.concatenate([input_custom, output_custom_temp])

model_custom = Model(inputs=input_custom, outputs=output_custom)         # build the model
model_custom.compile(loss = custom_loss, optimizer='adam')                   # compile the model
#model_custom.fit(training_data, training_label, epochs=50, batch_size = 10) # train the model
model_custom.fit_generator(generator(training_data,training_label,batch_size),  steps_per_epoch= 32, epochs= 100)
testing_label_pred_custom = model_custom.predict(testing_data)

# get the mean squared error
mse_custom = np.sum( np.power( testing_label_pred_custom[:,2:3:1] - testing_label , 2 ) ) / 1000

# compare the result
print( [ mse_standard , mse_custom ] )