What is the definition of the metric values contained in a Keras History object?

Date: 2018-09-26 15:07:58

Tags: python keras

The documentation states that model.fit returns a History object, which contains various metrics evaluated during training. These metrics are also printed to stdout during training (see, for example, this question).

The documentation states that the History object is a

  record of training loss values and metrics values at successive epochs [...]

Now I am wondering whether these metrics are given as an average per sample or as an average per batch. Suppose I call model.fit(x, y, batch_size=16, ...). Are the metrics accumulated and averaged within each batch (i.e. one value corresponding to the aggregate over the 16 samples in a batch)? Or are they given per sample (i.e. averaged over the whole dataset)?
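As a quick numpy sketch of the two candidate definitions (using made-up error values, not tied to any particular model): whenever all batches have the same size, the mean of the per-batch means equals the mean over the whole dataset, so the two readings would coincide in that case anyway.

```python
import numpy as np

abs_err = np.abs(np.arange(8.0))  # hypothetical per-sample absolute errors
full_mean = abs_err.mean()        # average over the whole dataset

for bs in [1, 2, 4, 8]:
    # Average within each (equal-sized) batch, then across batches.
    batch_means = abs_err.reshape(-1, bs).mean(axis=1)
    assert np.isclose(batch_means.mean(), full_mean)

print(full_mean)  # 3.5
```

The distinction would only matter for unequal batch sizes (e.g. a smaller final batch), where a plain mean of per-batch means differs from the dataset mean.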

Edit

Apparently, the metrics are not per sample but per output. The documentation of model.fit vaguely indicates this; namely, it states that if a different loss is specified for each output node, the sum of the individual losses will be minimized. This suggests two things. First, the loss (metric) is not computed per sample but per output (though averaged within and across batches). If the per-output losses (metrics) were averaged over the outputs, this procedure would resemble a per-sample computation. Second, however, the documentation indicates that the losses of different outputs are summed rather than averaged. So this requires further investigation.

Diving into the source code reveals that indeed loss functions are stored per output. If we don't manually specify any weights for the individual outputs, a weight of one will be assigned by default. The relevant loss computation part then starts here. The losses are summed; there doesn't seem to be any averaging. Well, we should see that from a quick experiment:

from keras.initializers import Ones, Zeros
from keras.models import Sequential
from keras.layers import Dense
import numpy as np

x = np.arange(16).reshape(8, 2).astype(float)
y = np.zeros((8, 2), dtype=float)

model = Sequential()
model.add(Dense(2, input_dim=2, kernel_initializer=Ones(), bias_initializer=Zeros(), trainable=False))
model.compile('sgd', loss='mean_absolute_error', metrics=['mean_absolute_error', 'mean_squared_error'])

# Metrics per sample and output.
ae = np.abs(np.sum(x, axis=1)[:, None] - y)  # Absolute error.
se = (np.sum(x, axis=1)[:, None] - y)**2  # Squared error.
print('Expected metrics for averaging over samples but summing over outputs:')
print(f'\tMAE: {np.sum(np.mean(ae, axis=0))}, MSE: {np.sum(np.mean(se, axis=0))}', end='\n\n')
print('Expected metrics for averaging over samples and averaging over outputs:')
print(f'\tMAE: {np.mean(np.mean(ae, axis=0))}, MSE: {np.mean(np.mean(se, axis=0))}')

for batch_size in [1, 2, 4, 8]:
    print(f'\n# Batch size: {batch_size}')
    model.fit(x, y, batch_size=batch_size, epochs=1, shuffle=False)

which produces the following output:

Expected metrics for averaging over samples but summing over outputs:
    MAE: 30.0, MSE: 618.0

Expected metrics for averaging over samples and averaging over outputs:
    MAE: 15.0, MSE: 309.0

# Batch size: 1
Epoch 1/1
8/8 [==============================] - 0s 4ms/step - loss: 15.0000 - mean_absolute_error: 15.0000 - mean_squared_error: 309.0000

# Batch size: 2
Epoch 1/1
8/8 [==============================] - 0s 252us/step - loss: 15.0000 - mean_absolute_error: 15.0000 - mean_squared_error: 309.0000

# Batch size: 4
Epoch 1/1
8/8 [==============================] - 0s 117us/step - loss: 15.0000 - mean_absolute_error: 15.0000 - mean_squared_error: 309.0000

# Batch size: 8
Epoch 1/1
8/8 [==============================] - 0s 60us/step - loss: 15.0000 - mean_absolute_error: 15.0000 - mean_squared_error: 309.0000

Strangely, the reported metric values appear to be averaged over the outputs, whereas the documentation and the source code suggest they would be summed. I would be glad if someone could clarify what is going on here.
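One plausible reading of these numbers (it is confirmed in the answer below): for a single output with several columns, the per-sample loss already averages over the last axis before any summing across outputs can happen. A numpy sketch of that interpretation reproduces the printed values exactly:

```python
import numpy as np

x = np.arange(16).reshape(8, 2).astype(float)
y = np.zeros((8, 2))
# The frozen all-ones Dense layer outputs the row sum of x in both units.
pred = np.sum(x, axis=1)[:, None]

# Per-sample metric: average over the last axis (the 2 units of the single output) ...
mae_per_sample = np.mean(np.abs(pred - y), axis=-1)
mse_per_sample = np.mean((pred - y) ** 2, axis=-1)

# ... then average over samples.
print(mae_per_sample.mean(), mse_per_sample.mean())  # 15.0 309.0
```

Under this reading there is no contradiction with the source code: there is only one output here, so there is nothing to sum.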

1 Answer:

Answer 0 (score: 1)

To simplify the problem, let's define a "model" that returns its input as-is.

from keras.layers import Input
from keras.models import Model

inp = Input((2,))
model = Model(inputs=inp, outputs=inp)
model.summary()

#__________________________________________________________________
#Layer (type)                 Output Shape              Param #   
#=================================================================
#input_3 (InputLayer)         (None, 2)                 0         
#=================================================================
#Total params: 0
#Trainable params: 0
#Non-trainable params: 0
#__________________________________________________________________

Although there are no parameters to train, let's train the model to see how keras computes its metrics.

import numpy as np
x = np.arange(16).reshape(8, 2).astype(float)
y = np.zeros((8, 2), dtype=float)

model.compile(optimizer="adam", loss="mse", metrics=["mae"])

for bs in [1, 2, 3, 8]:
    print("Training with batch size", bs)
    model.fit(x, y, epochs=1, batch_size=bs)
    print("")

I get:

Training with batch size 1
Epoch 1/1
8/8 [=============] - 0s 10ms/step - loss: 77.5000 - mean_absolute_error: 7.5000

Training with batch size 2
Epoch 1/1
8/8 [=============] - 0s 1ms/step - loss: 77.5000 - mean_absolute_error: 7.5000

Training with batch size 3
Epoch 1/1
8/8 [=============] - 0s 806us/step - loss: 77.5000 - mean_absolute_error: 7.5000

Training with batch size 8
Epoch 1/1
8/8 [=============] - 0s 154us/step - loss: 77.5000 - mean_absolute_error: 7.5000

So MSE (loss) = 77.5 and MAE = 7.5, regardless of the batch size.

To reproduce these results, we can compute:

np.mean((x - y) ** 2)
# 77.5
np.mean(np.abs(x - y))
# 7.5

Now, regarding the "weighted sum" statement in the keras documentation: it refers to a list of multiple outputs, not to a single multi-column output.

from keras.layers import Input, Lambda
from keras.models import Model

inp = Input((2,))
y1 = Lambda(lambda x: x[:, 0:1], name="Y1")(inp)
y2 = Lambda(lambda x: x[:, 1:2], name="Y2")(inp)
model = Model(inputs=inp, outputs=[y1, y2])
model.summary()

#_____________________________________________________________________
#Layer (type)          Output Shape         Param #     Connected to                     
#=====================================================================
#input_6 (InputLayer)  (None, 2)            0                                            
#_____________________________________________________________________
#Y1 (Lambda)           (None, 1)            0           input_6[0][0]                    
#_____________________________________________________________________
#Y2 (Lambda)           (None, 1)            0           input_6[0][0]                    
#=====================================================================
#Total params: 0
#Trainable params: 0
#Non-trainable params: 0

This model is exactly the same as the one above, except that its output is split into two parts.
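To make that equivalence concrete (a trivial numpy check, independent of Keras), the two Lambda slices together carry exactly the same values as the single two-column output:

```python
import numpy as np

x = np.arange(16).reshape(8, 2).astype(float)
# What the Y1/Y2 Lambda layers do: split the two columns into separate outputs.
y1 = x[:, 0:1]
y2 = x[:, 1:2]

# Concatenating them recovers the original two-column tensor exactly.
assert np.array_equal(np.concatenate([y1, y2], axis=1), x)
print(y1.shape, y2.shape)  # (8, 1) (8, 1)
```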

The training results follow.

model.compile(optimizer="adam", loss="mse", metrics=["mae"])
for bs in [1, 2, 3, 8]:
    print("Training with batch size", bs)
    model.fit(x, [y[:, 0:1], y[:, 1:2]], epochs=1, batch_size=bs)
    print("")

#Training with batch size 1
#Epoch 1/1
#8/8 [==============================] - 0s 15ms/step - loss: 155.0000 -
#Y1_loss: 70.0000 - Y2_loss: 85.0000 - Y1_mean_absolute_error: 7.0000 -
#Y2_mean_absolute_error: 8.0000
# 
#same for all batch sizes

Keras now computes the loss of each output separately, then takes their sum. We can reproduce the results with:

np.mean(np.sum((x - y) ** 2, axis=-1))
# 155.0
np.mean(np.sum(np.abs(x - y), axis=-1))
# 15.0 (= 7.0 + 8.0)
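The per-output numbers in the training log can be reproduced the same way (using the x and y arrays defined earlier): each one-column output is averaged over samples on its own, and only then are the per-output losses summed into the total.

```python
import numpy as np

x = np.arange(16).reshape(8, 2).astype(float)
y = np.zeros((8, 2))

# Each output is a single column, averaged over samples independently.
y1_mse = np.mean((x[:, 0] - y[:, 0]) ** 2)   # 70.0  -> Y1_loss
y2_mse = np.mean((x[:, 1] - y[:, 1]) ** 2)   # 85.0  -> Y2_loss
y1_mae = np.mean(np.abs(x[:, 0] - y[:, 0]))  # 7.0   -> Y1_mean_absolute_error
y2_mae = np.mean(np.abs(x[:, 1] - y[:, 1]))  # 8.0   -> Y2_mean_absolute_error

# The total loss is the (unit-weighted) sum of the per-output losses.
print(y1_mse + y2_mse)  # 155.0
```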