The documentation states that model.fit returns a History object, which contains the various metrics evaluated during training. During training these metrics are also printed to stdout (see, for example, this question).
The documentation describes the History object as "a record of training loss values and metrics values at successive epochs [...]".
Now I am wondering whether these metrics are given as an average per sample or as an average per batch. Suppose I call model.fit(x, y, batch_size=16, ...). Are the metrics accumulated and averaged within each batch (i.e. one value would correspond to the aggregate of the 16 samples in a batch)? Or are they given per sample (i.e. averaged over the whole dataset)?
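To make the distinction concrete, here is a toy numpy sketch (not Keras code, just an assumed illustration) showing that the two interpretations can differ when the batch size does not evenly divide the dataset:

import numpy as np
err = np.array([1., 2., 3., 4., 5., 6.])  # hypothetical per-sample errors
batch_size = 4                            # the last batch only has 2 samples
batch_means = [err[i:i + batch_size].mean()
               for i in range(0, len(err), batch_size)]
print(np.mean(batch_means))  # naive mean of per-batch means: 4.0
print(err.mean())            # mean over the whole dataset: 3.5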
Apparently the metrics are computed not per sample but per output. The documentation of model.fit hints at this; namely, it states that if a different loss is specified for each output node, the total loss that gets minimized is the (weighted) sum of the individual losses. This suggests two things: first, the loss (metric) is not computed per sample but per output (although averaged within and across batches). If the per-output losses (metrics) were then averaged over the outputs, this procedure would be equivalent to a per-sample computation. However, second, the documentation suggests that the losses of the different outputs are summed rather than averaged. So this requires further investigation.
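For illustration, a minimal numpy sketch of the "weighted sum" reading (the per-output loss values here are made up for the example):

import numpy as np
per_output_loss = np.array([70.0, 85.0])  # hypothetical mean loss of each output
output_weights = np.array([1.0, 1.0])     # default: a weight of one per output
total_loss = np.sum(output_weights * per_output_loss)
print(total_loss)  # 155.0 -- summed, not averaged, over the outputs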
Digging into the source code reveals that loss functions are indeed stored per output. If we don't manually specify any weights for the individual outputs, a weight of one will be assigned by default. The relevant part of the loss computation then starts here: the losses are summed, and there doesn't seem to be any averaging. Well, we should be able to confirm this with a quick experiment:
from keras.initializers import Ones, Zeros
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
x = np.arange(16).reshape(8, 2).astype(float)
y = np.zeros((8, 2), dtype=float)
model = Sequential()
model.add(Dense(2, input_dim=2, kernel_initializer=Ones(), bias_initializer=Zeros(), trainable=False))
model.compile('sgd', loss='mean_absolute_error', metrics=['mean_absolute_error', 'mean_squared_error'])
# Metrics per sample and output.
ae = np.abs(np.sum(x, axis=1)[:, None] - y) # Absolute error.
se = (np.sum(x, axis=1)[:, None] - y)**2 # Squared error.
print('Expected metrics for averaging over samples but summing over outputs:')
print(f'\tMAE: {np.sum(np.mean(ae, axis=0))}, MSE: {np.sum(np.mean(se, axis=0))}', end='\n\n')
print('Expected metrics for averaging over samples and averaging over outputs:')
print(f'\tMAE: {np.mean(np.mean(ae, axis=0))}, MSE: {np.mean(np.mean(se, axis=0))}')
for batch_size in [1, 2, 4, 8]:
    print(f'\n# Batch size: {batch_size}')
    model.fit(x, y, batch_size=batch_size, epochs=1, shuffle=False)
which produces the following output:
Expected metrics for averaging over samples but summing over outputs:
MAE: 30.0, MSE: 618.0
Expected metrics for averaging over samples and averaging over outputs:
MAE: 15.0, MSE: 309.0
# Batch size: 1
Epoch 1/1
8/8 [==============================] - 0s 4ms/step - loss: 15.0000 - mean_absolute_error: 15.0000 - mean_squared_error: 309.0000
# Batch size: 2
Epoch 1/1
8/8 [==============================] - 0s 252us/step - loss: 15.0000 - mean_absolute_error: 15.0000 - mean_squared_error: 309.0000
# Batch size: 4
Epoch 1/1
8/8 [==============================] - 0s 117us/step - loss: 15.0000 - mean_absolute_error: 15.0000 - mean_squared_error: 309.0000
# Batch size: 8
Epoch 1/1
8/8 [==============================] - 0s 60us/step - loss: 15.0000 - mean_absolute_error: 15.0000 - mean_squared_error: 309.0000
Strangely, the reported metric values appear to be averaged over the outputs, even though the documentation and the source code suggest they would be summed. I would be very glad if someone could clarify what is going on here.
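Update: a plausible explanation, which would still need confirming against the source, is that the built-in losses/metrics already take a mean over the last axis, so a single multi-column output is averaged rather than summed over its columns. A numpy re-implementation (assuming the Keras 2 definition mean_absolute_error = K.mean(K.abs(y_pred - y_true), axis=-1)) reproduces the reported value, reusing x and y from the script above:

def mae(y_true, y_pred):  # numpy version of the assumed Keras definition
    return np.mean(np.abs(y_pred - y_true), axis=-1)
y_pred = np.repeat(np.sum(x, axis=1)[:, None], 2, axis=1)  # the model's output
per_sample = mae(y, y_pred)  # shape (8,): already averaged over the 2 columns
print(np.mean(per_sample))   # 15.0 -- matches the reported metric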
Answer 0 (score: 1)
To simplify the problem, let's define a "model" that returns its input unchanged.
from keras.layers import Input
from keras.models import Model
inp = Input((2,))
model = Model(inputs=inp, outputs=inp)
model.summary()
#_________________________________________________________________
#Layer (type)                 Output Shape              Param #
#=================================================================
#input_3 (InputLayer)         (None, 2)                 0
#=================================================================
#Total params: 0
#Trainable params: 0
#Non-trainable params: 0
#_________________________________________________________________
Although there are no parameters to train, let's fit the model to see how keras computes the metrics.
import numpy as np
x = np.arange(16).reshape(8, 2).astype(float)
y = np.zeros((8, 2), dtype=float)
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
for bs in [1, 2, 3, 8]:
    print("Training with batch size", bs)
    model.fit(x, y, epochs=1, batch_size=bs)
    print("")
I get:
Training with batch size 1
Epoch 1/1
8/8 [=============] - 0s 10ms/step - loss: 77.5000 - mean_absolute_error: 7.5000
Training with batch size 2
Epoch 1/1
8/8 [=============] - 0s 1ms/step - loss: 77.5000 - mean_absolute_error: 7.5000
Training with batch size 3
Epoch 1/1
8/8 [=============] - 0s 806us/step - loss: 77.5000 - mean_absolute_error: 7.5000
Training with batch size 8
Epoch 1/1
8/8 [=============] - 0s 154us/step - loss: 77.5000 - mean_absolute_error: 7.5000
So MSE (loss) = 77.5 and MAE = 7.5, regardless of the batch size.
To reproduce these results, we can run:
np.mean((x - y) ** 2)
# 77.5
np.mean(np.abs(x - y))
# 7.5
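Note that the epoch-level number in the progress bar can be reproduced as a batch-size-weighted average of the per-batch means (a sketch of my understanding of how the Keras 2 progress bar accumulates values; this also explains why batch size 3, which does not divide 8, still yields the exact value):

vals, weights = [], []
bs = 3
for i in range(0, len(x), bs):
    xb, yb = x[i:i + bs], y[i:i + bs]
    vals.append(np.mean(np.abs(xb - yb)))  # per-batch MAE
    weights.append(len(xb))                # the last batch only has 2 samples
print(np.average(vals, weights=weights))   # 7.5 -- identical to the global mean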
Now, about the "weighted sum" statement in the keras documentation: it refers to a list of outputs, not to a single multi-column output.
from keras.layers import Input, Lambda
from keras.models import Model
inp = Input((2,))
y1 = Lambda(lambda x: x[:, 0:1], name="Y1")(inp)
y2 = Lambda(lambda x: x[:, 1:2], name="Y2")(inp)
model = Model(inputs=inp, outputs=[y1, y2])
model.summary()
#_____________________________________________________________________
#Layer (type)           Output Shape    Param #    Connected to
#=====================================================================
#input_6 (InputLayer)   (None, 2)       0
#_____________________________________________________________________
#Y1 (Lambda)            (None, 1)       0          input_6[0][0]
#_____________________________________________________________________
#Y2 (Lambda)            (None, 1)       0          input_6[0][0]
#=====================================================================
#Total params: 0
#Trainable params: 0
#Non-trainable params: 0
This model is exactly the same as the one above, except that its output is split into two parts. The training results follow.
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
for bs in [1, 2, 3, 8]:
    print("Training with batch size", bs)
    model.fit(x, [y[:, 0:1], y[:, 1:2]], epochs=1, batch_size=bs)
    print("")
#Training with batch size 1
#Epoch 1/1
#8/8 [==============================] - 0s 15ms/step - loss: 155.0000 -
#Y1_loss: 70.0000 - Y2_loss: 85.0000 - Y1_mean_absolute_error: 7.0000 -
#Y2_mean_absolute_error: 8.0000
#
#same for all batch sizes
Keras now computes the loss of each output separately and then takes their sum. We can reproduce the results with:
np.mean(np.sum((x - y) ** 2, axis=-1))
# 155.0
np.mean(np.sum(np.abs(x - y), axis=-1))
# 15.0 (= 7.0 + 8.0)
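Finally, the "weighted" part of the weighted sum can be checked by passing explicit loss_weights to compile (an assumed follow-up experiment on the same two-output model; with the per-output losses above, the total loss should come out to 0.5 * 70 + 2.0 * 85 = 205):

model.compile(optimizer="adam", loss="mse", metrics=["mae"],
              loss_weights=[0.5, 2.0])
model.fit(x, [y[:, 0:1], y[:, 1:2]], epochs=1, batch_size=8)
# expected: loss: 205.0000 - Y1_loss: 70.0000 - Y2_loss: 85.0000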