
Question

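I am working through the Linear Regression with Synthetic Data Colab exercise, which explores linear regression on a toy dataset. It builds and trains a linear regression model and lets you adjust the learning rate, the number of epochs and the batch size. I have trouble understanding exactly how the iterations are carried out and how they relate to "epoch" and "batch size". I basically do not understand how the actual model is trained, how the data is processed and how the iterations are done. To understand this, I wanted to follow along by computing each step by hand, so I wanted the slope and intercept coefficients for every step. That way I could see what data the "computer" uses and feeds into the model, what model results at each specific iteration, and how the iterations are done. I first tried to get the slope and intercept for every single step, but failed, because the slope and intercept are only output at the end. My modified code (the original, with only the following added):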

  print("Slope")
  print(trained_weight)
  print("Intercept")
  print(trained_bias)

The code:

import pandas as pd
import tensorflow as tf
from matplotlib import pyplot as plt

#@title Define the functions that build and train a model
def build_model(my_learning_rate):
  """Create and compile a simple linear regression model."""
  # Most simple tf.keras models are sequential. 
  # A sequential model contains one or more layers.
  model = tf.keras.models.Sequential()

  # Describe the topography of the model.
  # The topography of a simple linear regression model
  # is a single node in a single layer. 
  model.add(tf.keras.layers.Dense(units=1, 
                                  input_shape=(1,)))

  # Compile the model topography into code that 
  # TensorFlow can efficiently execute. Configure 
  # training to minimize the model's mean squared error. 
  model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=my_learning_rate),
                loss="mean_squared_error",
                metrics=[tf.keras.metrics.RootMeanSquaredError()])
 
  return model           


def train_model(model, feature, label, epochs, batch_size):
  """Train the model by feeding it data."""

  # Feed the feature values and the label values to the 
  # model. The model will train for the specified number 
  # of epochs, gradually learning how the feature values
  # relate to the label values. 
  history = model.fit(x=feature,
                      y=label,
                      batch_size=batch_size,
                      epochs=epochs)

  # Gather the trained model's weight and bias.
  trained_weight = model.get_weights()[0]
  trained_bias = model.get_weights()[1]
  print("Slope")
  print(trained_weight)
  print("Intercept")
  print(trained_bias)
  # The list of epochs is stored separately from the 
  # rest of history.
  epochs = history.epoch

  # Gather the history (a snapshot) of each epoch.
  hist = pd.DataFrame(history.history)

 # print(hist)
  # Specifically gather the model's root mean 
  #squared error at each epoch. 
  rmse = hist["root_mean_squared_error"]

  return trained_weight, trained_bias, epochs, rmse

print("Defined build_model and train_model")

#@title Define the plotting functions
def plot_the_model(trained_weight, trained_bias, feature, label):
  """Plot the trained model against the training feature and label."""

  # Label the axes.
  plt.xlabel("feature")
  plt.ylabel("label")

  # Plot the feature values vs. label values.
  plt.scatter(feature, label)

  # Create a red line representing the model. The red line starts
  # at coordinates (x0, y0) and ends at coordinates (x1, y1).
  x0 = 0
  y0 = trained_bias
  x1 = feature[-1]
  y1 = trained_bias + (trained_weight * x1)
  plt.plot([x0, x1], [y0, y1], c='r')

  # Render the scatter plot and the red line.
  plt.show()

def plot_the_loss_curve(epochs, rmse):
  """Plot the loss curve, which shows loss vs. epoch."""

  plt.figure()
  plt.xlabel("Epoch")
  plt.ylabel("Root Mean Squared Error")

  plt.plot(epochs, rmse, label="Loss")
  plt.legend()
  plt.ylim([rmse.min()*0.97, rmse.max()])
  plt.show()

print("Defined the plot_the_model and plot_the_loss_curve functions.")

my_feature = ([1.0, 2.0,  3.0,  4.0,  5.0,  6.0,  7.0,  8.0,  9.0, 10.0, 11.0, 12.0])
my_label   = ([5.0, 8.8,  9.6, 14.2, 18.8, 19.5, 21.4, 26.8, 28.9, 32.0, 33.8, 38.2])

learning_rate=0.05
epochs=1
my_batch_size=12

my_model = build_model(learning_rate)
trained_weight, trained_bias, epochs, rmse = train_model(my_model, my_feature, 
                                                         my_label, epochs,
                                                         my_batch_size)
plot_the_model(trained_weight, trained_bias, my_feature, my_label)
plot_the_loss_curve(epochs, rmse)

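In my specific case, my output was:

Now I tried to replicate this in a simple Excel sheet and computed the RMSE by hand:

However, I get 21.8 rather than 23.1. Also, my loss is not 535.48 but 476.82.

My first question is therefore: where is my mistake, and how is the RMSE calculated?

Second question: how can I get the RMSE for each specific iteration? Let's consider 4 epochs and a batch size of 4.

That gives 4 epochs and 3 batches with 4 examples (observations) each. I do not understand how the model is trained over these iterations. So how can I get the coefficients and the RMSE of each regression model? Not just for each epoch (so 4), but for each iteration. I think each epoch has 3 iterations, so in total I would expect 12 linear regression models. I would like to see these 12 models. What initial values are used at the starting point when no information is given? Which slope and intercept are used at the very first step? I did not specify any. Then I would like to be able to follow how the slope and intercept are adjusted at each step. I assume this comes from the gradient descent algorithm, but that would be a super bonus. More important to me is to first understand how these iterations are done and how they relate to epochs and batches.

Update: I know that the initial values (for slope and intercept) are chosen randomly.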

Answer 1

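I played with it a bit, and I think it works like this:

1. The loss and metrics for the first batch are computed and printed, and the weights and bias are updated.
2. Step 1 is repeated for all batches in the epoch. However, the loss and metrics after the last batch's update are not printed, so what you see on the screen is the loss and metrics from before the last update in the epoch.
3. A new epoch starts, and the first loss and metrics you see printed are actually computed with the last updated weights from the previous epoch...

So basically I think the intuition is that the loss is computed first and the weights are updated afterwards, which means the weight update is the last operation in an epoch.

If you train the model with one epoch and one batch, what you see on the screen is the loss computed with the initial weights and bias. If you want to see the loss and metrics after each epoch has finished (with the most "up-to-date" weights), you can pass the parameter validation_data=(X,y) to the fit method. This tells the algorithm to compute the loss and metrics once more on the given validation data when the epoch finishes.

Regarding the model's initial weights, you can try it out by manually setting some initial weights for the layer (using the kernel_initializer parameter):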

  model.add(tf.keras.layers.Dense(units=1,
                                  input_shape=(1,),
                                  kernel_initializer=tf.constant_initializer(.5)))

Here is the updated part of the train_model function that shows what I mean:

  import numpy as np  # required by the manual loss computations below

  def train_model(model, feature, label, epochs, batch_size):
        """Train the model by feeding it data."""

        # Feed the feature values and the label values to the
        # model. The model will train for the specified number
        # of epochs, gradually learning how the feature values
        # relate to the label values.
        init_slope = model.get_weights()[0][0][0]
        init_bias = model.get_weights()[1][0]
        print('init slope is {}'.format(init_slope))
        print('init bias is {}'.format(init_bias))

        history = model.fit(x=feature,
                          y=label,
                          batch_size=batch_size,
                          epochs=epochs,
                          validation_data=(feature,label))

        # Gather the trained model's weight and bias.
        #print(model.get_weights())
        trained_weight = model.get_weights()[0]
        trained_bias = model.get_weights()[1]
        print("Slope")
        print(trained_weight)
        print("Intercept")
        print(trained_bias)
        # The list of epochs is stored separately from the
        # rest of history.
        prediction_manual = [trained_weight[0][0]*i + trained_bias[0] for i in feature]

        manual_loss = np.mean(((np.array(label)-np.array(prediction_manual))**2))
        print('manually computed loss after slope and bias update is {}'.format(manual_loss))
        print('manually computed rmse after slope and bias update is {}'.format(manual_loss**(1/2)))

        prediction_manual_init = [init_slope*i + init_bias for i in feature]
        manual_loss_init = np.mean(((np.array(label)-np.array(prediction_manual_init))**2))
        print('manually computed loss with init slope and bias is {}'.format(manual_loss_init))
        print('manually computed rmse with init slope and bias is {}'.format(manual_loss_init**(1/2)))

Output:

"""
init slope is 0.5
init bias is 0.0
1/1 [==============================] - 0s 117ms/step - loss: 402.9850 - root_mean_squared_error: 20.0745 - val_loss: 352.3351 - val_root_mean_squared_error: 18.7706
Slope
[[0.65811384]]
Intercept
[0.15811387]
manually computed loss after slope and bias update is 352.3350379264957
manually computed rmse after slope and bias update is 18.77058970641295
manually computed loss with init slope and bias is 402.98499999999996
manually computed rmse with init slope and bias is 20.074486294797182
"""

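Note that the manually computed loss and metrics after the slope and bias update match the validation loss and metrics, while the manually computed loss and metrics before the update match the loss and metrics reported for the initial slope and bias.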
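
The jump from slope 0.5 to 0.65811384 in the output above can be reproduced by hand. The following is a minimal sketch, not Keras's actual code: it assumes RMSprop with a decay of rho=0.9 and epsilon=1e-7 (believed to be the defaults), a squared-gradient accumulator that starts at zero, and a single full batch of the 12 points from the question. Under these assumptions the very first step has a size of about learning_rate / sqrt(1 - rho) for both parameters, regardless of how large the gradient is.

```python
import numpy as np

# Data from the question.
x = np.arange(1.0, 13.0)
y = np.array([5.0, 8.8, 9.6, 14.2, 18.8, 19.5, 21.4, 26.8, 28.9, 32.0, 33.8, 38.2])

def mse(w, b):
    """Mean squared error of the line y = w*x + b on the full dataset."""
    return np.mean((w * x + b - y) ** 2)

# Initial values used in this answer (kernel initialized to 0.5, bias 0.0).
w, b = 0.5, 0.0
learning_rate, rho, eps = 0.05, 0.9, 1e-7   # rho and eps are assumed defaults

loss_before = mse(w, b)

# Gradients of the MSE with respect to w and b.
residual = w * x + b - y
grad_w = np.mean(2 * residual * x)
grad_b = np.mean(2 * residual)

# One RMSprop step: running average of squared gradients, started at zero (assumption).
v_w = (1 - rho) * grad_w ** 2
v_b = (1 - rho) * grad_b ** 2
w_new = w - learning_rate * grad_w / (np.sqrt(v_w) + eps)
b_new = b - learning_rate * grad_b / (np.sqrt(v_b) + eps)

loss_after = mse(w_new, b_new)
print("loss / rmse before update:", loss_before, np.sqrt(loss_before))
print("slope / intercept after update:", w_new, b_new)
print("loss / rmse after update:", loss_after, np.sqrt(loss_after))
```

With these assumptions the sketch gives a slope and intercept of about 0.6581 and 0.1581 and losses of about 402.985 and 352.335, which agree with the Keras output above.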

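Regarding the second question, I think you can split the data into batches by hand, then iterate over the batches and fit on each of them. At every iteration the model then prints the loss and metrics on the validation data. Something like this: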

  init_slope = model.get_weights()[0][0][0]
  init_bias = model.get_weights()[1][0]
  print('init slope is {}'.format(init_slope))
  print('init bias is {}'.format(init_bias))
  batch_size = 3

  for idx in range(0,len(feature),batch_size):
      model.fit(x=feature[idx:idx+batch_size],
                y=label[idx:idx+batch_size],
                batch_size=1000,
                epochs=epochs,
                validation_data=(feature,label))
      print('slope: {}'.format(model.get_weights()[0][0][0]))
      print('intercept: {}'.format(model.get_weights()[1][0]))
      print('x data used: {}'.format(feature[idx:idx+batch_size]))
      print('y data used: {}'.format(label[idx:idx+batch_size]))

Output:

init slope is 0.5
init bias is 0.0
1/1 [==============================] - 0s 117ms/step - loss: 48.9000 - root_mean_squared_error: 6.9929 - val_loss: 352.3351 - val_root_mean_squared_error: 18.7706
slope: 0.6581138372421265
intercept: 0.15811386704444885
x data used: [1.0, 2.0, 3.0]
y data used: [5.0, 8.8, 9.6]
1/1 [==============================] - 0s 21ms/step - loss: 200.9296 - root_mean_squared_error: 14.1750 - val_loss: 306.3082 - val_root_mean_squared_error: 17.5017
slope: 0.8132714033126831
intercept: 0.3018075227737427
x data used: [4.0, 5.0, 6.0]
y data used: [14.2, 18.8, 19.5]
1/1 [==============================] - 0s 22ms/step - loss: 363.2630 - root_mean_squared_error: 19.0595 - val_loss: 266.7119 - val_root_mean_squared_error: 16.3313
slope: 0.9573485255241394
intercept: 0.42669767141342163
x data used: [7.0, 8.0, 9.0]
y data used: [21.4, 26.8, 28.9]
1/1 [==============================] - 0s 22ms/step - loss: 565.5593 - root_mean_squared_error: 23.7815 - val_loss: 232.1553 - val_root_mean_squared_error: 15.2366
slope: 1.0924618244171143
intercept: 0.5409283638000488
x data used: [10.0, 11.0, 12.0]
y data used: [32.0, 33.8, 38.2]

Answer 2

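Foundation

Problem statement

Let us consider a linear regression model for a set of samples X, where each sample is represented by a single feature x. As part of model training we search for the line w.x + b such that ((w.x+b) - y)^2 (the squared loss) is minimal. For a set of data points we take the mean of the squared loss over the samples, which gives the so-called mean squared error (MSE). w and b, which stand for the weight and the bias, are together referred to as the weights.

Fitting the line / training the model

There is a closed-form solution to the linear regression problem: (X^T.X)^-1.X^T.y
We can also use gradient descent to search for the weights that minimize the squared loss. Frameworks such as TensorFlow and PyTorch use gradient descent to search for the weights (this is called training).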
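
As a point of comparison, the closed-form solution can be evaluated directly with NumPy. The sketch below uses the 12 data points from the question and adds a column of ones to X so that the bias b is estimated together with the slope w; the variable names are illustrative.

```python
import numpy as np

# Data from the question.
x = np.arange(1.0, 13.0)
y = np.array([5.0, 8.8, 9.6, 14.2, 18.8, 19.5, 21.4, 26.8, 28.9, 32.0, 33.8, 38.2])

# Design matrix with a column of ones so the intercept is part of the solution.
X = np.column_stack([x, np.ones_like(x)])

# Normal equations (X^T X)^-1 X^T y, solved without forming the inverse explicitly.
w, b = np.linalg.solve(X.T @ X, X.T @ y)
print(f"slope w = {w:.4f}, intercept b = {b:.4f}")
```

Gradient descent, if run until it converges, should approach these same values, since the MSE of a linear model has a single minimum.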

Gradient descent

The gradient descent algorithm for learning the regression looks like this:

w, b = some initial value
While model has not converged:
    y_hat = w.X + b
    error = MSE(y, y_hat) 
    back propagate (BPP) error and adjust weights

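Each run of the loop above is called an epoch. However, due to resource constraints, the computation of y_hat and error and the BPP step are not performed on the full dataset. Instead, the data is divided into smaller batches and the operations above are performed on one batch at a time. Also, we usually fix the number of epochs and monitor whether the model has converged.

w, b = some initial value
for i in range(number_of_epochs):
    for X_batch, y_batch in get_next_batch(X, y):
        y_hat = w.X_batch + b
        error = MSE(y_batch, y_hat)
        back propagate (BPP) error and adjust weights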
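
The batched loop above can be turned into a small runnable sketch. It uses plain gradient descent with a fixed learning rate rather than the RMSprop optimizer from the question, starts from w = b = 0, and does not shuffle the data; all of these are simplifying assumptions, and minibatch_gd is a hypothetical helper name. With the 12 points from the question, a batch size of 4 and 4 epochs, it records 12 intermediate models, one per iteration.

```python
import numpy as np

# Data from the question.
x = np.arange(1.0, 13.0)
y = np.array([5.0, 8.8, 9.6, 14.2, 18.8, 19.5, 21.4, 26.8, 28.9, 32.0, 33.8, 38.2])

def minibatch_gd(x, y, epochs, batch_size, learning_rate, w=0.0, b=0.0):
    """Plain mini-batch gradient descent on MSE; returns one record per iteration."""
    history = []
    for epoch in range(epochs):
        for start in range(0, len(x), batch_size):
            xb = x[start:start + batch_size]
            yb = y[start:start + batch_size]
            residual = w * xb + b - yb              # forward pass on this batch
            batch_mse = np.mean(residual ** 2)      # loss before the update
            grad_w = np.mean(2 * residual * xb)     # d(MSE)/dw on this batch
            grad_b = np.mean(2 * residual)          # d(MSE)/db on this batch
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
            history.append((epoch + 1, start // batch_size + 1, w, b, batch_mse))
    return history

history = minibatch_gd(x, y, epochs=4, batch_size=4, learning_rate=0.001)
for epoch, iteration, w_i, b_i, batch_mse in history:
    print(f"epoch {epoch} iteration {iteration}: "
          f"w={w_i:.4f} b={b_i:.4f} batch MSE before update={batch_mse:.2f}")
```

Each printed row is the line held by the model after one weight update, so the slope and intercept can be followed step by step and checked against a spreadsheet.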

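Batch execution in Keras

Let us say we want to add the root mean squared error to track the model's performance while it trains. Keras implements it as follows:

w, b = some initial value
for i in range(number_of_epochs):
    all_y_hats = []
    all_ys = []
    for X_batch, y_batch in get_next_batch(X, y):
        y_hat = w.X_batch + b
        error = MSE(y_batch, y_hat)

        all_y_hats.extend(y_hat)
        all_ys.extend(y_batch)

        batch_rms_error = RMSE(all_ys, all_y_hats)

        back propagate (BPP) error and adjust weights

As you can see above, the predictions are accumulated and the RMSE is computed on the accumulated predictions, rather than by averaging the RMSEs of all previous batches.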
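
The difference can be checked with the numbers from the training output shown further down in this answer. The sketch below assumes that the printed loss is the running mean of the batch MSEs and that both batches have the same size (8 samples each); under those assumptions the second batch's MSE can be recovered from the two printed losses.

```python
import math

# Values printed by Keras for epoch 1 in the output of this answer.
mse_batch1 = 16.5132     # loss after the first batch
mse_running = 65.6283    # loss after both batches

# With two equally sized batches the running mean is (mse_batch1 + mse_batch2) / 2.
mse_batch2 = 2 * mse_running - mse_batch1

# RMSE over the accumulated predictions of both batches.
rmse_accumulated = math.sqrt((mse_batch1 + mse_batch2) / 2)

# Average of the two per-batch RMSEs, which is not what Keras reports.
rmse_averaged = (math.sqrt(mse_batch1) + math.sqrt(mse_batch2)) / 2

print(f"accumulated RMSE:    {rmse_accumulated:.4f}")
print(f"mean of batch RMSEs: {rmse_averaged:.4f}")
```

The accumulated value reproduces the 8.1011 reported by Keras at the end of the epoch, while the average of the per-batch RMSEs comes out lower.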

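Implementation in Keras

Now that our foundation is clear, let us see how we can track the same quantities in Keras. Keras has callbacks, so we can hook into the on_batch_begin callback and accumulate all_y_hats and all_ys. In the on_batch_end callback, Keras gives us the RMSE it computed. We will compute the RMSE by hand from the accumulated all_y_hats and all_ys and verify that it matches the value Keras computed. We will also save the weights so that we can later plot the line being learned.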

import numpy as np
from sklearn.metrics import mean_squared_error
import keras
import matplotlib.pyplot as plt

# Some training data
X = np.arange(16)
y = 0.5*X +0.2

batch_size = 8
all_y_hats = []
learned_weights = [] 

class CustomCallback(keras.callbacks.Callback):
  def on_batch_begin(self, batch, logs=None):
    # Weights the model holds before this batch is processed.
    w = self.model.layers[0].weights[0].numpy()[0][0]
    b = self.model.layers[0].weights[1].numpy()[0]
    s = batch*batch_size
    all_y_hats.extend(b + w*X[s:s+batch_size])
    learned_weights.append([w,b])

  def on_batch_end(self, batch, logs=None):
    # RMSE over all predictions accumulated so far in this epoch.
    calculated_error = np.sqrt(mean_squared_error(all_y_hats, y[:len(all_y_hats)]))
    print (f"\n Calculated: {calculated_error},  Actual: {logs['root_mean_squared_error']}")
    assert np.isclose(calculated_error, logs['root_mean_squared_error'])

  def on_epoch_end(self, epoch, logs=None):
    # Start accumulating from scratch in the next epoch.
    del all_y_hats[:]


model = keras.models.Sequential()
model.add(keras.layers.Dense(1, input_shape=(1,)))
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=0.01), loss="mean_squared_error",  metrics=[keras.metrics.RootMeanSquaredError()])
# We should set shuffle=False so that we know how batches are divided
history = model.fit(X,y, epochs=100, callbacks=[CustomCallback()], batch_size=batch_size, shuffle=False)

Output:

Epoch 1/100
 8/16 [==============>...............] - ETA: 0s - loss: 16.5132 - root_mean_squared_error: 4.0636
 Calculated: 4.063645694548688,  Actual: 4.063645839691162

 Calculated: 8.10112834945773,  Actual: 8.101128578186035
16/16 [==============================] - 0s 3ms/step - loss: 65.6283 - root_mean_squared_error: 8.1011
Epoch 2/100
 8/16 [==============>...............] - ETA: 0s - loss: 14.0454 - root_mean_squared_error: 3.7477
 Calculated: 3.7477213352845675,  Actual: 3.7477214336395264
-------------- truncated -----------------------

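Ta-da! The assertion assert np.isclose(calculated_error, logs['root_mean_squared_error']) never failed, so our calculation/understanding is correct.

The line

Finally, let us plot the line that the BPP algorithm is adjusting based on the mean squared error loss. The code below creates a PNG image of the line being learned at each batch, together with the training data.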

for i, (w,b) in enumerate(learned_weights):
  plt.close()
  plt.axis([-1, 18, -1, 10])
  plt.scatter(X, y)
  plt.plot([-1,17], [-1*w+b, 17*w+b], color='green')
  plt.savefig(f'img{i+1}.png')

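Below is a GIF animation of the images above, arranged in the order in which they were learned.

The hyperplane (a line in this case) being learned when y = 0.5*X + 5.2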

Answer 3

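Linear regression model

A linear regression model has only one neuron, with a linear activation function. The basis of training the model is gradient descent. Each time the entire dataset is passed through the model and the weights are updated is called 1 epoch. With this basic scheme, however, iteration and epoch mean the same thing.

Basic training steps: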

Prepare data
Initialize the model and its parameters (weights and biases)
for each epoch:  #(both iteration and epoch same here)
    Forward Propagation
    Compute Cost
    Back Propagation
    Update Parameters

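Gradient descent has three variants:

Batch gradient descent (BGD)
Stochastic gradient descent (SGD)
Mini-batch gradient descent (MGD)

Batch gradient descent is what we described above (passing the entire dataset). It is also commonly called just "gradient descent".

In stochastic gradient descent we pass one random example at a time, and the weights are updated after every example. This is where iterations come in. Once the model has been trained on 1 example, 1 iteration is complete. However, there are more examples in the dataset that the model has not seen yet. Training on all of these examples once is called 1 epoch. Because only one example is passed at a time, SGD is very slow on larger datasets, as it loses the benefit of vectorization.

So we usually use mini-batch gradient descent. Here the dataset is divided into several chunks of a fixed size. The size of each chunk is called the batch size, and it can be anywhere between 1 and the size of the data. In every epoch, these batches of data are used to train the model.

1 iteration processes 1 batch of data. 1 epoch processes all batches of data. 1 epoch contains 1 or more iterations.

So, if the size of the data is m, the amount of data fed in during each iteration is:

BGD = m
SGD = 1
MGD = batch size (between 1 and m)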
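
The relationship between data size, batch size, iterations and epochs can be written down as a short calculation. The helper name iterations_per_epoch is illustrative, and the example values are the ones from the question (12 examples, batch size 4, 4 epochs).

```python
import math

def iterations_per_epoch(m, batch_size):
    """Number of weight updates in one epoch; the last batch may be smaller."""
    return math.ceil(m / batch_size)

m = 12  # number of examples in the question's dataset
for name, batch_size in [("BGD", m), ("SGD", 1), ("MGD", 4)]:
    print(name, "iterations per epoch:", iterations_per_epoch(m, batch_size))

epochs = 4
total_updates = epochs * iterations_per_epoch(m, batch_size=4)
print("total weight updates:", total_updates)
```

For the question's setting this gives 3 iterations per epoch and 12 weight updates in total, which is the number of intermediate models the asker expected.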

Basic training steps for MGD:

Prepare data
Initialize the model and its parameters (weights and biases)
for each epoch:  #(epoch)
    for each mini_batch: #(iteration)
        Forward Propagation
        Compute Cost
        Back Propagation
        Update Parameters

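These are the theoretical concepts behind gradient descent, batches, epochs and iterations.

Now, coming to Keras and your code:

I ran your Colab code and it works fine. In the code you posted the number of epochs is 1, which is too low for the model to learn, because there is very little data and the model itself is very simple. So you need to either increase the amount of data, create a more complex model, or train for more epochs, as in the notebook. With a properly tuned learning rate, the number of epochs can be reduced.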

learning_rate=0.14
epochs=70
my_batch_size= 32 

my_model = build_model(learning_rate)
trained_weight, trained_bias, epochs, rmse = train_model(my_model, my_feature, 
                                                        my_label, epochs,
                                                        my_batch_size)
plot_the_model(trained_weight, trained_bias, my_feature, my_label)
plot_the_loss_curve(epochs, rmse)

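If the learning rate is very small, the model learns slowly, so it needs more training epochs to make accurate predictions. Increasing the learning rate speeds up learning, so the number of epochs can be reduced. Please compare the different parts of the code in the Colab for suitable examples.

Regarding getting the metrics for each iteration:

Keras is a high-level API for TensorFlow. As far as I know (not considering customization of the API), during training Keras computes the loss, error and accuracy on the training set at the end of each iteration and returns their respective means at the end of each epoch. So if there are n epochs, each metric will have n values, regardless of how many iterations there are in between.

Regarding slope and intercept:

A linear regression model uses a linear activation function at the output layer, y = mx + c. The values are:

y - the output
x - the input
m - the slope (which has to be adjusted)
c - the intercept (which can also be adjusted)

In our model, m and c are what we want to tune. They are the model's weight and bias. So our function looks like y = Wx + b, where b gives the intercept and W gives the slope. The weight and bias are randomly initialized at the start.

Colab link for a linear regression model built from scratch

Please adjust the values as needed. Since the model is implemented from scratch, collect or print any values you want to track during training. You can also use your own dataset, but make sure it is valid or generated by some library for model validation (sklearn).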

https://colab.research.google.com/drive/1RfuRNMoVv-l6KyM_SegdJOHiXD_0xBHq?usp=sharing

P.S. If you find anything confusing, please leave a comment. I would be happy to reply.

Problem understanding linear regression model tuning in tf.keras

3 answers:
