Can I use a step function as the loss function to train a neural network?

Asked: 2020-01-31 07:53:10

Tags: python function tensorflow keras loss-function

As the title says: I am trying to build a model that predicts PM2.5. Training works fine with a differentiable loss function such as MSE, RMSE or MAE, but when I use a custom loss function built around a step function, the weights do not seem to update.

The last layer of my model outputs the PM2.5 prediction, and I try to compute the loss with a step function:

from tensorflow.keras import backend as K

def custom_loss(y_true, y_pred):
    z_true = step_function(y_true)
    z_pred = step_function(y_pred)
    return K.abs(z_true - z_pred)

My step function is an attempt to convert PM2.5 into an AQI level:

def step_function(x):
    step1 = (K.tanh(x - 15.45) + 1) / 2  # ~0 for PM2.5 < 15.45, ~1 for PM2.5 > 15.45
    step2 = (K.tanh(x - 35.45) + 1) / 2  # ~0 for PM2.5 < 35.45, ~1 for PM2.5 > 35.45
    return step1 + step2                 # e.g. x (PM2.5) = 50 returns ~2

When y_true and y_pred are both 0 and the step function returns 0, could that be why the weights are not updated?

1 Answer:

Answer 0 (score: 1)

As you have rightly mentioned, you have to handle the case where the loss is 0; otherwise the optimizer has nothing to minimize, and consequently the model's weights are not updated. So the ideal approach in this case is to use custom training and track the loss at the level of each training step.
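For instance, here is a minimal check (a sketch that repeats the step_function from the question so it runs on its own; the sample values are only illustrative). When y_true and y_pred are equal, the loss and its gradient with respect to the prediction are both 0, so the optimizer has nothing to work with:

import tensorflow as tf
from tensorflow.keras import backend as K

def step_function(x):
    step1 = (K.tanh(x - 15.45) + 1) / 2
    step2 = (K.tanh(x - 35.45) + 1) / 2
    return step1 + step2

y_true = tf.constant([10.0])
y_pred = tf.Variable([10.0])  # prediction already matches the target

with tf.GradientTape() as tape:
    loss = K.abs(step_function(y_true) - step_function(y_pred))

print(loss.numpy())                 # [0.]
print(tape.gradient(loss, y_pred))  # ~[0.] -> no weight update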

Custom training gives you much more control. If you want training and evaluation loops at a lower level than what fit() and evaluate() provide, you should write your own training loop. It is actually quite simple, but you should be prepared to do more debugging yourself.

Calling the model inside a GradientTape scope lets you retrieve the gradients of the layers' trainable weights with respect to a loss value. Using an optimizer instance, you can then use these gradients to update those variables (which can be retrieved via model.trainable_weights).

TensorFlow provides the tf.GradientTape API for automatic differentiation, i.e. computing the gradient of a computation with respect to its input variables. TensorFlow "records" all operations executed inside a tf.GradientTape context onto a "tape". It then uses that tape, together with the gradient associated with each recorded operation, to compute the gradients of the "recorded" computation using reverse-mode differentiation.
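For example, a tiny sketch of the tape recording one operation and then differentiating it:

import tensorflow as tf

x = tf.Variable(3.0)

with tf.GradientTape() as tape:
    y = x * x  # this multiplication is recorded on the tape

# reverse-mode differentiation gives dy/dx = 2 * x = 6.0
print(tape.gradient(y, x))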

If you want to process the gradients before applying them, you can use the optimizer in three steps (a sketch follows the list):

  1. Compute the gradients with tf.GradientTape.
  2. Process the gradients as you wish.
  3. Apply the processed gradients with apply_gradients().
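For example, a minimal sketch of those three steps inside one training step, using gradient clipping as the processing step (the clipping range is only an illustrative choice, and model, optimizer, loss_fn, x_batch_train and y_batch_train are assumed to be defined as in the full example below):

import tensorflow as tf

with tf.GradientTape() as tape:
    logits = model(x_batch_train, training=True)
    loss_value = loss_fn(y_batch_train, logits)

# 1. Compute the gradients with the tape.
grads = tape.gradient(loss_value, model.trainable_weights)

# 2. Process them as needed, e.g. clip each gradient to [-1, 1].
processed_grads = [tf.clip_by_value(g, -1.0, 1.0) for g in grads]

# 3. Apply the processed gradients.
optimizer.apply_gradients(zip(processed_grads, model.trainable_weights))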

Here is an example with simple data; the comments in the code explain the steps in more detail.

Code -

import tensorflow as tf
print(tf.__version__)
from tensorflow import keras
from tensorflow.keras import layers

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Preprocess the data (these are Numpy arrays)
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255

y_train = y_train.astype('float32')
y_test = y_test.astype('float32')

# Reserve 10,000 samples for validation
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]

# Get the model.
inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=outputs)

# Instantiate an optimizer.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Prepare the training dataset.
batch_size = 64
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)

epochs = 3
for epoch in range(epochs):
  print('Start of epoch %d' % (epoch,))

  # Iterate over the batches of the dataset.
  for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):

    # Open a GradientTape to record the operations run
    # during the forward pass, which enables autodifferentiation.
    with tf.GradientTape() as tape:

      # Run the forward pass of the layer.
      # The operations that the layer applies
      # to its inputs are going to be recorded
      # on the GradientTape.
      logits = model(x_batch_train, training=True)  # Logits for this minibatch

      # Compute the loss value for this minibatch.
      loss_value = loss_fn(y_batch_train, logits)

    # Use the gradient tape to automatically retrieve
    # the gradients of the trainable variables with respect to the loss.
    grads = tape.gradient(loss_value, model.trainable_weights)

    # Run one step of gradient descent by updating
    # the value of the variables to minimize the loss.
    optimizer.apply_gradients(zip(grads, model.trainable_weights))

    # Log every 200 batches.
    if step % 200 == 0:
        print('Training loss (for one batch) at step %s: %s' % (step, float(loss_value)))
        print('Seen so far: %s samples' % ((step + 1) * 64))

Output -

2.2.0
Start of epoch 0
Training loss (for one batch) at step 0: 2.323657512664795
Seen so far: 64 samples
Training loss (for one batch) at step 200: 2.3156163692474365
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 2.2302279472351074
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 2.131979465484619
Seen so far: 38464 samples
Start of epoch 1
Training loss (for one batch) at step 0: 2.00234317779541
Seen so far: 64 samples
Training loss (for one batch) at step 200: 1.7992427349090576
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 1.8583933115005493
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 1.6005337238311768
Seen so far: 38464 samples
Start of epoch 2
Training loss (for one batch) at step 0: 1.6701987981796265
Seen so far: 64 samples
Training loss (for one batch) at step 200: 1.6237502098083496
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 1.3603084087371826
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 1.246948480606079
Seen so far: 38464 samples

You can find more information about tf.GradientTape here. The example used here was taken from here.

Hope this answers your question. Happy learning.