Question

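I am trying to build a network whose input persists and decays over time. The raw input is a vector in which each element is 0, 1, or -1. I am curious whether there is any value in inputs being active at around the same time, so instead of letting a value drop to 0 on the next iteration I want it to decay from 1 or -1 back to 0. I suppose this is a crude form of memory. An example of what I mean: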

Normal input:
1 -> 0 -> 0 -> -1 -> 0 ...
With decay .2:
1 -> .8 -> .6 -> -1 -> -.8 ...
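
The trace above can be reproduced with a few lines of plain Python. This is only a sketch of one reading of the rule (a nonzero input overwrites the held value, otherwise the held value moves toward 0 by the decay amount); the function name `decay_trace` is made up for illustration.

```python
def decay_trace(inputs, decay):
    """Hold the last nonzero input and move it linearly toward 0 by `decay` per step."""
    state = 0.0
    trace = []
    for x in inputs:
        if x != 0:
            state = float(x)                 # a fresh +1 / -1 overwrites the memory
        elif state > 0:
            state = max(0.0, state - decay)  # positive values decay down to 0
        elif state < 0:
            state = min(0.0, state + decay)  # negative values decay up to 0
        trace.append(state)
    return trace


# Same raw input as in the example, with decay 0.2.
trace = decay_trace([1, 0, 0, -1, 0], 0.2)
print([round(v, 3) for v in trace])
```

Values are rounded for printing because repeated floating-point subtraction does not give exact decimals.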

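This is easy to do by hand by adding an extra input holding a vector of decay values, but I would like to know whether the network can learn these values itself, so that more important inputs can be given a smaller decay.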

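Since each neuron outputs a single value, I could perhaps use N neurons (one per desired decay value) and feed each of them a constant input of 1, so that what they output is effectively their own weight, which can be passed through a sigmoid activation and used as the decay value.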
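
To make the idea concrete, here is a minimal sketch of such a unit without any framework: a single weight, a constant input of 1, and a sigmoid, so the unit's output depends only on its weight. The names `sigmoid` and `decay_from_weight` are illustrative assumptions, not part of any library.

```python
import math


def sigmoid(z):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def decay_from_weight(w, constant_input=1.0):
    """Output of a bias-free unit fed a constant input: sigmoid(w * 1)."""
    return sigmoid(w * constant_input)


# The weight alone determines the decay value the unit emits.
for w in (-2.0, 0.0, 0.8, 2.0):
    print(w, round(decay_from_weight(w), 3))
```

The sigmoid keeps the decay strictly between 0 and 1, so the weight itself can range over all real numbers during training.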

Will such a layer learn its weights when the input is always 1? If not, is there another way to do this?

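Note: the data is sequential, which is why I think the activations influence each other. I also know that recurrent networks have memory, but I am not sure I have enough data to learn those relationships. In addition, this custom decay function can actually reach 0 because it subtracts the decay, whereas multiplying by a small weight only approaches 0 asymptotically, which, if I understand correctly, is what an RNN does.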
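
The difference between the two kinds of decay mentioned in the note can be seen in a small sketch. The helper names `subtractive` and `multiplicative` are hypothetical, and the factor 0.8 is just an example value.

```python
def subtractive(start, decay, steps):
    """Subtract a fixed amount per step, clamped at 0."""
    value, out = start, []
    for _ in range(steps):
        out.append(value)
        value = max(0.0, value - decay)
    return out


def multiplicative(start, factor, steps):
    """Multiply by a fixed factor in (0, 1) per step."""
    value, out = start, []
    for _ in range(steps):
        out.append(value)
        value *= factor
    return out


sub = subtractive(1.0, 0.2, 8)
mul = multiplicative(1.0, 0.8, 8)
print([round(v, 3) for v in sub])  # hits 0 after a finite number of steps
print([round(v, 3) for v in mul])  # shrinks every step but stays positive
```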

Answer 1

You can easily build this kind of architecture with the TensorFlow functional API.

Code to create the dataset and the model:

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# Generating features
np.random.seed(100)
x1 = tf.constant(np.ones(shape =(100,1)), dtype = tf.float32)
x2 = tf.constant(np.ones(shape =(100,1)), dtype = tf.float32)
x3 = tf.constant(np.ones(shape =(100,1)), dtype = tf.float32)
y = tf.constant(np.random.randint(2, size =(100,)), dtype = tf.float32)

def create_model():
    input1 = tf.keras.Input(shape=(1,))
    input2 = tf.keras.Input(shape=(1,))
    input3 = tf.keras.Input(shape=(1,))
    hidden1 = tf.keras.layers.Dense(units = 1, activation='sigmoid', use_bias = False)(input1)
    hidden2 = tf.keras.layers.Dense(units = 1, activation='sigmoid', use_bias = False)(input2)
    hidden3 = tf.keras.layers.Dense(units = 1, activation='sigmoid', use_bias = False)(input3)
    
    merge = tf.keras.layers.concatenate([hidden1,hidden2,hidden3])
    
    hidden4 = tf.keras.layers.Dense(units = 4, activation='sigmoid')(merge)
    output1 = tf.keras.layers.Dense(units = 2, activation='softmax')(hidden4)
    
    model = tf.keras.models.Model(inputs = [input1, input2, input3], outputs = output1, name= "functional1")
    
    return model
model = create_model()

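# Set the initial raw weights of the three decay layers.
# With a constant input of 1 and a sigmoid activation, each layer
# outputs sigmoid(0.8), roughly 0.69, as its effective decay value.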
model.layers[3].set_weights([tf.constant([[0.8]])])
model.layers[4].set_weights([tf.constant([[0.8]])])
model.layers[5].set_weights([tf.constant([[0.8]])])

tf.keras.utils.plot_model(model, 'my_first_model.png', show_shapes=True)

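Your model looks like this (the diagram is written to my_first_model.png by the plot_model call above).

Training loop: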

# Instantiate an optimizer.
optimizer = tf.keras.optimizers.SGD(learning_rate=10)
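# Instantiate a loss function.
# NOTE: the model already ends in a softmax layer, so from_logits=False
# would be the matching setting; from_logits=True is kept as originally posted.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)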
epochs = 50
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))


    # Open a GradientTape to record the operations run
    # during the forward pass, which enables auto-differentiation.
    with tf.GradientTape() as tape:

        # Run the forward pass of the layer.
        # The operations that the layer applies
        # to its inputs are going to be recorded
        # on the GradientTape.
        logits = model([x1,x2,x3], training=True)  # Model output for this batch (softmax probabilities, despite the name)

        # Compute the loss value for this minibatch.
        loss_value = loss_fn(y, logits)

    # Use the gradient tape to automatically retrieve
    # the gradients of the trainable variables with respect to the loss.
    grads = tape.gradient(loss_value, model.trainable_weights)
    print('Gradients of- Decay 1: {}  Decay 2: {}  Decay 3: {}'.format(grads[0].numpy()[0][0], grads[1].numpy()[0][0], grads[2].numpy()[0][0]))

    # Run one step of gradient descent by updating
    # the value of the variables to minimize the loss.
    optimizer.apply_gradients(zip(grads, model.trainable_weights))

    # Log the loss every epoch.
    print("Training loss (for one batch) at epoch %d: %.4f" % (epoch, float(loss_value)))
    print('------------------------------')

Output:

Start of epoch 0
Gradients of- Decay 1: -0.001539231976494193  Decay 2: 0.0013862588675692677  Decay 3: -0.0024916294496506453
Training loss (for one batch) at epoch 0: 0.7312
------------------------------

Start of epoch 1
Gradients of- Decay 1: 0.0015823811991140246  Decay 2: -0.00021153852867428213  Decay 3: 0.0008941243286244571
Training loss (for one batch) at epoch 1: 0.7042
------------------------------

Start of epoch 2
Gradients of- Decay 1: -0.0013041968923062086  Decay 2: 0.0005898184608668089  Decay 3: -0.0015725962584838271
Training loss (for one batch) at epoch 2: 0.7039
------------------------------

Start of epoch 3
Gradients of- Decay 1: 0.00156548956874758  Decay 2: -0.00017016787023749202  Decay 3: 0.000881993502844125
Training loss (for one batch) at epoch 3: 0.7045
------------------------------

Start of epoch 4
Gradients of- Decay 1: -0.0012605276424437761  Decay 2: 0.00047704551252536476  Decay 3: -0.0015090997330844402
Training loss (for one batch) at epoch 4: 0.7028
------------------------------

Start of epoch 5
Gradients of- Decay 1: 0.0014193064998835325  Decay 2: -0.0001368212979286909  Decay 3: 0.0008420557714998722
Training loss (for one batch) at epoch 5: 0.7027
------------------------------

Start of epoch 6
Gradients of- Decay 1: -0.0011729025281965733  Decay 2: 0.0003637363843154162  Decay 3: -0.0013745202450081706
Training loss (for one batch) at epoch 6: 0.7011
------------------------------

Start of epoch 7
Gradients of- Decay 1: 0.0012617181055247784  Decay 2: -0.00010974107135552913  Decay 3: 0.0007924885721877217
Training loss (for one batch) at epoch 7: 0.7007
------------------------------

Start of epoch 8
Gradients of- Decay 1: -0.0010727590415626764  Decay 2: 0.000274341378826648  Decay 3: -0.0012277730274945498
Training loss (for one batch) at epoch 8: 0.6995
------------------------------

Start of epoch 9
Gradients of- Decay 1: 0.0011162457522004843  Decay 2: -8.809947757981718e-05  Decay 3: 0.0007380791357718408
Training loss (for one batch) at epoch 9: 0.6991
------------------------------

Start of epoch 10
Gradients of- Decay 1: -0.0009710552403703332  Decay 2: 0.00020754436263814569  Decay 3: -0.001086110481992364
Training loss (for one batch) at epoch 10: 0.6982
------------------------------

Final values of the decay weights:

print(model.layers[3].get_weights())
print(model.layers[4].get_weights())
print(model.layers[5].get_weights())

Output:

[array([[0.7963085]], dtype=float32)]
[array([[0.7707753]], dtype=float32)]
[array([[0.8614942]], dtype=float32)]
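
Note that these printed numbers are the raw layer weights. Because each decay layer has a constant input of 1, no bias, and a sigmoid activation, the value it actually passes on is the sigmoid of its weight. A small sketch of that conversion, with the weights copied by hand from the output above:

```python
import math

# Raw weights copied from the printed output above.
final_weights = [0.7963085, 0.7707753, 0.8614942]

# Each decay layer computes sigmoid(weight * 1), so this is what it emits.
effective_decay = [1.0 / (1.0 + math.exp(-w)) for w in final_weights]

for w, d in zip(final_weights, effective_decay):
    print("weight %.4f -> layer output %.4f" % (w, d))
```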

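Things to keep in mind:

Learning depends not only on your inputs but also on your outputs. In the gradients shown above, the gradient equation contains both the target output and the predicted output. So as long as the targets vary, learning still happens even though the input is constant.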
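
A stripped-down illustration of this point, assuming a much simpler model than the one above: a single weight, prediction p = sigmoid(w * x), and binary cross-entropy loss. For that model the gradient with respect to the weight is (p - y) * x, so with x fixed at 1 it is driven entirely by the gap between prediction and target. The function name `grad_wrt_weight` is made up for this sketch.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad_wrt_weight(w, x, y):
    """d(loss)/dw for p = sigmoid(w * x) with binary cross-entropy loss."""
    p = sigmoid(w * x)
    return (p - y) * x


w = 0.8
# Constant input of 1: the gradient is nonzero and its sign follows the target.
for y in (0.0, 1.0):
    print("target", y, "gradient", round(grad_wrt_weight(w, 1.0, y), 4))

# For contrast, a constant input of 0 would give no learning signal at all.
print("input 0 gradient", grad_wrt_weight(w, 0.0, 1.0))
```

This is why a constant input of 1 is not a problem, while a constant input of 0 would be.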
