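I'm trying to build a network whose inputs persist and decay. The raw input will be a vector in which each element is 0, 1, or -1. I'm curious whether there is any value in knowing which inputs were active around the same time, so instead of letting a value drop to 0 on the next iteration, I want it to decay from 1 or -1 back to 0. I suppose this is a crude form of memory. An example of what I mean: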
Normal input:
1 -> 0 -> 0 -> -1 -> 0 ...
With decay .2:
1 -> .8 -> .6 -> -1 -> -.8 ...
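This is easy to do by hand by adding an extra input that carries a vector of decay values, but I'm wondering whether the network could learn these values itself, so that more important inputs get a smaller decay.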
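The hand-computed version described above could be sketched roughly as follows. This is only an illustration: the function name `decayed_trace` and the rule that a fresh non-zero input overrides the fading value are assumptions read off the example sequence, not something stated elsewhere in the question.

```python
import numpy as np

def decayed_trace(inputs, decay):
    """Return the decayed version of a (T, N) sequence of 0/1/-1 inputs.

    `decay` is a scalar or an (N,) vector of amounts subtracted per step.
    (Hypothetical helper, written only to illustrate the example above.)
    """
    inputs = np.asarray(inputs, dtype=float)
    decay = np.broadcast_to(np.asarray(decay, dtype=float), inputs.shape[1:])
    state = np.zeros(inputs.shape[1:])
    out = np.empty_like(inputs)
    for t, x in enumerate(inputs):
        # Shrink the previous magnitude toward 0 without crossing it.
        faded = np.sign(state) * np.maximum(np.abs(state) - decay, 0.0)
        # A fresh non-zero input replaces the fading value.
        state = np.where(x != 0, x, faded)
        out[t] = state
    return out

# The sequence from the example: 1 -> 0 -> 0 -> -1 -> 0 with decay 0.2
seq = np.array([[1], [0], [0], [-1], [0]])
print(decayed_trace(seq, 0.2).ravel())
```

Passing a vector for `decay` gives each input element its own decay amount, which is the quantity the question asks the network to learn.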
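Since each neuron outputs a single value, I could have N neurons (one per desired decay value) and feed each of them a constant input of 1, so that each one simply outputs its weight. That weight could then be passed through a sigmoid activation and used as the decay value.
If the input is always 1, will this layer still learn its weights? If not, is there another way to do this?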
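Note: the data is sequential, which is why I think the activations may influence each other. I also know that recurrent networks have memory, but I don't know whether I have enough data for one to learn the relationships. Also, this custom decay function can eventually bring a value all the way back to 0, because it subtracts the decay. Multiplying by a small weight would only approach 0 asymptotically, which is what an RNN does, if I understand correctly.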
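The difference between the two kinds of decay mentioned in the note can be checked with a small sketch. The helper names `subtractive` and `multiplicative` are made up for this illustration, and the factor 0.8 is just an arbitrary choice for comparison.

```python
def subtractive(value, decay, steps):
    """Subtract a fixed amount each step, clamping at 0."""
    out = [value]
    for _ in range(steps):
        value = max(value - decay, 0.0)
        out.append(value)
    return out

def multiplicative(value, factor, steps):
    """Multiply by a fixed factor in (0, 1) each step."""
    out = [value]
    for _ in range(steps):
        value *= factor
        out.append(value)
    return out

sub = subtractive(1.0, 0.2, 10)
mul = multiplicative(1.0, 0.8, 10)
print("subtractive reaches exactly 0:", sub[-1] == 0.0)
print("multiplicative is still positive:", mul[-1] > 0.0)
```

The subtractive version hits exactly 0 after a finite number of steps, while the multiplicative version keeps shrinking but never reaches 0 in exact arithmetic.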
Answer 0 (score: 0)
You can easily build this kind of architecture with the TensorFlow functional API.
Creating the dataset and the model. Code:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# Generating features
np.random.seed(100)
x1 = tf.constant(np.ones(shape =(100,1)), dtype = tf.float32)
x2 = tf.constant(np.ones(shape =(100,1)), dtype = tf.float32)
x3 = tf.constant(np.ones(shape =(100,1)), dtype = tf.float32)
y = tf.constant(np.random.randint(2, size =(100,)), dtype = tf.float32)
def create_model():
    input1 = tf.keras.Input(shape=(1,))
    input2 = tf.keras.Input(shape=(1,))
    input3 = tf.keras.Input(shape=(1,))
    hidden1 = tf.keras.layers.Dense(units = 1, activation='sigmoid', use_bias = False)(input1)
    hidden2 = tf.keras.layers.Dense(units = 1, activation='sigmoid', use_bias = False)(input2)
    hidden3 = tf.keras.layers.Dense(units = 1, activation='sigmoid', use_bias = False)(input3)
    merge = tf.keras.layers.concatenate([hidden1,hidden2,hidden3])
    hidden4 = tf.keras.layers.Dense(units = 4, activation='sigmoid')(merge)
    output1 = tf.keras.layers.Dense(units = 2, activation='softmax')(hidden4)
    model = tf.keras.models.Model(inputs = [input1, input2, input3], outputs = output1, name= "functional1")
    return model
model = create_model()
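# Setting the initial decay weights (layers 3-5 are the three single-unit Dense layers).
# Each layer has a sigmoid activation, so the value passed on is sigmoid(0.8), not 0.8.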
model.layers[3].set_weights([tf.constant([[0.8]])])
model.layers[4].set_weights([tf.constant([[0.8]])])
model.layers[5].set_weights([tf.constant([[0.8]])])
tf.keras.utils.plot_model(model, 'my_first_model.png', show_shapes=True)
Training process:
# Instantiate an optimizer.
optimizer = tf.keras.optimizers.SGD(learning_rate=10)
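# Instantiate a loss function.
# Note: the model's last layer already applies softmax, so from_logits=False
# would be the conventional setting; from_logits=True is kept here because
# the output shown below was produced with it.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)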
epochs = 50
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))
    # Open a GradientTape to record the operations run
    # during the forward pass, which enables auto-differentiation.
    with tf.GradientTape() as tape:
        # Run the forward pass of the layer.
        # The operations that the layer applies
        # to its inputs are going to be recorded
        # on the GradientTape.
        logits = model([x1,x2,x3], training=True) # Logits for this minibatch
        # Compute the loss value for this minibatch.
        loss_value = loss_fn(y, logits)
    # Use the gradient tape to automatically retrieve
    # the gradients of the trainable variables with respect to the loss.
    grads = tape.gradient(loss_value, model.trainable_weights)
    print('Gradients of- Decay 1: {} Decay 2: {} Decay 3: {}'.format(grads[0].numpy()[0][0], grads[1].numpy()[0][0], grads[2].numpy()[0][0]))
    # Run one step of gradient descent by updating
    # the value of the variables to minimize the loss.
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    # Log the loss every epoch.
    print("Training loss (for one batch) at epoch %d: %.4f" % (epoch, float(loss_value)))
    print('------------------------------')
Output:
Start of epoch 0
Gradients of- Decay 1: -0.001539231976494193 Decay 2: 0.0013862588675692677 Decay 3: -0.0024916294496506453
Training loss (for one batch) at epoch 0: 0.7312
------------------------------
Start of epoch 1
Gradients of- Decay 1: 0.0015823811991140246 Decay 2: -0.00021153852867428213 Decay 3: 0.0008941243286244571
Training loss (for one batch) at epoch 1: 0.7042
------------------------------
Start of epoch 2
Gradients of- Decay 1: -0.0013041968923062086 Decay 2: 0.0005898184608668089 Decay 3: -0.0015725962584838271
Training loss (for one batch) at epoch 2: 0.7039
------------------------------
Start of epoch 3
Gradients of- Decay 1: 0.00156548956874758 Decay 2: -0.00017016787023749202 Decay 3: 0.000881993502844125
Training loss (for one batch) at epoch 3: 0.7045
------------------------------
Start of epoch 4
Gradients of- Decay 1: -0.0012605276424437761 Decay 2: 0.00047704551252536476 Decay 3: -0.0015090997330844402
Training loss (for one batch) at epoch 4: 0.7028
------------------------------
Start of epoch 5
Gradients of- Decay 1: 0.0014193064998835325 Decay 2: -0.0001368212979286909 Decay 3: 0.0008420557714998722
Training loss (for one batch) at epoch 5: 0.7027
------------------------------
Start of epoch 6
Gradients of- Decay 1: -0.0011729025281965733 Decay 2: 0.0003637363843154162 Decay 3: -0.0013745202450081706
Training loss (for one batch) at epoch 6: 0.7011
------------------------------
Start of epoch 7
Gradients of- Decay 1: 0.0012617181055247784 Decay 2: -0.00010974107135552913 Decay 3: 0.0007924885721877217
Training loss (for one batch) at epoch 7: 0.7007
------------------------------
Start of epoch 8
Gradients of- Decay 1: -0.0010727590415626764 Decay 2: 0.000274341378826648 Decay 3: -0.0012277730274945498
Training loss (for one batch) at epoch 8: 0.6995
------------------------------
Start of epoch 9
Gradients of- Decay 1: 0.0011162457522004843 Decay 2: -8.809947757981718e-05 Decay 3: 0.0007380791357718408
Training loss (for one batch) at epoch 9: 0.6991
------------------------------
Start of epoch 10
Gradients of- Decay 1: -0.0009710552403703332 Decay 2: 0.00020754436263814569 Decay 3: -0.001086110481992364
Training loss (for one batch) at epoch 10: 0.6982
------------------------------
Final values of the decay weights:
print(model.layers[3].get_weights())
print(model.layers[4].get_weights())
print(model.layers[5].get_weights())
Output:
[array([[0.7963085]], dtype=float32)]
[array([[0.7707753]], dtype=float32)]
[array([[0.8614942]], dtype=float32)]
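One thing worth checking: the three decay layers in the model above use a sigmoid activation, have no bias, and always receive 1 as input, so the value each layer passes forward should be the sigmoid of its weight rather than the raw weight printed here. A small sketch of that conversion, using the printed weights (the variable names are chosen only for this illustration):

```python
import numpy as np

# Raw weights printed above for layers 3, 4 and 5.
learned_weights = np.array([0.7963085, 0.7707753, 0.8614942])

# With input 1 and no bias, each layer outputs sigmoid(weight * 1).
effective_decay = 1.0 / (1.0 + np.exp(-learned_weights))
print(effective_decay)
```

The resulting values lie strictly between 0 and 1, which is what makes the sigmoid output usable as a decay value.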
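Things to keep in mind:
Learning depends not only on your inputs but also on your outputs. When the gradients shown above are computed, both the target output and the predicted output appear in the gradient equation. So as long as the targets vary, learning still happens, even with a constant input.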
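To make that point concrete, here is a minimal sketch with a much smaller stand-in for the model: one decay weight `w` fed a constant input of 1, one fixed output weight `v`, and a binary cross-entropy loss. These names and the simplified structure are assumptions made for illustration, not the network from the code above. By the chain rule the gradient is `(p - y) * v * d * (1 - d) * x`, so it contains both the prediction `p` and the target `y`, and the constant input `x = 1` does not make it vanish.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, v, y, x=1.0):
    d = sigmoid(w * x)   # decay value produced from the constant input
    p = sigmoid(v * d)   # predicted probability
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad_w(w, v, y, x=1.0):
    d = sigmoid(w * x)
    p = sigmoid(v * d)
    # Chain rule: dL/dz = p - y, dz/dd = v, dd/dw = d * (1 - d) * x
    return (p - y) * v * d * (1 - d) * x

w, v, eps = 0.8, 0.5, 1e-6
for y in (0.0, 1.0):
    # Central finite difference as an independent check on the formula.
    numeric = (loss(w + eps, v, y) - loss(w - eps, v, y)) / (2 * eps)
    print("target", y, "analytic", grad_w(w, v, y), "numeric", numeric)
```

The gradient changes sign with the target, so a batch with mixed targets pushes the weight in different directions and the weight is updated according to the data, even though the input never changes.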