If my understanding of layers is correct, then Layers use tf.Variable as the weight variable, so if a Dense() layer has 3 units it means it is using something like w = tf.Variable([0.2,5,0.9]) (for a single instance), and if the input_shape is 2, then the variable would be something like w = tf.Variable([[0.2,5,0.9],[2,3,0.4]])? Please correct me if I am wrong.
I am learning the very basics of tensorflow and found some code that I modified to:
import tensorflow as tf

weight = tf.Variable([3.2])

def get_lost_loss(w):
    '''
    A very hypothetical function since the name
    '''
    return (w**1.3)/3.1 # just felt like doing it

def calculate_gradient(w):
    with tf.GradientTape() as tape:
        loss = get_lost_loss(w) # calculate loss WITHIN tf.GradientTape()
    grad = tape.gradient(loss,w) # gradient of loss wrt. w
    return grad

# train and apply the things here
opt = tf.keras.optimizers.Adam(lr=0.01)
losses = []
for i in range(50):
    grad = calculate_gradient(weight)
    opt.apply_gradients(zip([grad],[weight]))
    losses.append(get_lost_loss(weight))
Could somebody tell me what is happening inside tf.GradientTape()? Also, the thing I would most like to ask is: if I have to do this for weight1 and weight2 of shape (2,3) instead of weight, what would the modification to the code be?
Please make any assumptions. You guys are far more skilled at this than I am.
Answer 0 (score: 1)
Yes, you are correct. A layer has two variables. The one you mentioned is called the kernel; the other is called the bias. The following example explains it in more detail:
import tensorflow as tf
w=tf.Variable([[3.2,5,6,7,5]],dtype=tf.float32)
d=tf.keras.layers.Dense(3,input_shape=(5,)) # Layer d gets inputs with shape (*,5) and generates outputs with shape (*,3)
# It has kernel variable with shape (5,3) and bias variable with shape (3)
print("Output of applying d on w:", d(w))
print("\nLayer d trainable variables:\n", d.trainable_weights)
The output will be something like this:
Output of applying d on w: tf.Tensor([[ -0.9845681 -10.321521 7.506028 ]], shape=(1, 3), dtype=float32)
Layer d trainable variables:
[<tf.Variable 'dense_18/kernel:0' shape=(5, 3) dtype=float32, numpy=
array([[-0.8144073 , -0.8408185 , -0.2504158 ],
[ 0.6073988 , 0.09965736, -0.32579994],
[ 0.04219657, -0.33530533, 0.71029276],
[ 0.33406 , -0.673926 , 0.77048916],
[-0.8014116 , -0.27997494, 0.05623555]], dtype=float32)>, <tf.Variable 'dense_18/bias:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>]
tf.GradientTape() is used to record the operations performed on trainable weights (variables) within its context, for automatic differentiation. So later on we can take the derivatives of the loss with respect to those variables.
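For example, a minimal standalone sketch of that record-then-differentiate pattern (not part of the original question or answer):
import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2                   # recorded because x is a trainable tf.Variable
dy_dx = tape.gradient(y, x)      # d(x^2)/dx = 2x
print(dy_dx.numpy())             # 6.0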
Suppose you have two weight variables, weight1 and weight2. First, you need to change your loss function to use both variables (see the code below). Then, in each step, you need to take the derivatives of the loss function with respect to the variables and update them to optimize the loss. See the code below.
import tensorflow as tf

weight1 = tf.Variable([[3.2,5,6],[2,5,4]],dtype=tf.float32) #modified
weight2 = tf.Variable([[1,2,3],[1,4,3]],dtype=tf.float32) #modified

def get_lost_loss(w1, w2): #modified
    '''
    A very hypothetical function since the name
    '''
    return tf.reduce_sum(tf.math.add(w1**1.2/2,w2**2)) # just felt like doing it

def calculate_gradient(w1,w2):
    with tf.GradientTape() as tape:
        loss = get_lost_loss(w1,w2) # calculate loss WITHIN tf.GradientTape()
    dw1,dw2 = tape.gradient(loss,[w1,w2]) # gradient of loss wrt. w1,w2
    return dw1,dw2

# train and apply the things here
opt = tf.keras.optimizers.Adam(lr=0.01)
losses = []
for i in range(500):
    grad_weight1, grad_weight2 = calculate_gradient(weight1,weight2)
    opt.apply_gradients(zip([grad_weight1, grad_weight2],[weight1,weight2]))
    losses.append(get_lost_loss(weight1,weight2))

print("loss: "+str(get_lost_loss(weight1,weight2).numpy()))