I am trying to create a custom loss function that depends on the logits output by my model, as an intermediate step in an optimizer minimization task. However, I keep getting an error saying that some operation between the input perturbation
I am trying to optimize and the loss function does not support gradients:
ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients, between variables ["<tf.Variable 'perturbation:0' shape=(1, 28, 28, 1) dtype=float32, numpy=array([[[[0.], [0.], [0.], ..., [0.]]]])>...
I strongly suspect the culprit is the call to model.predict(),
since it returns a numpy array
rather than a tensor that gradients can be traced through. If that really is the cause, how can I track gradients through that call so that the loss function can be minimized properly?
I tried wrapping output
in tf.Variable()
(output = tf.Variable(model.predict(newimg, steps=num_steps), dtype=tf.float32, trainable=False, name="output")
) to see if that would help, but it did not.
train_temperature = 1
# create model layers
model = k.models.Sequential([
    # flatten into a single vector
    k.layers.Flatten(input_shape=(28, 28, 1), name='input'),
    # first layer
    k.layers.Dense(512, activation=tf.nn.relu, name='dense_1'),
    k.layers.Dropout(0.2, name='dropout_1'),
    # second layer
    k.layers.Dense(256, activation=tf.nn.relu, name='dense_2'),
    k.layers.Dropout(0.2, name='dropout_2'),
    # third layer
    k.layers.Dense(128, activation=tf.nn.relu, name='dense_3'),
    k.layers.Dropout(0.2, name='dropout_3'),
    # fourth layer
    k.layers.Dense(64, activation=tf.nn.relu, name='dense_4'),
    k.layers.Dropout(0.2, name='dropout_4'),
    # fifth layer
    k.layers.Dense(20, activation=tf.nn.relu, name='dense_5'),
    k.layers.Dropout(0.2, name='dropout_5'),
    # sixth layer
    k.layers.Dense(10, name='dense_6')
])
def fn(correct, predicted):
    return tf.nn.softmax_cross_entropy_with_logits_v2(labels=correct, logits=predicted/train_temperature)
# compile with optimizer, loss, and metrics
model.compile(optimizer='adam',
              loss=fn,
              metrics=['accuracy'])
# fit
model.fit(x_train, y_train, epochs=2, shuffle=True)
# run
model.evaluate(x_test, y_test)
# the variable to optimize
perturbation = tf.Variable(np.zeros(input_shape, dtype=np.float32), trainable=True, name="perturbation")
# other variables
img = tf.Variable(np.zeros(input_shape), dtype=tf.float32, trainable=False, name="img")
label = tf.Variable(np.zeros((batch_size, num_labels)), dtype=tf.float32, trainable=False, name="label")
# the resulting adversarial image, tanh'd to keep bounded from boxmin to boxmax
boxmul = (boxmax - boxmin) / 2.
boxplus = (boxmin + boxmax) / 2.
newimg = tf.tanh(img + perturbation) * boxmul + boxplus
# prediction BEFORE-SOFTMAX of the model
output = model.predict(newimg, steps=num_steps)
# compute the probability of the label class versus the maximum other
real = tf.reduce_sum((label)*output, 1)
other = tf.reduce_max((1-label)*output - (label*10000), 1)
# compute loss
def loss():
    loss1 = tf.maximum(0.0, real - other)
    loss2 = 0  # TODO: add the KL-Divergence regularizer term here
    loss = loss1 + reg * loss2
    return loss
# setup the adam optimizer
optimizer = tf.train.AdamOptimizer(step_size)
train = optimizer.minimize(loss, var_list=[perturbation])