Question

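I'm reading an article that explains how to trick a neural network into predicting any image you want. I'm using the MNIST dataset.

The article provides a relatively detailed walkthrough, but the person who wrote it is using Caffe.

Anyway, my first step was to create a logistic regression model in TensorFlow, trained on the MNIST dataset. So, if I restore the logistic regression model, I can use it to predict any image. For example, I feed the digit 7 to the following model...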

with tf.Session() as sess:  
    saver.restore(sess, "/tmp/model.ckpt")
    # number 7
    x_in = np.expand_dims(mnist.test.images[0], axis=0)
    classification = sess.run(tf.argmax(pred, 1), feed_dict={x:x_in})
    print(classification) 

>>>[7]

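This prints the correct digit, [7].

Now, the article explains that in order to break a neural network we need to calculate its gradient, that is, the derivative of the network's output with respect to the input image.

The article states that to calculate the gradient, we first need to pick an intended outcome, then set the list of output probabilities to 0 everywhere except for the intended outcome, which is set to 1. Backpropagation is the algorithm used to calculate the gradient.

The article then provides Caffe code showing how to calculate the gradient...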

def compute_gradient(image, intended_outcome):
    # Put the image into the network and make the prediction
    predict(image)
    # Get an empty set of probabilities
    probs = np.zeros_like(net.blobs['prob'].data)
    # Set the probability for our intended outcome to 1
    probs[0][intended_outcome] = 1
    # Do backpropagation to calculate the gradient for that outcome
    # and the image we put in
    gradient = net.backward(prob=probs)
    return gradient['data'].copy()

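Now, my problem is that I'm having a hard time understanding how this function is able to get the gradient just by being fed the image and the probabilities. Because I don't fully understand this code, I'm having a hard time translating this logic to TensorFlow.

I think I'm confused about how the Caffe framework works, because I have never seen or used it before. If someone could explain how this logic works step by step, that would be great.

I already know the basics of backpropagation, so you may assume I know how it works.

Here is a link to the article itself: https://codewords.recurse.com/issues/five/why-do-neural-networks-think-a-panda-is-a-vulture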

Answer 1

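I'm going to show you the basics of generating an adversarial image in TF. To apply this to a model you have already trained, you will probably need some adaptations.

If you want to try this interactively, the code blocks work well as cells in a Jupyter notebook. If you don't use a notebook, you'll need to add plt.show() calls for the plots to show, and remove the matplotlib inline statement. The code is basically the simple MNIST tutorial from the TF documentation; I'll point out the important differences.

The first block is just setup, nothing special...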

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

# if you're not using jupyter notebooks then comment this out
%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

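Get the MNIST data (the download server is down from time to time, so you might need to download the files manually from web.archive.org and put them into that directory). We're not using one-hot encoding like in the tutorial, because by now TF has nicer functions for calculating the loss that no longer need one-hot encoding.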

mnist = input_data.read_data_sets('/tmp/tensorflow/mnist/input_data')
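To illustrate why dropping the one-hot encoding loses nothing, here is a minimal plain-Python sketch (no TensorFlow). The probabilities and the helper names are made up for illustration; the point is only that cross-entropy against an integer class index equals cross-entropy against the matching one-hot vector.

```python
import math

def sparse_cross_entropy(probs, label):
    """Cross-entropy when the label is a plain class index."""
    return -math.log(probs[label])

def one_hot(label, num_classes):
    """Turn a class index into a one-hot list, e.g. 1 -> [0, 1, 0]."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def dense_cross_entropy(probs, target):
    """Cross-entropy when the label is a one-hot (or soft) distribution."""
    return -sum(t * math.log(p) for t, p in zip(target, probs))

probs = [0.1, 0.7, 0.2]   # made-up classifier output for three classes
label = 1

sparse = sparse_cross_entropy(probs, label)
dense = dense_cross_entropy(probs, one_hot(label, len(probs)))
print(sparse, dense)      # the two values agree
```

With a one-hot target every term except the true class is multiplied by zero, so only the index of the true class is actually needed.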

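In the next block we are doing something "special". The input image tensor is defined as a variable, because later we want to optimize with respect to the input image. Usually you would have a placeholder here. This does limit us a bit, because we need a definite shape, so we only feed in one example at a time. It's not something you'd want to do in production, but for teaching purposes it's fine (and you can get around it with a bit more code). The labels are placeholders as normal.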

input_images = tf.get_variable("input_image", shape=[1,784], dtype=tf.float32)
input_labels = tf.placeholder(shape=[1], name='input_label', dtype=tf.int32)

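Our model is the standard logistic regression model from the tutorial. We only use the softmax for displaying results; the loss function takes the plain logits.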

W = tf.get_variable("weights", shape=[784, 10], dtype=tf.float32, initializer=tf.random_normal_initializer())
b = tf.get_variable("biases", shape=[1, 10], dtype=tf.float32, initializer=tf.zeros_initializer())

logits = tf.matmul(input_images, W) + b
softmax = tf.nn.softmax(logits)
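One common reason for feeding raw logits to the loss, rather than the softmax output, is numerical stability. The sketch below is plain Python with hypothetical helper names, not TF's actual implementation: it computes the cross-entropy directly from logits using the log-sum-exp trick and compares it with a naive version that exponentiates first.

```python
import math

def cross_entropy_from_logits(logits, label):
    """-log(softmax(logits)[label]), computed with the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def cross_entropy_naive(logits, label):
    """Same quantity via an explicit softmax; overflows for large logits."""
    exps = [math.exp(z) for z in logits]
    return -math.log(exps[label] / sum(exps))

small = [2.0, 1.0, 0.1]
stable_small = cross_entropy_from_logits(small, 0)
naive_small = cross_entropy_naive(small, 0)
print(stable_small, naive_small)   # agree for moderate logits

large = [1000.0, 0.0, -1000.0]
stable_large = cross_entropy_from_logits(large, 0)
try:
    cross_entropy_naive(large, 0)
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True
print(stable_large, naive_overflowed)
```

For moderate logits both versions agree; for very large logits only the version that works on logits directly stays finite.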

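The loss is the standard cross-entropy. The thing to note in the training step is the explicit list of variables passed in: we have defined the input image as a trainable variable, but we don't want to optimize the image while training the logistic regression, only the weights and biases, so we state that explicitly.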

loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,labels=input_labels,name='xentropy')

mean_loss = tf.reduce_mean(loss)

train_step = tf.train.AdamOptimizer(learning_rate=0.1).minimize(mean_loss, var_list=[W,b])
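The effect of var_list can be sketched without TF. This is a toy with one weight and one input, both hypothetical and unrelated to the MNIST model: the same gradient step either trains the weight or perturbs the input, depending only on which names are listed.

```python
def loss(params, target):
    """Squared error of a one-weight 'model': prediction = w * x."""
    return (params["w"] * params["x"] - target) ** 2

def gradients(params, target):
    """Analytic gradient of the loss with respect to every parameter."""
    err = params["w"] * params["x"] - target
    return {"w": 2 * err * params["x"], "x": 2 * err * params["w"]}

def sgd_step(params, target, var_list, lr=0.1):
    """Update only the names in var_list; everything else stays frozen."""
    grads = gradients(params, target)
    return {name: (value - lr * grads[name]) if name in var_list else value
            for name, value in params.items()}

params = {"w": 0.5, "x": 1.0}

# Training: move the weight, keep the input fixed (like var_list=[W, b]).
trained = sgd_step(params, target=2.0, var_list=["w"])

# Attack: move the input, keep the weight fixed (like var_list=[input_images]).
attacked = sgd_step(trained, target=0.0, var_list=["x"])

print(trained, attacked)
```

Gradients are available for every variable in both cases; the list only decides which of them the update is applied to.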

Start a session...

sess = tf.Session()
sess.run(tf.global_variables_initializer())

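Because the batch size is 1, training is slower than it should be. As I said, this is not something you'd want to do in production, but it is only meant to teach the basics...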

for step in range(10000):
    batch_xs, batch_ys = mnist.train.next_batch(1)
    loss_v, _ = sess.run([mean_loss, train_step], feed_dict={input_images: batch_xs, input_labels: batch_ys})

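At this point we should have a model that is good enough to demonstrate how to generate an adversarial image. First, we get an image with the label '2', because these are easy, so even our suboptimal classifier should get it right (if it doesn't, run this cell again). This step is random, so I can't guarantee that it will work.

We set the input image variable to that example.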

sample_label = -1
# draw random test images until we get one labelled 2
while sample_label != 2:
    sample_image, sample_label = mnist.test.next_batch(1)
plt.imshow(sample_image.reshape(28, 28),cmap='gray')

# assign image to var
sess.run(tf.assign(input_images, sample_image));
sess.run(softmax) # now using the variable as input, no feed dict

# should show something like
# array([[ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]], dtype=float32)
# With the third entry being the highest by far.

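Now we are going to "break" the classification. We want to change the image so that it looks more like another digit to the network, without changing the network itself. The code for this looks essentially identical to what we had before. We define a "fake" label and the same loss as before (cross-entropy), and we get an optimizer to minimize the fake loss, but this time with a var_list containing only the input image, so we won't change the logistic regression weights: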

fake_label = tf.placeholder(tf.int32, shape=[1])
fake_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,labels=fake_label)
adversarial_step = tf.train.GradientDescentOptimizer(learning_rate=1e-3).minimize(fake_loss, var_list=[input_images])

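The next block is meant to be run interactively multiple times, while you watch both the image and the scores change (here moving towards the label 8):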

sess.run(adversarial_step, feed_dict={fake_label:np.array([8])})
plt.imshow(sess.run(input_images).reshape(28,28),cmap='gray')
sess.run(softmax)

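The first time you run this block, the scores will probably still point heavily towards 2, but that changes over time, and after a few runs you should see something like this: the image still looks like a 2 with some noise in the background, but the score for "2" is around 3% while the score for "8" is over 96%.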
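What the repeated adversarial step does can be sketched in plain Python on a tiny softmax regression. The weights, the two-feature input and the step size are hand-picked assumptions, not values from the MNIST model; only the input is updated, and the target class probability climbs run after run.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(x, W, b):
    """Softmax regression: class probabilities for input x (W is features x classes)."""
    logits = [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
              for j in range(len(b))]
    return softmax(logits)

def input_gradient(x, W, b, target):
    """Gradient of -log p[target] with respect to the input x."""
    p = predict(x, W, b)
    dlogits = [p[j] - (1.0 if j == target else 0.0) for j in range(len(b))]
    return [sum(W[i][j] * dlogits[j] for j in range(len(b)))
            for i in range(len(x))]

# Hypothetical, hand-picked "trained" weights: 2 input features, 2 classes.
W = [[1.0, -1.0], [-1.0, 1.0]]
b = [0.0, 0.0]
x = [1.0, 0.0]             # starts out clearly classified as class 0
before = predict(x, W, b)

for _ in range(50):        # W and b are never touched, only x moves
    g = input_gradient(x, W, b, target=1)
    x = [xi - 0.1 * gi for xi, gi in zip(x, g)]

after = predict(x, W, b)
print(before, after)
```

Each pass through the loop plays the role of one run of the adversarial step: the model stays fixed while the input drifts towards whatever the fake label rewards.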

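Note that we never explicitly computed the gradient. We don't need to: the TF optimizer takes care of computing the gradients and applying the updates to the variables. If you want to get the gradient itself, you can do so with tf.gradients(fake_loss, input_images).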
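To connect this back to the Caffe snippet in the question: as far as I understand pycaffe (this is my reading, not something the article states), net.backward(prob=probs) seeds backpropagation with the one-hot vector at the softmax output, which yields the derivative of the intended class probability with respect to the image. The plain-Python sketch below uses made-up weights and hypothetical function names to show that this differs from the cross-entropy gradient only by a factor of -1/p_k, so both point along the same line.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, W, b):
    """Softmax regression probabilities (W is features x classes)."""
    return softmax([sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
                    for j in range(len(b))])

def backprop_to_input(W, dlogits):
    """Push a gradient at the logits back to the input: W times dlogits."""
    return [sum(row[j] * dlogits[j] for j in range(len(dlogits))) for row in W]

def grad_prob_wrt_input(x, W, b, k):
    """d p[k] / d x: backprop seeded with a one-hot vector at the softmax output."""
    p = forward(x, W, b)
    dlogits = [p[k] * ((1.0 if j == k else 0.0) - p[j]) for j in range(len(b))]
    return backprop_to_input(W, dlogits)

def grad_loss_wrt_input(x, W, b, k):
    """d(-log p[k]) / d x: the gradient a cross-entropy loss produces."""
    p = forward(x, W, b)
    dlogits = [p[j] - (1.0 if j == k else 0.0) for j in range(len(b))]
    return backprop_to_input(W, dlogits)

# Made-up weights: 2 input features, 3 classes.
W = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
b = [0.1, 0.0, -0.1]
x = [0.6, -0.4]
k = 2                                   # the intended outcome

p_k = forward(x, W, b)[k]
g_prob = grad_prob_wrt_input(x, W, b, k)
g_loss = grad_loss_wrt_input(x, W, b, k)

# Chain rule: d(-log p_k)/dx = -(1/p_k) * d p_k/dx
rescaled = [-g / p_k for g in g_prob]
print(g_loss, rescaled)                 # the two lists agree
```

In other words, stepping along the probability gradient and descending on the fake cross-entropy loss both push the image towards the intended class; they differ only in step scaling.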

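The same pattern works for more complicated models, but what you'll want to do is train your model as normal, using placeholders with bigger batches or a pipeline with TF readers, and when you want to make the adversarial image, recreate the network with the input image variable as its input. As long as all the variable names stay the same (which they should if you use the same functions to build the network), you can restore from your network checkpoint and then apply the steps from this post to get an adversarial image. You might need to fiddle with the learning rate and so on.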
