Question

I am working with the cleverhans Python library.

I have an image:

faces1.shape

> (1, 160, 160, 3)

I want to perturb the pixels in this image using the following FGSM setup:

with tf.Graph().as_default():
    with tf.Session() as sess:
        # Load model
        model = InceptionResnetV1Model()
        # Convert to classifier
        model.convert_to_classifier()

        # Load pairs of faces and their labels in one-hot encoding
        faces1, faces2, labels = set_loader.load_testset(1)

        # Create victims' embeddings using Facenet itself
        graph = tf.get_default_graph()
        phase_train_placeholder = graph.get_tensor_by_name("phase_train:0")
        feed_dict = {model.face_input: faces2,
                     phase_train_placeholder: False}
        victims_embeddings = sess.run(
            model.embedding_output, feed_dict=feed_dict)

        # Define FGSM for the model
        steps = 1
        eps = 0.01
        alpha = eps / steps
        fgsm = FastGradientMethod(model)
        fgsm_params = {'eps': alpha,
                       'clip_min': 0.,
                       'clip_max': 1.}
        adv_x = fgsm.generate(x = model.face_input, **fgsm_params)

        # Run FGSM
        adv = faces1
        for i in range(steps):
            print("FGSM step " + str(i + 1))
            feed_dict = {model.face_input: adv,
                         model.victim_embedding_input: victims_embeddings,
                         phase_train_placeholder: False}
            adv = sess.run(adv_x, feed_dict=feed_dict)
        plt.imshow((np.squeeze(adv) * 255).round().astype(np.uint8))
        plt.show()

This returns a perturbed image that fools the Inception model. However, I want to perturb only a small part of the image and achieve the same effect.

This is the function they use to perform the perturbation:

def fgm(x, preds, y=None, eps=0.3, ord=np.inf,
        clip_min=None, clip_max=None,
        targeted=False):
    """
    TensorFlow implementation of the Fast Gradient Method.
    :param x: the input placeholder
    :param preds: the model's output tensor (the attack expects the
                  probabilities, i.e., the output of the softmax)
    :param y: (optional) A placeholder for the model labels. If targeted
              is true, then provide the target label. Otherwise, only provide
              this parameter if you'd like to use true labels when crafting
              adversarial samples. Otherwise, model predictions are used as
              labels to avoid the "label leaking" effect (explained in this
              paper: https://arxiv.org/abs/1611.01236). Default is None.
              Labels should be one-hot-encoded.
    :param eps: the epsilon (input variation parameter)
    :param ord: (optional) Order of the norm (mimics NumPy).
                Possible values: np.inf, 1 or 2.
    :param clip_min: Minimum float value for adversarial example components
    :param clip_max: Maximum float value for adversarial example components
    :param targeted: Is the attack targeted or untargeted? Untargeted, the
                     default, will try to make the label incorrect. Targeted
                     will instead try to move in the direction of being more
                     like y.
    :return: a tensor for the adversarial example
    """

    if y is None:
        # Using model predictions as ground truth to avoid label leaking
        preds_max = tf.reduce_max(preds, 1, keep_dims=True)
        y = tf.to_float(tf.equal(preds, preds_max))
        y = tf.stop_gradient(y)
    y = y / tf.reduce_sum(y, 1, keep_dims=True)

    # Compute loss
    loss = utils_tf.model_loss(y, preds, mean=False)
    if targeted:
        loss = -loss

    # Define gradient of loss wrt input
    grad, = tf.gradients(loss, x)

    if ord == np.inf:
        # Take sign of gradient
        normalized_grad = tf.sign(grad)
        # The following line should not change the numerical results.
        # It applies only because `normalized_grad` is the output of
        # a `sign` op, which has zero derivative anyway.
        # It should not be applied for the other norms, where the
        # perturbation has a non-zero derivative.
        normalized_grad = tf.stop_gradient(normalized_grad)
    elif ord == 1:
        red_ind = list(xrange(1, len(x.get_shape())))
        normalized_grad = grad / tf.reduce_sum(tf.abs(grad),
                                               reduction_indices=red_ind,
                                               keep_dims=True)
    elif ord == 2:
        red_ind = list(xrange(1, len(x.get_shape())))
        square = tf.reduce_sum(tf.square(grad),
                               reduction_indices=red_ind,
                               keep_dims=True)
        normalized_grad = grad / tf.sqrt(square)
    else:
        raise NotImplementedError("Only L-inf, L1 and L2 norms are "
                                  "currently implemented.")

    # Multiply by constant epsilon
    scaled_grad = eps * normalized_grad

    # Add perturbation to original example to obtain adversarial example
    adv_x = x + scaled_grad

    # If clipping is needed, reset all values outside of [clip_min, clip_max]
    if (clip_min is not None) and (clip_max is not None):
        adv_x = tf.clip_by_value(adv_x, clip_min, clip_max)

    return adv_x

As you can see, the key line here is adv_x = x + scaled_grad, where x is the input image and scaled_grad is the perturbation added to it.

print(scaled_grad)

> Tensor("mul_5:0", shape=(?, 160, 160, 3), dtype=float32)

and

print(x)

> Tensor("input:0", shape=(?, 160, 160, 3), dtype=float32)

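So what I want to do is extract a part of x when taking the gradient, so that the gradient has the same shape as that part of x. I then want to perturb only that part of x and reattach the rest of x to the perturbed part, so that the output is the same image as before, with only the chosen region perturbed and the rest left unchanged.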

Answer 1

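Most attacks in CleverHans do not let the user specify which features of the input should be modified. However, some of the existing attacks can be adapted to behave this way.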
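
For the FGSM step shown in the question, one possible adaptation is to multiply the scaled gradient by a binary mask before adding it to the input, which gives the same result as slicing out a region, perturbing it, and reattaching the rest. The following is a minimal NumPy sketch of that idea, not CleverHans code: masked_fgsm_step, the mask, and the random stand-in for the gradient are all hypothetical names and data introduced for illustration.

```python
import numpy as np

def masked_fgsm_step(x, grad, mask, eps=0.01, clip_min=0.0, clip_max=1.0):
    """Apply one L-inf FGSM step only where mask == 1 (hypothetical helper).

    x, grad: arrays of shape (batch, height, width, channels)
    mask: array broadcastable to x, 1 where perturbation is allowed, else 0
    """
    scaled_grad = eps * np.sign(grad)   # same step as fgm with ord=np.inf
    adv_x = x + mask * scaled_grad      # the step is zeroed outside the region
    return np.clip(adv_x, clip_min, clip_max)

# Hypothetical data: a random "image" and a random stand-in for the loss gradient.
rng = np.random.RandomState(0)
x = rng.uniform(0.2, 0.8, size=(1, 160, 160, 3)).astype(np.float32)
grad = rng.randn(1, 160, 160, 3).astype(np.float32)

# Allow changes only inside a 40x40 square (rows 60-99, columns 60-99).
mask = np.zeros((1, 160, 160, 1), dtype=np.float32)
mask[:, 60:100, 60:100, :] = 1.0

adv = masked_fgsm_step(x, grad, mask)
changed = np.any(adv != x, axis=(0, 3))  # (160, 160) map of modified pixels
```

In the TensorFlow graph, the analogous change would be to replace adv_x = x + scaled_grad with adv_x = x + mask * scaled_grad in a local copy of fgm, with mask passed in as a constant or placeholder. The fgm shown in the question has no such parameter, so this is an assumption about how you might patch it. A perturbation restricted to a small region may also need a larger eps to have the same effect on the model.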

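In the JSMA attack, for example, you can mask out certain features when computing the saliency map, so that the attack considers only a specific subset of features (in your case, a specific subset of pixels) when constructing the input perturbation. You would have to modify the following snippet in the jsma_symbolic function in cleverhans/attacks_tf.py: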

# Create a mask to only keep features that match conditions
if increase:
  scores_mask = ((target_sum > 0) & (other_sum < 0))
else:
  scores_mask = ((target_sum < 0) & (other_sum > 0))
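
The modification could amount to AND-ing an extra boolean condition into scores_mask. Below is a simplified NumPy sketch of that idea. It treats target_sum and other_sum as one value per flattened feature, which may not match the tensor shapes used inside jsma_symbolic in your version of the library, and the allowed array and all sample values are hypothetical.

```python
import numpy as np

# Hypothetical per-feature gradient sums for 6 flattened features.
target_sum = np.array([0.9, 0.2, -0.1, 0.4, 0.3, -0.2])
other_sum = np.array([-0.8, 0.1, 0.2, -0.6, -0.1, -0.4])

# Hypothetical region constraint: only features 2..4 may be modified.
allowed = np.array([False, False, True, True, True, False])

increase = True
if increase:
    scores_mask = (target_sum > 0) & (other_sum < 0)
else:
    scores_mask = (target_sum < 0) & (other_sum > 0)

# Score each feature without the region constraint, for comparison.
unconstrained_scores = scores_mask * (-target_sum * other_sum)
unconstrained_best = int(np.argmax(unconstrained_scores))

# Extra condition: drop every feature outside the allowed region.
scores_mask = scores_mask & allowed
scores = scores_mask * (-target_sum * other_sum)
best = int(np.argmax(scores))
```

With these sample values the unconstrained choice is feature 0, while the constrained choice falls inside the allowed region. In the real function the mask would have to be a TensorFlow tensor with a shape compatible with target_sum and other_sum.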

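You can also find an example of this behavior in the adversarial patch work by Brown et al., where the perturbation added to an image to create an adversarial example is constrained to a small patch of the image: https://github.com/tensorflow/cleverhans/tree/master/examples/adversarial_patch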
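
The core of the patch idea is a compositing step: the adversarial content replaces the image inside a small region and the image is left untouched elsewhere. The sketch below shows only that step in NumPy, under the assumption of a fixed rectangular patch. apply_patch is a hypothetical helper, not a function from the linked example, and the optimization of the patch and the random placement and transformations used in the original work are omitted.

```python
import numpy as np

def apply_patch(image, patch, top, left):
    """Return a copy of image with patch pasted at (top, left) (hypothetical helper).

    image: array of shape (height, width, channels)
    patch: array of shape (patch_height, patch_width, channels)
    """
    h, w = patch.shape[:2]
    # Binary mask that is 1 inside the patch region and 0 elsewhere.
    mask = np.zeros(image.shape[:2] + (1,), dtype=image.dtype)
    mask[top:top + h, left:left + w] = 1.0
    # Place the patch on an empty canvas of the same size as the image.
    canvas = np.zeros_like(image)
    canvas[top:top + h, left:left + w] = patch
    # Keep the image outside the mask and the patch inside it.
    return (1.0 - mask) * image + mask * canvas

# Hypothetical data: a random image and a random 32x32 patch.
rng = np.random.RandomState(0)
image = rng.uniform(0.0, 1.0, size=(160, 160, 3))
patch = rng.uniform(0.0, 1.0, size=(32, 32, 3))
patched = apply_patch(image, patch, top=10, left=20)
```

Writing the composite as (1 - mask) * image + mask * canvas, rather than assigning into a slice, keeps the operation differentiable with respect to the patch when it is expressed with TensorFlow ops.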
