I am working with the cleverhans Python library.
I have an image:
faces1.shape
> (1, 160, 160, 3)
I want to use this FGSM approach to perturb the pixels in this image:
with tf.Graph().as_default():
    with tf.Session() as sess:
        # Load model
        model = InceptionResnetV1Model()
        # Convert to classifier
        model.convert_to_classifier()
        # Load pairs of faces and their labels in one-hot encoding
        faces1, faces2, labels = set_loader.load_testset(1)
        # Create victims' embeddings using Facenet itself
        graph = tf.get_default_graph()
        phase_train_placeholder = graph.get_tensor_by_name("phase_train:0")
        feed_dict = {model.face_input: faces2,
                     phase_train_placeholder: False}
        victims_embeddings = sess.run(
            model.embedding_output, feed_dict=feed_dict)
        # Define FGSM for the model
        steps = 1
        eps = 0.01
        alpha = eps / steps
        fgsm = FastGradientMethod(model)
        fgsm_params = {'eps': alpha,
                       'clip_min': 0.,
                       'clip_max': 1.}
        adv_x = fgsm.generate(x=model.face_input, **fgsm_params)
        # Run FGSM
        adv = faces1
        for i in range(steps):
            print("FGSM step " + str(i + 1))
            feed_dict = {model.face_input: adv,
                         model.victim_embedding_input: victims_embeddings,
                         phase_train_placeholder: False}
            adv = sess.run(adv_x, feed_dict=feed_dict)
        plt.imshow((np.squeeze(adv) * 255).round().astype(np.uint8))
        plt.show()
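This returns a perturbed image that fools the original detection model. However, I only want to perturb a small part of the image and get the same effect.
This is the function the library uses to compute the perturbation: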
def fgm(x, preds, y=None, eps=0.3, ord=np.inf,
        clip_min=None, clip_max=None,
        targeted=False):
    """
    TensorFlow implementation of the Fast Gradient Method.
    :param x: the input placeholder
    :param preds: the model's output tensor (the attack expects the
                  probabilities, i.e., the output of the softmax)
    :param y: (optional) A placeholder for the model labels. If targeted
              is true, then provide the target label. Otherwise, only provide
              this parameter if you'd like to use true labels when crafting
              adversarial samples. Otherwise, model predictions are used as
              labels to avoid the "label leaking" effect (explained in this
              paper: https://arxiv.org/abs/1611.01236). Default is None.
              Labels should be one-hot-encoded.
    :param eps: the epsilon (input variation parameter)
    :param ord: (optional) Order of the norm (mimics NumPy).
                Possible values: np.inf, 1 or 2.
    :param clip_min: Minimum float value for adversarial example components
    :param clip_max: Maximum float value for adversarial example components
    :param targeted: Is the attack targeted or untargeted? Untargeted, the
                     default, will try to make the label incorrect. Targeted
                     will instead try to move in the direction of being more
                     like y.
    :return: a tensor for the adversarial example
    """
    if y is None:
        # Using model predictions as ground truth to avoid label leaking
        preds_max = tf.reduce_max(preds, 1, keep_dims=True)
        y = tf.to_float(tf.equal(preds, preds_max))
        y = tf.stop_gradient(y)
    y = y / tf.reduce_sum(y, 1, keep_dims=True)
    # Compute loss
    loss = utils_tf.model_loss(y, preds, mean=False)
    if targeted:
        loss = -loss
    # Define gradient of loss wrt input
    grad, = tf.gradients(loss, x)
    if ord == np.inf:
        # Take sign of gradient
        normalized_grad = tf.sign(grad)
        # The following line should not change the numerical results.
        # It applies only because `normalized_grad` is the output of
        # a `sign` op, which has zero derivative anyway.
        # It should not be applied for the other norms, where the
        # perturbation has a non-zero derivative.
        normalized_grad = tf.stop_gradient(normalized_grad)
    elif ord == 1:
        red_ind = list(xrange(1, len(x.get_shape())))
        normalized_grad = grad / tf.reduce_sum(tf.abs(grad),
                                               reduction_indices=red_ind,
                                               keep_dims=True)
    elif ord == 2:
        red_ind = list(xrange(1, len(x.get_shape())))
        square = tf.reduce_sum(tf.square(grad),
                               reduction_indices=red_ind,
                               keep_dims=True)
        normalized_grad = grad / tf.sqrt(square)
    else:
        raise NotImplementedError("Only L-inf, L1 and L2 norms are "
                                  "currently implemented.")
    # Multiply by constant epsilon
    scaled_grad = eps * normalized_grad
    # Add perturbation to original example to obtain adversarial example
    adv_x = x + scaled_grad
    # If clipping is needed, reset all values outside of [clip_min, clip_max]
    if (clip_min is not None) and (clip_max is not None):
        adv_x = tf.clip_by_value(adv_x, clip_min, clip_max)
    return adv_x
As you can see, the key line here is adv_x = x + scaled_grad, where x is the input image and scaled_grad is the perturbation added to it.
print(scaled_grad)
> Tensor("mul_5:0", shape=(?, 160, 160, 3), dtype=float32)
and
print(x)
> Tensor("input:0", shape=(?, 160, 160, 3), dtype=float32)
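So what I want to do is take only part of x when computing the gradient, so that the gradient has the same shape as that part of x. I then want to perturb only that part and re-attach the rest of x to it, so that the output is an image of the same shape as before in which only the chosen region is perturbed and the rest is left untouched.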
Answer 0: (score: 1)
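Most attacks in CleverHans do not let the user specify which features of the input may be modified. However, some of the existing attacks can be adapted to behave this way.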
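For example, in the JSMA attack you could mask out some features when computing the saliency map, so that the attack only considers a specific subset of features (in your case, a specific subset of pixels) when constructing the input perturbation. You would have to modify the following snippet in the jsma_symbolic function in cleverhans/attacks_tf.py: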
# Create a mask to only keep features that match conditions
if increase:
    scores_mask = ((target_sum > 0) & (other_sum < 0))
else:
    scores_mask = ((target_sum < 0) & (other_sum > 0))
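One way to picture that modification is to AND the condition mask with a user-supplied mask of allowed features. The following is only a minimal NumPy sketch on 1-D toy arrays: the `allowed` argument and the `masked_scores_mask` helper are hypothetical names, not part of CleverHans, and the real jsma_symbolic works on TensorFlow tensors scored over pairs of features, so the actual change would need to be adapted to those shapes.

```python
import numpy as np


def masked_scores_mask(target_sum, other_sum, allowed, increase=True):
    """Combine the JSMA-style sign conditions with a feature mask.

    `allowed` is a hypothetical boolean array (True = feature may be
    perturbed); it is an assumption of this sketch, not a CleverHans API.
    """
    if increase:
        scores_mask = (target_sum > 0) & (other_sum < 0)
    else:
        scores_mask = (target_sum < 0) & (other_sum > 0)
    # Drop every feature the caller has not allowed to change
    return scores_mask & allowed


# Toy per-feature values, made up for illustration
target_sum = np.array([0.5, 0.2, -0.1, 0.3])
other_sum = np.array([-0.4, -0.1, 0.2, 0.6])
allowed = np.array([True, False, True, True])

jsma_mask = masked_scores_mask(target_sum, other_sum, allowed)
```

Here features 0 and 1 satisfy the sign conditions, but feature 1 is excluded by `allowed`, so only feature 0 remains a candidate.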
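You can also find an example of this behavior in the adversarial patch work by Brown et al., where the perturbation added to an image to create an adversarial example is restricted to a small patch of the image: https://github.com/tensorflow/cleverhans/tree/master/examples/adversarial_patch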
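The same idea of confining the perturbation to a region can be applied to the FGSM update from the question by multiplying the scaled gradient by a binary mask before adding it to the input. The sketch below illustrates this in plain NumPy rather than in the CleverHans graph: `masked_fgsm_step` and `mask` are hypothetical names introduced here, and the random `grad` is only a stand-in for the real gradient of the loss with respect to the input.

```python
import numpy as np


def masked_fgsm_step(x, grad, mask, eps=0.01, clip_min=0.0, clip_max=1.0):
    """One L-inf FGSM step whose perturbation is zeroed outside `mask`.

    Hypothetical helper for illustration; `mask` has the same shape as
    `x`, with 1 where pixels may change and 0 elsewhere.
    """
    scaled_grad = eps * np.sign(grad)
    # Only the masked region receives the perturbation
    adv_x = x + mask * scaled_grad
    return np.clip(adv_x, clip_min, clip_max)


rng = np.random.default_rng(0)
# Toy image with the same shape as faces1 in the question
x = rng.uniform(0.2, 0.8, size=(1, 160, 160, 3)).astype(np.float32)
# Stand-in for d(loss)/dx; a real attack would get this from the model
grad = rng.standard_normal(x.shape).astype(np.float32)

# Allow changes only inside a 40x40 pixel patch
mask = np.zeros_like(x)
mask[:, 40:80, 40:80, :] = 1.0

adv = masked_fgsm_step(x, grad, mask)
changed = adv != x
```

In the TensorFlow graph the analogous change would be adv_x = x + mask * scaled_grad inside a modified copy of fgm, with the mask supplied as a constant or placeholder of the same shape as x. That is an assumption about how one might adapt the function, not an option that FastGradientMethod exposes.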