Question

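I have the following code, which estimates the distortion between two images using the pretrained VGG16 model (without the fully connected layers) in TensorFlow. This is a small research project for my first experience with CNNs, so for now I only use the pretrained model and plan to fine-tune it for my case later.

So, let's say I have an original and a noisy color image as inputs, each of size 64x64, fed into the model. I extract the feature maps of the pretrained VGG16 model for both input images and then use SSIM as the loss function to compute their difference. In short, my goal is to obtain a perceptual distortion measure based on a CNN model. I have a few questions: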

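1. I understand that the standard input size of VGG16 is 224x224x3 (height x width x color channels). I use 64x64x3 because that is my use case, and 224x224x3 is too heavy for my computer to process. I can get an SSIM result for all layers except the last convolutional layer, with either multichannel=True or multichannel=False. For the last convolutional layer it gives me the error message: win_size exceeds image extent. I don't know how to handle it. My first guess is that its dimensions are too small for SSIM.
2. I did read a discussion saying that when the input to VGG16 is not at its standard size, it is better to compute the difference with multichannel=False. That does work, but it makes me think that if the computation is not done across the color channels, I lose a lot of feature information. Please correct me if I am wrong.
3. Would it be better to compute the distortion directly, without flattening and normalizing? If so, how should I do it? Without flattening and normalizing, it gives me the same error as in point 1, because the convolution results would be too large to feed into the SSIM function.
4. I did try estimating SSIM with tf.image.ssim, but it failed at the line shape1 = img1.get_shape().with_rank_at_least(3). I think this is because the function is NumPy-based and does the same thing as the SSIM function I use in the code below.
5. If there is a better loss function than SSIM for my goal, I am really open to any suggestions.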
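
For reference, a minimal sketch of the shape arithmetic behind the win_size error in point 1. It assumes the standard VGG16 layout (each pooling stage halves height and width; pool3 has 256 channels, pool4 and pool5 have 512) and scikit-image's default SSIM window of 7. The helper name flattened_shape is made up for illustration.

```python
# Shape arithmetic only; no TensorFlow or scikit-image required.
DEFAULT_WIN_SIZE = 7  # assumption: scikit-image's default SSIM window

def flattened_shape(input_size, n_pools, channels):
    """Shape of a pooled feature map after np.reshape(fmap, [-1, channels])."""
    side = input_size // (2 ** n_pools)  # each 2x2 max-pool halves the side
    return (side * side, channels)

# (number of pooling stages applied, channel count) per layer
layers = {"pool3": (3, 256), "pool4": (4, 512), "pool5": (5, 512)}

for name, (n_pools, channels) in layers.items():
    shape = flattened_shape(64, n_pools, channels)
    fits = min(shape) >= DEFAULT_WIN_SIZE
    status = "fits" if fits else "smaller than the window"
    print("%s: flattened shape %s, %s" % (name, shape, status))
```

With a 64x64 input, the flattened pool5 array has only 4 rows, which is smaller than a 7x7 window. This is consistent with the error appearing only for the last layer.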

Sorry for asking so many questions about this one case. I really hope someone can help me and give me better insight along the way. Cheers...
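
On the last question (alternatives to SSIM), one option that may be worth trying is a plain feature-space distance: unit-normalize each spatial position along the channel axis, then average the squared differences. The sketch below is not SSIM and not part of the code in this question. The function name feature_distance is hypothetical, and the random arrays only stand in for real pool5 outputs.

```python
import numpy as np

def feature_distance(feat_a, feat_b, eps=1e-10):
    """Mean squared distance between channel-normalized feature maps.

    feat_a, feat_b: arrays of shape (H, W, C), e.g. one VGG16 pooling output.
    """
    feat_a = np.asarray(feat_a, dtype=np.float64)
    feat_b = np.asarray(feat_b, dtype=np.float64)
    # Scale every spatial position to unit length along the channel axis.
    a = feat_a / (np.linalg.norm(feat_a, axis=-1, keepdims=True) + eps)
    b = feat_b / (np.linalg.norm(feat_b, axis=-1, keepdims=True) + eps)
    # Squared Euclidean distance per position, averaged over positions.
    return float(np.mean(np.sum((a - b) ** 2, axis=-1)))

# Stand-in data with the shape of pool5 for a 64x64 input (2x2x512).
rng = np.random.default_rng(0)
clean = rng.random((2, 2, 512))
noisy = clean + 0.1 * rng.standard_normal((2, 2, 512))

print("identical maps:", feature_distance(clean, clean))
print("noisy maps:", feature_distance(clean, noisy))
```

Because it needs no sliding window, this kind of distance works for any spatial size, including 2x2 maps. Whether it tracks perceived distortion for a given dataset would have to be checked empirically.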

from __future__ import print_function
from skimage import measure as metric
import numpy as np
import tensorflow as tf
from numpy import array
from vgg16_new import Vgg16
from utils.utils import *
import cv2
import matplotlib.pyplot as plt
from lap import lapjv

class CNN(object):
    def __init__(self):
        self.height = 64
        self.width = 64
        self.shape = np.array([64.0, 64.0])
        self.sift_weight = 2.0
        self.cnn_weight = 1.0
        # Placeholder for one 64x64 RGB image (batch size 1)
        self.cnnph = tf.placeholder("float", [1, 64, 64, 3])
        self.vgg = Vgg16()
        self.vgg.build(self.cnnph)

    def feature_maps_distortion(self, I_org, I_noized):
        I_org = array(I_org).reshape(1, self.height, self.width, 3)
        I_noized = array(I_noized).reshape(1, self.height, self.width, 3)

        # CNN feature: propagate the images through VGG16
        with tf.Session() as sess:
            feed_dict = {self.cnnph: I_org}
            D1_org, D2_org, D3_org = sess.run([
                self.vgg.pool3, self.vgg.pool4, self.vgg.pool5
            ], feed_dict=feed_dict)

            feed_dict = {self.cnnph: I_noized}
            D1_noised, D2_noised, D3_noised = sess.run([
                self.vgg.pool3, self.vgg.pool4, self.vgg.pool5
            ], feed_dict=feed_dict)

        # flatten original input to (positions, channels)
        DX1_org = np.reshape(D1_org[0], [-1, 256])
        DX2_org = np.reshape(D2_org[0], [-1, 512])
        DX3_org = np.reshape(D3_org[0], [-1, 512])

        # flatten noised input to (positions, channels)
        DX1_noised = np.reshape(D1_noised[0], [-1, 256])
        DX2_noised = np.reshape(D2_noised[0], [-1, 512])
        DX3_noised = np.reshape(D3_noised[0], [-1, 512])

        # normalize original input by its standard deviation
        DX1_org = DX1_org / np.std(DX1_org)
        DX2_org = DX2_org / np.std(DX2_org)
        DX3_org = DX3_org / np.std(DX3_org)

        # normalize noised input by its standard deviation
        DX1_noised = DX1_noised / np.std(DX1_noised)
        DX2_noised = DX2_noised / np.std(DX2_noised)
        DX3_noised = DX3_noised / np.std(DX3_noised)

        # compute distortion with SSIM
        s1 = metric.compare_ssim(DX1_org, DX1_noised, multichannel=False)
        s2 = metric.compare_ssim(DX2_org, DX2_noised, multichannel=False)
        # s3 = metric.compare_ssim(DX3_org, DX3_noised, multichannel=False)
        # Calculating SSIM for s3 gives me the error:
        # win_size exceeds image extent.  If the input is a
        # multichannel (color) image, set multichannel=True.

        print("SSIM pool_3: %.2f" % s1)
        print("SSIM pool_4: %.2f" % s2)
        # print("SSIM pool_5: %.2f" % s3)

        del D1_org, D1_noised
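
A minimal sketch of one way to get an SSIM-style score on the unflattened feature maps, including the 2x2 pool5 output. It replaces the sliding window with statistics taken over each whole channel, so it is a simplified "global" SSIM and not what compare_ssim computes. The function names, the shared data range and the random stand-in arrays are assumptions made for illustration.

```python
import numpy as np

def global_ssim(x, y, data_range, k1=0.01, k2=0.03):
    """SSIM formula applied once to whole arrays instead of local windows."""
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)

def feature_map_ssim(feat_a, feat_b):
    """Average the global SSIM over channels of two (H, W, C) feature maps."""
    feat_a = np.asarray(feat_a, dtype=np.float64)
    feat_b = np.asarray(feat_b, dtype=np.float64)
    # One data range shared by all channels, taken from both maps.
    data_range = max(feat_a.max(), feat_b.max()) - min(feat_a.min(), feat_b.min())
    if data_range == 0:
        return 1.0  # both maps hold the same constant value everywhere
    scores = [
        global_ssim(feat_a[..., c], feat_b[..., c], data_range)
        for c in range(feat_a.shape[-1])
    ]
    return float(np.mean(scores))

# Stand-in data with the shape of pool5 for a 64x64 input (2x2x512).
rng = np.random.default_rng(0)
clean = rng.random((2, 2, 512))
noisy = clean + 0.1 * rng.standard_normal((2, 2, 512))

print("identical maps:", feature_map_ssim(clean, clean))
print("noisy maps:", feature_map_ssim(clean, noisy))
```

With only four values per channel at pool5, the per-channel statistics are very coarse, so scores from the deepest layer should be read with caution.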

Estimating the distortion of VGG16 feature maps with TensorFlow

0 answers: