Black output from image processing in Tensorflow (using the jpeg decoder for neural network training)

Asked: 2018-07-09 11:12:21

Tags: python numpy tensorflow

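I have access to a large number of 2048x2048x3 jpeg pictures, which I store in the TFRecords binary format. Later, I use the stored files to train a deep neural network. To store the pictures, I currently use two different methods.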

The first one uses Tensorflow. I define a function that creates a Tensorflow graph, and I reuse the same graph for all pictures:

import tensorflow as tf

def picture_decoder(height, width):
    g = tf.Graph()
    with g.as_default():
        picture_name_tensor = tf.placeholder(tf.string)
        picture_contents = tf.read_file(picture_name_tensor)
        picture = tf.image.decode_jpeg(picture_contents)
        picture_as_float = tf.image.convert_image_dtype(picture, tf.float32)
        picture_4d = tf.expand_dims(picture_as_float, 0)
        resize_shape = tf.stack([height, width])
        resize_shape_as_int = tf.cast(resize_shape, dtype=tf.int32)
        final_tensor =  tf.image.resize_bilinear(picture_4d, resize_shape_as_int)
    return g, picture_name_tensor, final_tensor

Height, Width = 300, 300
graph, nameholder, image_tensor = picture_decoder(Height, Width)                                        
with tf.Session(graph=graph) as sess:
    init = tf.group( tf.global_variables_initializer(), tf.local_variables_initializer() )
    sess.run(init)

    # Loop through the pictures
    for(...picture_name...):
        picture = sess.run(image_tensor, feed_dict={nameholder: picture_name})

The second method uses numpy:

from PIL import Image
import numpy as np

def picture_decoder_numpy(picture_name, height, width):
    image = Image.open(picture_name)
    # PIL's resize expects the target size as (width, height)
    image = image.resize((width, height), Image.LANCZOS)
    image = np.array(image, dtype=np.int32)
    return np.expand_dims(image, axis=0)

Height, Width = 300, 300
for(...picture_name...):
    picture = picture_decoder_numpy(picture_name, Height, Width)

The first method appears to be about 6 times faster than the second.

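The problem I am facing concerns the training afterwards. In the first case, the deep neural network I defined fails to learn, i.e., its loss does not improve over many epochs and stays only slightly below 1. With the second method, without changing any neural network parameter, the loss reaches values on the order of 1e-05. Am I missing some Tensorflow detail?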

I can post the full code if necessary.

Update

The Tensorflow method outputs a black picture, while the numpy method works as expected.

MVCE for decoding the pictures:

from PIL import Image
import numpy as np
import tensorflow as tf

def picture_decoder(height, width):
    g = tf.Graph()
    with g.as_default():
        picture_name_tensor = tf.placeholder(tf.string)
        picture_contents = tf.read_file(picture_name_tensor)
        picture = tf.image.decode_jpeg(picture_contents, dct_method="INTEGER_ACCURATE")
        picture_as_float = tf.image.convert_image_dtype(picture, tf.float32)
        picture_4d = tf.expand_dims(picture_as_float, 0)
        resize_shape = tf.stack([height, width])
        resize_shape_as_int = tf.cast(resize_shape, dtype=tf.int32)
        final_tensor =  tf.squeeze(tf.image.resize_bilinear(picture_4d, resize_shape_as_int))
    return g, picture_name_tensor, final_tensor

def picture_decoder_numpy(picture_name, height, width):
    image = Image.open(picture_name)
    # PIL's resize expects the target size as (width, height)
    image = image.resize((width, height), Image.LANCZOS)
    return np.array(image, dtype=np.int32)


pic_name = "picture.jpg"
#Numpy method                                                                                            
#picture = picture_decoder_numpy(pic_name, 300, 300)                                                     

#Tensorflow method                                                                                       
graph, nameholder, picture_tensor = picture_decoder(300, 300)
with tf.Session(graph=graph) as sess:
    init = tf.group()
    sess.run(init)
    picture = sess.run(picture_tensor, feed_dict={nameholder: pic_name})

im = Image.fromarray(picture.astype('uint8'))
im.save("save.jpg")

1 Answer:

Answer 0 (score: 0)

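The TF implementation does not do what you think it does. The problem is that tf.image.convert_image_dtype(picture, tf.float32) rescales the image values to the range [0, 1], whereas in the numpy approach the values stay in the range [0, 255]. Casting the [0, 1] values to uint8 before saving truncates nearly all of them to 0, which is why the picture comes out black.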
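
To make the effect concrete, here is a minimal numpy-only sketch. It assumes a tiny made-up 2x2 array in place of a real decoded picture and mimics the [0, 1] scaling with a plain division, so it illustrates the arithmetic rather than the Tensorflow code path itself.

```python
import numpy as np

# Hypothetical stand-in for a decoded picture: uint8 values spanning 0..255.
pixels_uint8 = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Mimic the [0, 1] scaling applied when uint8 data is converted to float32.
pixels_float = pixels_uint8.astype(np.float32) / 255.0

# Casting back to uint8 without rescaling truncates every value below 1.0 to 0,
# so only the pure-white pixel survives, and only as the value 1.
black = pixels_float.astype(np.uint8)

print(pixels_float.min(), pixels_float.max())
print(black)
```

Printing the minimum and maximum of an array before saving or training on it is a cheap way to spot this kind of range mismatch.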

One fix is to multiply the final result by 255:

def picture_decoder(height, width):
    g = tf.Graph()
    with g.as_default():
        picture_name_tensor = tf.placeholder(tf.string)
        picture_contents = tf.read_file(picture_name_tensor)
        picture = tf.image.decode_jpeg(picture_contents, dct_method="INTEGER_ACCURATE")
        picture_as_float = tf.image.convert_image_dtype(picture, tf.float32)
        picture_4d = tf.expand_dims(picture_as_float, 0)
        resize_shape = tf.stack([height, width])
        resize_shape_as_int = tf.cast(resize_shape, dtype=tf.int32)
        final_tensor =  tf.squeeze(tf.image.resize_bilinear(picture_4d, resize_shape_as_int)) * 255  # FIX: rescale to typical 8-bit range
    return g, picture_name_tensor, final_tensor
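
Once the values are back in the 0..255 range, they are still floats, and a plain astype('uint8') truncates toward zero instead of rounding. The sketch below shows one possible way to convert before saving; to_uint8_image is a hypothetical helper name, not part of numpy, PIL, or Tensorflow, and the clipping is only a defensive assumption in case values drift slightly outside the range.

```python
import numpy as np

def to_uint8_image(picture):
    """Round and clip a float array in the 0..255 range to uint8 (hypothetical helper)."""
    picture = np.asarray(picture, dtype=np.float64)
    # Round to the nearest integer, then clamp to the valid 8-bit range.
    return np.clip(np.rint(picture), 0, 255).astype(np.uint8)

# Made-up values that are slightly off the integer grid and the valid range.
example = np.array([[-0.4, 127.6], [254.9, 255.3]])
converted = to_uint8_image(example)
print(converted)
```

With such a helper, the saving step could read Image.fromarray(to_uint8_image(picture)) instead of casting directly.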

Of course, since the two approaches use different interpolation methods, the two arrays will not match exactly either.

norm_dist = np.abs(np.sum(arr - arr2)) / (np.sum(arr) + np.sum(arr2)) / 2
np.isclose(norm_dist, 0, atol=1e-4)
True

(assuming arr holds the result of the numpy implementation and arr2 that of the Tensorflow one).
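
Note that a signed sum of differences lets positive and negative errors cancel each other out. As a complementary check, here is a minimal sketch of a per-pixel mean absolute difference; mean_abs_diff is a hypothetical helper and the two small arrays are made up for illustration, not taken from the pictures above.

```python
import numpy as np

def mean_abs_diff(a, b):
    """Average absolute per-element difference between two same-shaped arrays."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.mean(np.abs(a - b)))

# Two made-up arrays whose differences cancel in a signed sum.
first = np.array([0.0, 10.0])
second = np.array([10.0, 0.0])

signed_sum = float(np.sum(first - second))   # the errors cancel here
per_pixel = mean_abs_diff(first, second)     # the errors do not cancel here
print(signed_sum, per_pixel)
```

For images in the 0..255 range, a small per-pixel value would suggest that the two pipelines differ only by interpolation effects; what counts as small is a judgment call for the data at hand.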