Question

How can I decode a PNG image whose ARGB pixel channels encode float32 values into a float32 tensor? An example is the depth images provided by the KITTI dataset.

The function

tf.image.decode_png()

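can only give me uint8 or uint16 values, not the correct float32 values.

Is there any workaround or solution for obtaining such a float32 TensorFlow tensor?

Edit: So, in the PNG each channel stores a uint8 value, and all 4 channels (ARGB) together make up one float32 value. In practice it can be read easily with PIL and NumPy (this code is provided by the KITTI dataset):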

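from PIL import Image
import numpy as np

# Read the raw PNG samples as integers.
depth_png = np.array(Image.open(filename), dtype=int)
# np.float was removed in NumPy 1.24, so an explicit float type is used here.
depth = depth_png.astype(np.float32) / 256.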

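Here I assume int is at least int32, so that no information is lost.

However, I am looking for a way to get this into a TensorFlow tensor that can be loaded on the fly in a dataset.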

Answer 1

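PNG images typically store channel values as uint8 (the format also allows 16 bits per channel).

To convert to float32 between 0 and 1, we can cast and then divide by 255 (the maximum value of uint8).

Something like this: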

img_bytes = tf.io.read_file('path/img.png')
img_tensor_uint8 = tf.image.decode_png(img_bytes)
img_tensor_float32 = tf.cast(img_tensor_uint8, tf.float32) / 255
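
As a possible alternative to the manual cast and division above, `tf.image.convert_image_dtype` does the cast and the rescaling to [0, 1] in one call. The sketch below assumes TensorFlow 2.x; the function name `decode_png_as_unit_float` is made up for illustration. Note that this rescaling suits ordinary 8-bit images, not depth maps with their own scale factor.

```python
import tensorflow as tf

def decode_png_as_unit_float(png_bytes):
    """Decode PNG bytes and rescale the uint8 samples to float32 in [0, 1]."""
    img_uint8 = tf.image.decode_png(png_bytes)  # dtype defaults to tf.uint8
    # For integer inputs, convert_image_dtype casts and scales by 1 / dtype.max.
    return tf.image.convert_image_dtype(img_uint8, tf.float32)
```

It could be called as `decode_png_as_unit_float(tf.io.read_file('path/img.png'))`.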

Answer 2

So the problem here is that the KITTI PNG images hold grayscale uint16 values. The correct decoding is therefore:

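image = tf.io.read_file(path)  # tf.read_file is the older TF 1.x name
# channels=0 keeps the channel count stored in the file; dtype=tf.uint16 keeps all 16 bits
image = tf.image.decode_png(image, channels=0, dtype=tf.uint16)
image = tf.cast(image, tf.float32)
image = image / 256.0  # same scale factor as in the KITTI NumPy snippet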

For the general use case, Stewart_R's answer is of course also correct.
