Question

https://www.tensorflow.org/versions/r0.12/api_docs/python/framework/tensor_types lists both tf.float16 and tf.bfloat16. What is the difference between them?

Also, what do they mean by "quantized integer"?

Answer 1

bfloat16 is a TensorFlow-specific format that differs from IEEE's own float16, hence the new name.

Basically, bfloat16 is a float32 truncated to its first 16 bits. So it has the same 8 bits for the exponent, and only 7 bits for the mantissa. It is therefore easy to convert to and from float32, and because it has basically the same range as float32, it minimizes the risk of getting NaNs or exploding/vanishing gradients when switching from float32.

From the sources:

// Compact 16-bit encoding of floating point numbers. This representation uses
// 1 bit for the sign, 8 bits for the exponent and 7 bits for the mantissa. It
// is assumed that floats are in IEEE 754 format so the representation is just
// bits 16-31 of a single precision float.
//
// NOTE: The IEEE floating point standard defines a float16 format that
// is different than this format (it has fewer bits of exponent and more
// bits of mantissa). We don't use that format here because conversion
// to/from 32-bit floats is more complex for that format, and the
// conversion for this format is very simple.
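
To make the "bits 16-31 of a single precision float" description concrete, here is a minimal sketch in plain Python. The function names are hypothetical and this is not TensorFlow's implementation; it only assumes the plain truncation described above (no rounding), using the standard struct module to reach the float32 bit pattern.

```python
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Keep the top 16 bits of x's IEEE 754 float32 encoding (hypothetical helper)."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    # Dropping the low 16 bits discards mantissa bits only; sign and exponent survive.
    return bits32 >> 16


def bfloat16_bits_to_float32(bits16: int) -> float:
    """Widen a 16-bit pattern back to float32 by padding with 16 zero bits."""
    (value,) = struct.unpack(">f", struct.pack(">I", (bits16 & 0xFFFF) << 16))
    return value


def split_fields(bits16: int):
    """Return (sign, exponent, mantissa) using the 1/8/7 bit layout."""
    sign = bits16 >> 15
    exponent = (bits16 >> 7) & 0xFF
    mantissa = bits16 & 0x7F
    return sign, exponent, mantissa


if __name__ == "__main__":
    for x in (1.0, 3.14159, 1e38):
        bits = float32_to_bfloat16_bits(x)
        print(x, hex(bits), split_fields(bits), bfloat16_bits_to_float32(bits))
```

A value such as 1e38 survives the round trip with only a loss of precision, whereas IEEE float16 cannot represent it at all, since its largest finite value is 65504.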

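As for quantized integers, they are designed to replace floating-point values in trained networks to speed up processing. Basically, they are a kind of fixed-point encoding of real numbers, but with an operating range chosen to represent the distribution observed at any given point of the network.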
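
As an illustration of that idea, the following is a minimal sketch of one common scheme: a linear mapping of an observed range [lo, hi] onto 8-bit codes. The function names, the 8-bit default, and the clamping behaviour are assumptions made for this example, not a description of TensorFlow's actual quantized types or kernels.

```python
def quantize(values, lo, hi, num_bits=8):
    """Map floats in [lo, hi] to integer codes 0..2**num_bits - 1 (illustrative only)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    levels = (1 << num_bits) - 1
    scale = (hi - lo) / levels
    codes = []
    for v in values:
        clamped = min(max(v, lo), hi)  # values outside the observed range saturate
        codes.append(round((clamped - lo) / scale))
    return codes


def dequantize(codes, lo, hi, num_bits=8):
    """Recover approximate floats from the integer codes."""
    levels = (1 << num_bits) - 1
    scale = (hi - lo) / levels
    return [lo + c * scale for c in codes]


if __name__ == "__main__":
    # Suppose activations at some layer were observed to lie in [-1.0, 1.0].
    activations = [-1.0, -0.25, 0.0, 0.3, 1.0, 5.0]
    codes = quantize(activations, -1.0, 1.0)
    print(codes)
    print(dequantize(codes, -1.0, 1.0))
```

The range is chosen per tensor from observed values, which is what distinguishes this from a single global fixed-point format: a narrow range gives finer steps, at the cost of saturating anything outside it.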

More on quantization here.
