Question

There seems to be no documentation on these two functions.

What is the difference between __float2half and __float2half_rn?

Answer 1

The CUDA documentation does appear to be a bit lacking here.

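The function unsigned short __float2half_rn(float), together with float __half2float(unsigned short x), already existed in CUDA before the new half data type was introduced in CUDA 7.5. It is defined in device_functions.h. The comment there reads:

Convert the single-precision float value x to a half-precision floating point value represented in unsigned short format, in round-to-nearest-even mode.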
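To make "round-to-nearest-even" concrete, here is a minimal sketch that feeds __float2half_rn two inputs lying exactly halfway between adjacent half values. It assumes a newer toolkit (CUDA 9 or later), where __float2half_rn returns __half and __half_as_ushort is available from cuda_fp16.h; that differs from the unsigned short version described above. The names to_half_bits and half_bits_rn are hypothetical helpers introduced only for this illustration.

```cuda
#include <cstdio>
#include <cuda_fp16.h>

// Kernel: convert one float to half and store its raw 16-bit pattern.
// Assumes CUDA 9+, where __float2half_rn returns __half.
__global__ void to_half_bits(float x, unsigned short *out)
{
    *out = __half_as_ushort(__float2half_rn(x));
}

// Hypothetical host helper: returns the half bit pattern produced for x.
unsigned short half_bits_rn(float x)
{
    unsigned short *d_out = nullptr;
    unsigned short h_out = 0;
    cudaMalloc(&d_out, sizeof(unsigned short));
    to_half_bits<<<1, 1>>>(x, d_out);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_out, d_out, sizeof(unsigned short), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}

int main()
{
    // half has 10 fraction bits, so neighbouring values near 1.0 are 2^-10 apart.
    const float ulp = 0.0009765625f;            // 2^-10
    const float tie_low  = 1.0f + 0.5f * ulp;   // halfway between 0x3C00 and 0x3C01
    const float tie_high = 1.0f + 1.5f * ulp;   // halfway between 0x3C01 and 0x3C02

    printf("0x%04X\n", half_bits_rn(tie_low));
    printf("0x%04X\n", half_bits_rn(tie_high));
    return 0;
}
```

If ties go to the even mantissa, as the header comment states, the two lines should read 0x3C00 and 0x3C02: the first tie rounds down and the second rounds up.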

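The function half __float2half(float) is defined in cuda_fp16.h and is apparently the same, except that it returns a half:

Converts float number a to half precision in round-to-nearest mode.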

However, since half is a typedef of unsigned short, I used the following code to check whether they do the same thing:

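#include <stdio.h>
#include "cuda_fp16.h"
#include "device_functions.h"

// Swap the commented line to inspect the SASS generated for each variant.
__global__ void test()
{
//  auto test = __float2half( 1.4232f );
    auto test = __float2half_rn( 1.4232f );
    // Print the raw 16-bit pattern of the converted value.
    printf( "%hu\n", test );
}

int main()
{
    test<<<1,1>>>();
    cudaDeviceSynchronize();  // wait for the kernel so its printf output is flushed
    return 0;
}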

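I found that (for sm_20) the old __float2half_rn() has an additional int16-to-int32 conversion and performs a 32-bit store. __float2half(), on the other hand, has no such conversion and performs a 16-bit store.

Relevant SASS code for __float2half_rn():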

/*0040*/         I2I.U32.U16 R0, R0;
/*0050*/         STL [R2], R0;

__float2half():

/*0048*/         STL.U16 [R2], R0;
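The SASS above shows a difference in how the result is stored, not in the converted value. As a rough way to check the values themselves, the sketch below compares the bit patterns returned by the two functions for a handful of inputs. It assumes CUDA 9 or later, where both functions return __half and __half_as_ushort is available; compare_conversions and count_mismatches are hypothetical names used only here, and the sample inputs are arbitrary.

```cuda
#include <cstdio>
#include <cuda_fp16.h>

// Kernel: count the inputs for which the two conversions give different bits.
__global__ void compare_conversions(const float *in, int n, int *mismatches)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    unsigned short a = __half_as_ushort(__float2half(in[i]));
    unsigned short b = __half_as_ushort(__float2half_rn(in[i]));
    if (a != b) atomicAdd(mismatches, 1);
}

// Hypothetical host helper: runs the comparison and returns the mismatch count.
int count_mismatches(const float *h_in, int n)
{
    float *d_in = nullptr;
    int *d_count = nullptr;
    int h_count = 0;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_count, sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_count, 0, sizeof(int));
    compare_conversions<<<(n + 127) / 128, 128>>>(d_in, n, d_count);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_count, d_count, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_count);
    return h_count;
}

int main()
{
    // Arbitrary samples: the value from the test above, a rounding tie,
    // the largest finite half, a value below the half range, -0, an overflow.
    const float samples[] = { 1.4232f, 1.00048828125f, 65504.0f,
                              1.0e-8f, -0.0f, 70000.0f };
    const int n = sizeof(samples) / sizeof(samples[0]);
    printf("mismatches: %d\n", count_mismatches(samples, n));
    return 0;
}
```

A count of zero on these samples would be consistent with the two header comments describing the same rounding mode. It is a spot check, not a proof over all inputs.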
