Question

I want to convert an array of unsigned shorts to float using SSE. Let's say

__m128i xVal;     // Has 8 16-bit unsigned integers
__m128 y1, y2;    // 2 xmm registers for 8 float values

I want the first 4 uint16 values in y1 and the next 4 uint16 values in y2. I need to know which intrinsics to use.

Answer 1

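First you need to unpack the vector of 8 x 16-bit unsigned shorts into two vectors of 32-bit unsigned ints (interleaving with zero performs the zero extension), then convert each of those vectors to float: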

__m128i xlo = _mm_unpacklo_epi16(x, _mm_set1_epi16(0));
__m128i xhi = _mm_unpackhi_epi16(x, _mm_set1_epi16(0));
__m128 ylo = _mm_cvtepi32_ps(xlo);
__m128 yhi = _mm_cvtepi32_ps(xhi);
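
For readers who want to try this end to end, here is a minimal self-contained sketch of the same unpack-then-convert approach. The helper name u16x8_to_float and the use of unaligned loads and stores are assumptions made for illustration; they are not part of the original answer.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: converts 8 uint16_t values at src to 8 floats at dst. */
static void u16x8_to_float(const uint16_t *src, float *dst)
{
    const __m128i zero = _mm_setzero_si128();

    /* Load 8 x 16-bit unsigned values (no alignment requirement). */
    __m128i x = _mm_loadu_si128((const __m128i *)src);

    /* Interleave with zero: each 16-bit lane becomes a 32-bit lane whose
       upper half is 0, i.e. a zero extension. */
    __m128i xlo = _mm_unpacklo_epi16(x, zero);  /* elements 0..3 */
    __m128i xhi = _mm_unpackhi_epi16(x, zero);  /* elements 4..7 */

    /* The values are at most 65535, so the signed 32-bit to float
       conversion is exact. */
    _mm_storeu_ps(dst,     _mm_cvtepi32_ps(xlo));
    _mm_storeu_ps(dst + 4, _mm_cvtepi32_ps(xhi));
}
```

Because every input fits in 16 bits, the signedness of _mm_cvtepi32_ps does not matter here: the zero-extended values are always non-negative.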

Answer 2

I would suggest a slightly different version:

static const __m128i magicInt = _mm_set1_epi16(0x4B00);
static const __m128 magicFloat = _mm_set1_ps(8388608.0f);

__m128i xlo = _mm_unpacklo_epi16(x, magicInt);
__m128i xhi = _mm_unpackhi_epi16(x, magicInt);
__m128 ylo = _mm_sub_ps(_mm_castsi128_ps(xlo), magicFloat);
__m128 yhi = _mm_sub_ps(_mm_castsi128_ps(xhi), magicFloat);

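At the assembly level, the only difference from Paul R's version is the use of _mm_sub_ps (the SUBPS instruction) instead of _mm_cvtepi32_ps (the CVTDQ2PS instruction). _mm_sub_ps is never slower than _mm_cvtepi32_ps, and is actually faster on old CPUs and on low-power CPUs (read: Intel Atom and AMD Bobcat).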
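
The answer does not spell out why the magic constants work, so here is a brief explanation followed by a sketch. Interleaving each 16-bit value x with 0x4B00 produces the 32-bit pattern 0x4B00xxxx. Read as an IEEE 754 single, 0x4B000000 is exactly 2^23 = 8388608.0f, and at that magnitude one unit in the last place equals 1.0, so the low 16 mantissa bits add x directly: the pattern represents 2^23 + x. Subtracting 2^23 then yields x exactly. The helper name u16x8_to_float_magic and the unaligned loads and stores below are assumptions made for illustration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: same result as a zero-extend plus CVTDQ2PS,
   but using the 2^23 bias trick and SUBPS. */
static void u16x8_to_float_magic(const uint16_t *src, float *dst)
{
    /* 0x4B00 becomes the upper half of each 32-bit lane:
       sign 0, biased exponent 150, i.e. the float 2^23. */
    const __m128i magic_int   = _mm_set1_epi16(0x4B00);
    const __m128  magic_float = _mm_set1_ps(8388608.0f);  /* 2^23 */

    __m128i x = _mm_loadu_si128((const __m128i *)src);

    /* Each lane now holds the bit pattern 0x4B00xxxx = 2^23 + x. */
    __m128i xlo = _mm_unpacklo_epi16(x, magic_int);  /* elements 0..3 */
    __m128i xhi = _mm_unpackhi_epi16(x, magic_int);  /* elements 4..7 */

    /* Reinterpret the bits as floats (no instruction emitted for the
       cast) and remove the 2^23 bias; the subtraction is exact. */
    _mm_storeu_ps(dst,     _mm_sub_ps(_mm_castsi128_ps(xlo), magic_float));
    _mm_storeu_ps(dst + 4, _mm_sub_ps(_mm_castsi128_ps(xhi), magic_float));
}
```

The trick relies on the input fitting within the 23-bit mantissa, which any 16-bit value does; it would not carry over unchanged to full 32-bit inputs.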

SSE: convert short integer to float