Question

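What is the most efficient way to convert an array of unsigned short values (16 bits each) into an array of unsigned int values (32 bits each)?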

Answer 1

Copy it.

unsigned short source[]; // …
unsigned int target[]; // …
unsigned short* const end = source + sizeof source / sizeof source[0];
std::copy(source, end, target);

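std::copy internally picks the best copy mechanism for the given input types. In this case, however, there is probably no better way than copying the elements individually in a loop.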
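
As a minimal sketch of this approach (the helper name widen_copy and the use of std::vector are assumptions made for illustration, not part of the original answer), std::copy assigns element by element, so each unsigned short is implicitly converted to unsigned int:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: copies every 16-bit value into a 32-bit slot.
// std::copy assigns element by element, so each unsigned short is
// implicitly converted to unsigned int with its value preserved.
std::vector<unsigned int> widen_copy(const std::vector<unsigned short>& source)
{
    std::vector<unsigned int> target(source.size());
    std::copy(source.begin(), source.end(), target.begin());
    return target;
}
```

Calling widen_copy on a vector holding 0, 1 and 65535 should give a vector of unsigned int with the same three values.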

Answer 2

In C++, use std::copy:

#include<algorithm> //must include

unsigned short ushorts[M]; //where M is some const +ve integer
unsigned int   uints[N]; //where N >= M
//...fill ushorts
std::copy(ushorts, ushorts+M, uints);

In C, use a manual loop (in fact, a manual loop works in both C and C++):

int i = 0;
while( i < M ) { uints[i] = ushorts[i]; ++i; }

Answer 3

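Here is an unrolled loop that accesses memory in 64-bit chunks. It may be slightly faster than the simple loop, but testing is the only way to know.

This assumes that N is a multiple of 4, that a short is 16 bits wide, and that 64-bit registers are available.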

 typedef union u {
     uint16_t    us[4];
     uint32_t    ui[2];
     uint64_t    ull;
 } u_t;
 ushort_t src[N] = ...;
 uint_t dst[N];

 u_t *p_src = (u_t *) src;
 u_t *p_dst = (u_t *) dst;
 uint_t i;
 u_t tmp, tmp2;
 for(i=0; i<N/4; i++) {
     tmp = p_src[i];    /* Read four shorts in one read access */
     tmp2.ui[0] = tmp.us[0];   /* The union trick avoids complicated shifts that are furthermore dependent on endianness. */
     tmp2.ui[1] = tmp.us[1];   /* The compiler should take care of optimal assembly decomposition. */ 
     p_dst[2*i] = tmp2;  /* Write the two first ints in one write access. */
     tmp2.ui[0] = tmp.us[2];
     tmp2.ui[1] = tmp.us[3];
     p_dst[2*i+1] = tmp2; /* Write the 2 next ints in 1 write access. */
 }

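Edit

So I just tested this on a SUN M5000 (SPARC64 VII 2.5 GHz) with GCC 3.4.1 in 64-bit mode on a 4,000,000-element array. The naive implementation is slightly faster. I also tried SUNStudio 12 and GCC 4.3, but I could not even compile the program because of the array size.

EDIT2

I managed to compile it with GCC 4.3. The optimized version is slightly faster than the naive one.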

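            GCC 3.4    GCC 4.3
naive       11.1 ms    11.8 ms
optimized   12.4 ms    10.0 ms

EDIT3

We can conclude that, as far as C is concerned, the optimized version of the copy loop is not worth the trouble: the gain is so small that the risk of introducing a bug outweighs the benefit.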
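
For anyone who wants to repeat this kind of measurement, here is a rough harness sketch. The names naive_widen and time_widen_ms are hypothetical, std::chrono::steady_clock is only a stand-in for whatever timer was used in the tests above, and the numbers will differ by compiler and machine.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Naive element-by-element widening loop.
void naive_widen(const std::uint16_t* src, std::uint32_t* dst, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}

// Hypothetical timing helper: resizes dst to match src, then returns the
// elapsed time in milliseconds for one naive_widen pass over all of src.
double time_widen_ms(const std::vector<std::uint16_t>& src,
                     std::vector<std::uint32_t>& dst)
{
    dst.resize(src.size());
    const auto start = std::chrono::steady_clock::now();
    naive_widen(src.data(), dst.data(), src.size());
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A single run is noisy, so it is probably wise to time several passes of each variant and compare the best or the median.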

Answer 4

How about this?

unsigned short src[N] = ...;
unsigned int dst[N];

for (unsigned int i = 0; i < N; ++i)
    dst[i] = src[i];

For a C++ version, Konrad's or Nawaz's answer is certainly a better fit.

Answer 5

Initialize an int[] with the same length as the short[].
Iterate over the short[], assigning the i-th element of the short[] to the i-th element of the int[].
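
The two steps above can be sketched as follows. The function name to_uints and the choice of std::vector instead of raw arrays are assumptions made for this illustration only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper illustrating the two steps described above.
std::vector<unsigned int> to_uints(const std::vector<unsigned short>& shorts)
{
    // Step 1: create the int array with the same length as the short array.
    std::vector<unsigned int> ints(shorts.size());

    // Step 2: assign the i-th short to the i-th int.
    for (std::size_t i = 0; i < shorts.size(); ++i)
        ints[i] = shorts[i];

    return ints;
}
```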

Answer 6

On many architectures, a decrementing do-while may be faster than the for and while loops proposed here. Something like:

unsigned short ushorts[M];
unsigned int uints[N];

int i = M-1;
do{
    uints[i] = ushorts[i];
    i--;
} while(i >= 0);

The compiler can handle most optimizations, such as loop unrolling, but the above is generally faster (on many architectures) because:

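A do-while skips the initial condition check that a while or for loop performs before its first iteration.
The loop ends when i = 0. Checking against 0 can save an instruction because the zero flag is set automatically. If the loop counts up and ends when i = M, an extra compare instruction may be needed to test i < M.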

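There may also be faster ways, such as using pointer arithmetic throughout. This could turn into an interesting exercise: disassemble the code and profile it to see which version is faster. It all depends on the architecture. Fortunately, others have already done this work for us with std::copy.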
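
As an illustration of the pointer-arithmetic idea, here is a minimal sketch. The function name widen_with_pointers is hypothetical, and the sketch makes no claim about speed: a compiler may well generate the same code for this and for the indexed loops.

```cpp
#include <cstddef>

// Hypothetical pointer-based variant: walks both arrays with pointers
// instead of an index. Unlike the do-while above, it does nothing when
// count is 0.
void widen_with_pointers(const unsigned short* src, unsigned int* dst,
                         std::size_t count)
{
    const unsigned short* const end = src + count;
    while (src != end)
        *dst++ = *src++;  // implicit unsigned short -> unsigned int conversion
}
```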

Answer 7

Simply copy the address of the short array to access each of its elements, e.g. pTp32[0...LEN-1].arr[0..1]:

unsigned short shrtArray[LEN]; //..
union type32
{
    short arr[2];
    int value;
};
type32 * pTp32 = (type32*)shrtArray;

Efficient way to convert a 16-bit short array to a 32-bit int array?
