How to code "a[i] = b[c[i]]" with ARM NEON SIMD intrinsics

Time: 2017-06-04 10:55:40

Tags: arm simd intrinsics neon

I am trying to translate this C/C++ code into SIMD intrinsics.

for(int i=0 ; i < length ; i++)
    A[i] = B[C[i]];

I can translate the code below (C/C++)

for(int i=0 ; i < length ; i++)
    A[i] = B[i];

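into SIMD code (using intrinsics):

// Copies 16 bytes per iteration; any remaining length % 16 bytes need a scalar loop.
for(int i=0 ; i + 16 <= length ; i+=16) {
    uint8x16_t v0 = vld1q_u8(B+i);   // load from the source array B
    vst1q_u8(A+i, v0);               // store to the destination array A
}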

I know that interleaving is the keyword for solving this problem, but I cannot find a solution.

Thanks.

Edit
For more information

unsigned char A [32] = {0,}; // Output Array
unsigned char B [20] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}; // An array with values to pass to A Array
unsigned int C [32] = {19,15,11,10,5,3,6,4,5,19,10,14,16,14,8,9,10,20,11,1, 0, 3, 5, 19, 20, 11, 13, 9, 30, 31, 7}; // An array with the index information of the B array.

Is there any intrinsic that would let me write code of the following form?

int length = 32;
for (int i = 0; i + 8 <= length; i += 8)
{
    uint8x8_t v_idx = vld1_u8 (C + i);
    uint8x8_t v = func (B, v_idx); // func (uint8_t, uint32x4_t)
    vst1_u8(A + i, v);
}

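This should output 20, 16, 12, 11, 6, 4, 7, 5, 6, 6, 20, 11, 15, 17, 15, 9, 10, 11, 21, 12, 2, 1, 4, 6, 20, 21, 12, 14, 10, 31, 32, 8

[Note]
A and B are of type uint8_t* because they are images with values from 0 to 255, while C is of type uint32_t* because it holds the indices into B.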

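1 answer:

Answer 0: (score: 0)

It is a bit hard to be sure since you have not provided much information, but vqtbl1_u8 may be what you are looking for. That one is AArch64-only, but ARMv7 has vtbl1_u8.

A simple example: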

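#include <arm_neon.h>
#include <stdint.h>
#include <stdio.h>

int main (void) {
  uint8_t bp[] = { 1,  1,  2,  3,  5,  8, 13, 21 };  /* table (B) */
  uint8_t cp[] = { 0,  2,  4,  6,  1,  3,  5,  7 };  /* indices (C) */

  uint8x8_t b = vld1_u8(bp);
  uint8x8_t c = vld1_u8(cp);

  /* a[i] = b[c[i]] for eight lanes; indices of 8 or more yield 0 */
  uint8x8_t a = vtbl1_u8(b, c);
  uint8_t ap[8];
  vst1_u8(ap, a);

  for (int x = 0 ; x < 8 ; x++)
    printf("%3u ", ap[x]);
  printf("\n");

  return 0;
}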

This prints 1 2 5 13 1 3 8 21
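
To get closer to the arrays in the question (a 20-byte B and 32-bit indices in C), the same idea can be stretched with vtbl3_u8, which looks up into a 24-byte table. The following is only a sketch under a few assumptions: the helper name gather_u8_small and the TABLE_BYTES constant are made up for illustration, the table is assumed to fit in 24 bytes, and any out-of-range index is assumed to map to 0. The 32-bit indices are narrowed with saturation so that large values stay out of range instead of wrapping around. A scalar loop handles the tail, and also the whole array when NEON is not available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

#define TABLE_BYTES 24  /* capacity of a vtbl3_u8 lookup table (3 x 8 bytes) */

/* Hypothetical helper: a[i] = b[c[i]] for a table b of at most 24 bytes.
   Any index >= b_len produces 0, in both the NEON and the scalar path. */
static void gather_u8_small(uint8_t *a, const uint8_t *b, size_t b_len,
                            const uint32_t *c, size_t n)
{
    size_t i = 0;
    assert(b_len <= TABLE_BYTES);

#if HAVE_NEON
    /* Zero-pad the table to 24 bytes so indices in [b_len, 24) read 0. */
    uint8_t padded[TABLE_BYTES] = {0};
    memcpy(padded, b, b_len);

    uint8x8x3_t table;
    table.val[0] = vld1_u8(padded);
    table.val[1] = vld1_u8(padded + 8);
    table.val[2] = vld1_u8(padded + 16);

    for (; i + 8 <= n; i += 8) {
        /* Narrow 8 x uint32 indices to 8 x uint8 with saturation,
           so an index such as 256 becomes 255 rather than 0. */
        uint16x4_t lo = vqmovn_u32(vld1q_u32(c + i));
        uint16x4_t hi = vqmovn_u32(vld1q_u32(c + i + 4));
        uint8x8_t idx = vqmovn_u16(vcombine_u16(lo, hi));

        /* Table lookup: lanes with idx >= 24 are set to 0. */
        vst1_u8(a + i, vtbl3_u8(table, idx));
    }
#endif

    /* Scalar tail (or the whole array when NEON is unavailable). */
    for (; i < n; i++)
        a[i] = (c[i] < b_len) ? b[c[i]] : 0;
}
```

With B holding 1 to 20 as in the question and the first 16 indices of C, calling gather_u8_small(A, B, 20, C, 16) should fill A with 20 16 12 11 6 4 7 5 6 20 11 15 17 15 9 10. On AArch64, vqtbl2q_u8 up to vqtbl4q_u8 cover tables of 32 to 64 bytes in the same way; as far as I know, for tables larger than that NEON has no general gather load, so the lookups end up as scalar loads.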