I am trying to write NEON SIMD code for the scalar code below.
Scalar code:
int *xt = new int[50];
float32_t input1[16] = {12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,};
float32_t input2[16] = {13.0f,12.0f,9.0f,12.0f,12.0f,12.0f,12.0f,12.0f,13.0f,12.0f,9.0f,12.0f,12.0f,12.0f,12.0f,12.0f};
float32_t threshq = 13.0f;
uint32_t corners_count = 0;
for (uint32_t x = 0; x < 16; x++)
{
if ( (input1[x] == input2[x]) && (input2[x] > threshq) )
{
xt[corners_count] = x;
corners_count++;
}
}
NEON:
float32x4_t t1,t2,t3;
uint32x4_t rq1,rq2,rq3;
t1 = vld1q_f32(input1); // 12 12 12 12
t2 = vld1q_f32(input2); // 13 12 09 12
t3 = vdupq_n_f32(threshq); // 13 13 13 13
rq1 = vceqq_f32(t1,t2); // lane mask: input1 equal to input2
rq2 = vcgtq_f32(t1,t3); // lane mask: input1 greater than the threshold
rq3 = vandq_u32(rq1,rq2); // AND of the two condition masks
for( int i = 0;i < 4; i++){
corners_count = corners_count + rq3[i];
// ...not able to write the logic in NEON for this part
}
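I cannot work out how to write this logic in NEON. Can anyone guide me? I am worn out from thinking about it.
Answer 0 (score: 1)
Because of the dependency inside the loop (each store uses the running corners_count), I think you need to refactor the code into a SIMD loop followed by a scalar loop. Pseudocode: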
// SIMD loop
for each set of 4 float elements
    apply SIMD threshold test
    store 4 x bool results in temp[]
// scalar loop
for each bool element in temp[]
    if temp[x]
        xt[corners_count] = x
        corners_count++
This way you get the benefit of SIMD for most of the operations, and you only need scalar code for the final part.
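The pseudocode above could be fleshed out roughly as follows. This is only a sketch under some assumptions: the function name find_corners and its parameters are hypothetical, the element count is assumed to be a multiple of 4, and the non-NEON branch is just a portable stand-in so the sketch also builds on other targets. The comparison uses input2 against the threshold, as in the scalar code.

```cpp
#include <cstdint>
#include <vector>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper (not from the original post): writes every index x with
// a[x] == b[x] && b[x] > thresh into out[] and returns how many were written.
// Assumption: n is a multiple of 4 and out[] has room for n entries.
uint32_t find_corners(const float* a, const float* b, float thresh,
                      uint32_t n, int* out)
{
    // One mask word per element: all ones if the test passed, zero otherwise.
    std::vector<uint32_t> temp(n);

    // SIMD loop: evaluate both conditions for 4 floats at a time.
#if defined(__ARM_NEON)
    const float32x4_t vthresh = vdupq_n_f32(thresh);
    for (uint32_t base = 0; base < n; base += 4) {
        float32x4_t va = vld1q_f32(a + base);
        float32x4_t vb = vld1q_f32(b + base);
        uint32x4_t eq = vceqq_f32(va, vb);       // a == b
        uint32x4_t gt = vcgtq_f32(vb, vthresh);  // b > thresh
        vst1q_u32(&temp[base], vandq_u32(eq, gt));
    }
#else
    // Portable stand-in producing the same masks on non-NEON targets.
    for (uint32_t x = 0; x < n; ++x)
        temp[x] = (a[x] == b[x] && b[x] > thresh) ? 0xFFFFFFFFu : 0u;
#endif

    // Scalar loop: the running count is a serial dependency, so it stays scalar.
    uint32_t corners_count = 0;
    for (uint32_t x = 0; x < n; ++x) {
        if (temp[x]) {
            out[corners_count] = static_cast<int>(x);
            corners_count++;
        }
    }
    return corners_count;
}
```

Note that with the arrays from the question and threshq = 13.0f nothing passes, because no element of input2 is strictly greater than 13. Calling find_corners(input1, input2, 11.0f, 16, xt) instead should return 12, with the stored indices starting 1, 3, 4.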