I previously asked a question about the vclt_s8 comparison: "Does anybody know how to use Neon intrinsics uint8x8_t vclt_s8 (int8x8_t, int8x8_t)".
But what if we have code like this:
if(a > b + c) {
a = b + c;
} else if(a < b - c) {
a = b - c;
}
How can this be converted to Neon intrinsics? In this case it seems that we can't process 8 elements in parallel. Is that right?
Answer 0 (score: 5)
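Obviously you can't branch per element with SIMD, so you have to look at how to implement this logic in a branchless way using masks. I'll just give pseudocode so you get the general idea; coding it up should be fairly straightforward: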
bc = b + c ; get `(b + c)` in a vector register
mask = a > bc ; use compare instruction to generate mask (-1 = true, 0 = false)
bc = bc & mask ; use bitwise AND to zero out elements of `(b + c)` which we do not want
a = a & ~mask ; use bitwise ANDC to zero out elements of `a` which we do not want
a = a | bc ; combine required elements into `a` using bitwise OR
bc = b - c ; get `(b - c)` in a vector register
mask = a < bc ; use compare instruction to generate mask (-1 = true, 0 = false)
bc = bc & mask ; use bitwise AND to zero out elements of `(b - c)` which we do not want
a = a & ~mask ; use bitwise ANDC to zero out elements of `a` which we do not want
a = a | bc ; combine required elements into `a` using bitwise OR
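For what it's worth, here is a rough sketch of how the pseudocode above might look with actual intrinsics. The function name `clamp_two_ifs_s8` and the array-based interface are made up for illustration. It uses `vbsl_s8` (bitwise select) in place of the separate AND / ANDC / OR steps, which gives the same result in one instruction. It assumes `n` is a multiple of 8 and that `b + c` and `b - c` fit in `int8_t`, since `vadd_s8` and `vsub_s8` wrap on overflow. The `#else` branch is only a per-lane stand-in so the sketch also compiles on non-ARM machines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#else
/* Per-lane stand-in for vbsl_s8: x where mask is 0xFF, y where it is 0x00. */
static int8_t select_s8(uint8_t mask, int8_t x, int8_t y)
{
    return (int8_t)(((uint8_t)x & mask) | ((uint8_t)y & (uint8_t)~mask));
}
#endif

/* Hypothetical helper (not from the original answer).
 * Assumes n % 8 == 0 and that b[i] + c[i], b[i] - c[i] fit in int8_t. */
void clamp_two_ifs_s8(int8_t *a, const int8_t *b, const int8_t *c, size_t n)
{
    assert(n % 8 == 0);
#if defined(__ARM_NEON)
    for (size_t i = 0; i < n; i += 8) {
        int8x8_t va = vld1_s8(a + i);
        int8x8_t vb = vld1_s8(b + i);
        int8x8_t vc = vld1_s8(c + i);

        int8x8_t hi = vadd_s8(vb, vc);      /* b + c */
        uint8x8_t mask = vcgt_s8(va, hi);   /* 0xFF in lanes where a > b + c */
        va = vbsl_s8(mask, hi, va);         /* take hi where mask set, else keep a */

        int8x8_t lo = vsub_s8(vb, vc);      /* b - c */
        mask = vclt_s8(va, lo);             /* 0xFF in lanes where (updated) a < b - c */
        va = vbsl_s8(mask, lo, va);         /* take lo where mask set, else keep a */

        vst1_s8(a + i, va);
    }
#else
    for (size_t i = 0; i < n; ++i) {
        int8_t hi = (int8_t)(b[i] + c[i]);
        uint8_t mask = (a[i] > hi) ? 0xFF : 0x00;
        a[i] = select_s8(mask, hi, a[i]);

        int8_t lo = (int8_t)(b[i] - c[i]);
        mask = (a[i] < lo) ? 0xFF : 0x00;
        a[i] = select_s8(mask, lo, a[i]);
    }
#endif
}
```

Like the pseudocode, the second comparison runs on the already updated `a`, so this behaves as two independent `if` statements rather than an `if` / `else if` pair.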
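Note that I've cheated a little here and omitted the else from the scalar code
(assuming that the two branches are mutually exclusive), so what I have implemented is actually equivalent to: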
if (a > b + c) {
a = b + c;
}
if (a < b - c) {
a = b - c;
}
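If this is a bad assumption then you'll need to do some additional bitwise operations to implement the else logic. (The assumption is safe whenever c >= 0 and nothing overflows: after a is set to b + c it cannot be less than b - c.)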
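In case the else does matter (for example if c can be negative), one possible way to do the extra bitwise work is sketched below. This is only one option, not necessarily what the answer had in mind: both masks are computed from the original `a`, and the "less than" mask is cleared with `vbic_u8` in lanes where the first condition already fired. The function name `clamp_if_else_s8` is hypothetical, and the same assumptions apply as before (`n` a multiple of 8, no overflow in `b + c` or `b - c`). The `#else` branch is again just a per-lane stand-in for non-ARM builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#else
/* Per-lane stand-in for vbsl_s8: x where mask is 0xFF, y where it is 0x00. */
static int8_t select_s8(uint8_t mask, int8_t x, int8_t y)
{
    return (int8_t)(((uint8_t)x & mask) | ((uint8_t)y & (uint8_t)~mask));
}
#endif

/* Hypothetical helper (not from the original answer).
 * Assumes n % 8 == 0 and that b[i] + c[i], b[i] - c[i] fit in int8_t. */
void clamp_if_else_s8(int8_t *a, const int8_t *b, const int8_t *c, size_t n)
{
    assert(n % 8 == 0);
#if defined(__ARM_NEON)
    for (size_t i = 0; i < n; i += 8) {
        int8x8_t va = vld1_s8(a + i);
        int8x8_t vb = vld1_s8(b + i);
        int8x8_t vc = vld1_s8(c + i);
        int8x8_t hi = vadd_s8(vb, vc);              /* b + c */
        int8x8_t lo = vsub_s8(vb, vc);              /* b - c */

        uint8x8_t m_hi = vcgt_s8(va, hi);           /* original a > b + c */
        uint8x8_t m_lo = vclt_s8(va, lo);           /* original a < b - c */
        m_lo = vbic_u8(m_lo, m_hi);                 /* m_lo & ~m_hi: the "else" */

        va = vbsl_s8(m_lo, lo, va);                 /* else-if branch */
        va = vbsl_s8(m_hi, hi, va);                 /* if branch */
        vst1_s8(a + i, va);
    }
#else
    for (size_t i = 0; i < n; ++i) {
        int8_t hi = (int8_t)(b[i] + c[i]);
        int8_t lo = (int8_t)(b[i] - c[i]);
        uint8_t m_hi = (a[i] > hi) ? 0xFF : 0x00;
        uint8_t m_lo = (a[i] < lo) ? 0xFF : 0x00;
        m_lo = (uint8_t)(m_lo & (uint8_t)~m_hi);    /* the "else" */
        a[i] = select_s8(m_hi, hi, select_s8(m_lo, lo, a[i]));
    }
#endif
}
```

With b = 0 and c = -5, an input of a = 0 comes out as -5 here, whereas the two-independent-ifs version would go on to apply the second test and produce 5.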