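I am experimenting with the code from https://github.com/torvalds/linux/blob/master/lib/math/div64.c on https://godbolt.org/, targeting an ARM Cortex-M7 microcontroller.
clang generates far more code (208 lines of assembly) than gcc (69 lines).
I am using -O3, which should unroll loops and so on, but in this case I am not sure the result is any good.
I don't know whether this helps, but:
1) removing either the first or the second loop makes the result similar to what gcc produces;
2) clang 5.0.0 produces much smaller code.
Here is the code (a link to the configured example is further down):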
unsigned int div64_32(unsigned long long *n, unsigned int base)
{
    unsigned long long rem = *n;
    unsigned long long b = base;
    unsigned long long res, d = 1;
    unsigned int high = rem >> 32;

    /* Reduce the thing a bit first */
    res = 0;
    if (high >= base) {
        high /= base;
        res = (unsigned long long) high << 32;
        rem -= (unsigned long long) (high*base) << 32;
    }

    while ((long long)b > 0 && b < rem) {
        b = b+b;
        d = d+d;
    }

    do {
        if (rem >= b) {
            rem -= b;
            res += d;
        }
        b >>= 1;
        d >>= 1;
    } while (d);

    *n = res;
    return rem;
}
Options for both compilers:
ARM GCC 7.2.1 (none): -O3 -mcpu=cortex-m7 -mthumb -mfloat-abi=hard -mfpu=vfpv4
x86-64 clang (trunk): -O3 -mcpu=cortex-m7 -mthumb -mfloat-abi=hard -mfpu=vfpv4 -target armv7m-none-eabi
And finally, the link to the example: link
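Update
I tested this on an NXP Kinetis KV58 running at 240 MHz, with all code moved to I-TCM.
Results measured with an oscilloscope:
        run 1     run 2
iar:    27.2 ms   17.6 ms
clang:  29.6 ms   19.2 ms
gcc:    22.1 ms   14.0 ms
Run 1 used n *= 123 and run 2 used n *= 127 (see the code below).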
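I did not compile the clang and gcc code myself; I simply copied and pasted it from the website. I also had to change the GCC code, because the compiler did not recognize two instructions (see also the comments on the question):
ldrd r0, [r0] (line 9) => ldrd r0, r1, [r0]
strd r6, [ip] (line 55) => strd r6, r7, [r12]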
void test()
{
    debug_pin(0, 0); // test pin 0: set LOW
    for (int i = 0; i < 1000; i++);
    debug_pin(0, 1); // test pin 0: set HIGH

    unsigned long long n = 54321, base = 123;
    for (int i = 0; i < 10000; i++)
    {
        base = div64_32_iar(&n, base);
        n += 65432;
        n *= 127; // 1st test was with 123
        base += 123;
    }

    debug_pin(0, 0);
    for (int i = 0; i < 1000; i++);
    debug_pin(0, 1);

    n = 54321, base = 123;
    for (int i = 0; i < 10000; i++)
    {
        base = div64_32_clang(&n, base);
        n += 65432;
        n *= 127; // 1st test was with 123
        base += 123;
    }

    debug_pin(0, 0);
    for (int i = 0; i < 1000; i++);
    debug_pin(0, 1);

    n = 54321, base = 123;
    for (int i = 0; i < 10000; i++)
    {
        base = div64_32_gcc(&n, base);
        n += 65432;
        n *= 127; // 1st test was with 123
        base += 123;
    }

    debug_pin(0, 0);
}
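Because the clang and gcc listings were pasted in and the GCC one was edited by hand, a correctness check alongside the timing may be useful. The following is only a minimal sketch of such a check, not something that was run on the target: it compares the C routine with the native / and % operators, and replays the same input sequence as the benchmark loop. It assumes a 32-bit unsigned int, a 64-bit unsigned long long, and a nonzero base. The helper names check_div64_32, replay_benchmark_inputs and div64_32_selftest are hypothetical and not part of the original code.

```c
#include <assert.h>

/* Reference routine, copied from the question. */
unsigned int div64_32(unsigned long long *n, unsigned int base)
{
    unsigned long long rem = *n;
    unsigned long long b = base;
    unsigned long long res = 0, d = 1;
    unsigned int high = rem >> 32;

    /* Reduce the thing a bit first */
    if (high >= base) {
        high /= base;
        res = (unsigned long long) high << 32;
        rem -= (unsigned long long) (high * base) << 32;
    }

    /* Scale the divisor up until it reaches the remainder or its top bit is set */
    while ((long long)b > 0 && b < rem) {
        b = b + b;
        d = d + d;
    }

    /* Shift-and-subtract, one quotient bit per iteration */
    do {
        if (rem >= b) {
            rem -= b;
            res += d;
        }
        b >>= 1;
        d >>= 1;
    } while (d);

    *n = res;
    return rem;
}

/* Hypothetical helper: returns 1 if div64_32 agrees with native / and %.
 * base must be nonzero. */
static int check_div64_32(unsigned long long n, unsigned int base)
{
    unsigned long long q = n;
    unsigned int r = div64_32(&q, base);
    return q == n / base && r == n % base;
}

/* Hypothetical helper: replays the input sequence of the benchmark loop
 * (the n *= 127 variant) and returns 1 if every step matches native division. */
static int replay_benchmark_inputs(int iterations)
{
    unsigned long long n = 54321, base = 123;
    for (int i = 0; i < iterations; i++) {
        if (!check_div64_32(n, (unsigned int)base))
            return 0;
        base = div64_32(&n, (unsigned int)base);
        n += 65432;
        n *= 127;
        base += 123;
    }
    return 1;
}

/* Hypothetical entry point: a few edge cases plus the benchmark sequence. */
void div64_32_selftest(void)
{
    assert(check_div64_32(0ULL, 7u));
    assert(check_div64_32(0xFFFFFFFFFFFFFFFFULL, 1u));
    assert(check_div64_32(0xFFFFFFFFFFFFFFFFULL, 0xFFFFFFFFu));
    assert(replay_benchmark_inputs(10000));
}
```

The same comparison could be pointed at each of the three assembly variants by swapping the function called inside the helpers, which would show whether the hand-edited ldrd/strd lines still load and store both halves of the 64-bit value.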