Is it good that clang generates so much code for div64, even with -O3?

Asked: 2019-07-12 13:22:57

Tags: c arm clang cortex-m

I am experimenting on https://godbolt.org/ with the code from https://github.com/torvalds/linux/blob/master/lib/math/div64.c, targeting an ARM Cortex-M7 microcontroller.

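Compared with gcc (69 lines of assembly), clang generates far more code (208 lines of assembly). I am using -O3, which is expected to unroll loops, inline, and so on, but in this case I am not sure the result is good.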

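I don't know whether this helps, but:
1) removing either the first or the second loop makes the result similar to what gcc produces;
2) clang 5.0.0 produces much smaller code.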

Here is the code (a link to the configured example is further below):

unsigned int div64_32(unsigned long long * n, unsigned int base)
{
    unsigned long long rem = *n;
    unsigned long long b = base;
    unsigned long long res, d = 1;
    unsigned int high = rem >> 32;

    /* Reduce the thing a bit first */
    res = 0;
    if (high >= base) {
        high /= base;
        res = (unsigned long long) high << 32;
        rem -= (unsigned long long) (high*base) << 32;
    }

    while ((long long)b > 0 && b < rem) {
        b = b+b;
        d = d+d;
    }

    do {
        if (rem >= b) {
            rem -= b;
            res += d;
        }
        b >>= 1;
        d >>= 1;
    } while (d);

    *n = res;
    return rem;
}
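
Before comparing the generated assembly, it can be useful to confirm on a host machine that the routine agrees with the native 64-bit `/` and `%` operators. The following is only a minimal sketch under stated assumptions: the helper names `div64_32_matches` and `check_div64_32` are hypothetical (they do not come from the kernel source), `unsigned long long` is assumed to be 64 bits wide, and `base` is assumed to be non-zero.

```c
#include <assert.h>

/* Copy of the function from the question, behaviour unchanged. */
unsigned int div64_32(unsigned long long *n, unsigned int base)
{
    unsigned long long rem = *n;
    unsigned long long b = base;
    unsigned long long res, d = 1;
    unsigned int high = rem >> 32;

    /* Reduce the thing a bit first */
    res = 0;
    if (high >= base) {
        high /= base;
        res = (unsigned long long) high << 32;
        rem -= (unsigned long long) (high * base) << 32;
    }

    /* Scale the divisor up until it reaches the remainder. */
    while ((long long)b > 0 && b < rem) {
        b = b + b;
        d = d + d;
    }

    /* Classic shift-and-subtract division. */
    do {
        if (rem >= b) {
            rem -= b;
            res += d;
        }
        b >>= 1;
        d >>= 1;
    } while (d);

    *n = res;
    return rem;
}

/* Hypothetical helper: returns 1 when quotient and remainder match
 * the native operators. Assumes base != 0. */
int div64_32_matches(unsigned long long n, unsigned int base)
{
    unsigned long long q = n;
    unsigned int r = div64_32(&q, base);
    return q == n / base && r == (unsigned int)(n % base);
}

/* Hypothetical helper: a few hand-picked edge cases. */
void check_div64_32(void)
{
    assert(div64_32_matches(0ULL, 1u));
    assert(div64_32_matches(54321ULL, 123u));
    assert(div64_32_matches(0xFFFFFFFFFFFFFFFFULL, 1u));
    assert(div64_32_matches(0xFFFFFFFFFFFFFFFFULL, 0xFFFFFFFFu));
    assert(div64_32_matches(0x7FFFFFFFFFFFFFFFULL, 0x80000000u));
}
```

Calling `check_div64_32()` from a host-side `main` exercises both the "reduce first" branch (high word not smaller than `base`) and the plain shift-and-subtract path.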

Options for the two compilers:
ARM GCC 7.2.1 (none): -O3 -mcpu=cortex-m7 -mthumb -mfloat-abi=hard -mfpu=vfpv4
x86-64 clang (trunk): -O3 -mcpu=cortex-m7 -mthumb -mfloat-abi=hard -mfpu=vfpv4 -target armv7m-none-eabi

And finally, the link to the example: link
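
One way to probe whether loop unrolling accounts for the size difference is to build a variant with unrolling disabled on both loops and compare the assembly. The sketch below uses clang's `#pragma clang loop unroll(disable)` through a hypothetical `NO_UNROLL` macro, which expands to nothing on other compilers. The names `div64_32_nounroll` and `check_nounroll` are also hypothetical. This is an experiment, not a claim that unrolling is the cause. The `-fno-unroll-loops` flag is another way to run the same test without touching the source.

```c
#include <assert.h>

/* Hypothetical macro: asks clang not to unroll the loop that follows.
 * Expands to nothing on other compilers. */
#if defined(__clang__)
#define NO_UNROLL _Pragma("clang loop unroll(disable)")
#else
#define NO_UNROLL
#endif

/* Same arithmetic as div64_32 from the question; only the loop hints differ. */
unsigned int div64_32_nounroll(unsigned long long *n, unsigned int base)
{
    unsigned long long rem = *n;
    unsigned long long b = base;
    unsigned long long res, d = 1;
    unsigned int high = rem >> 32;

    /* Reduce the thing a bit first */
    res = 0;
    if (high >= base) {
        high /= base;
        res = (unsigned long long) high << 32;
        rem -= (unsigned long long) (high * base) << 32;
    }

    NO_UNROLL
    while ((long long)b > 0 && b < rem) {
        b = b + b;
        d = d + d;
    }

    NO_UNROLL
    do {
        if (rem >= b) {
            rem -= b;
            res += d;
        }
        b >>= 1;
        d >>= 1;
    } while (d);

    *n = res;
    return rem;
}

/* Host-side sanity check (hypothetical helper); leave it out when
 * inspecting the generated assembly. Assumes 64-bit unsigned long long. */
void check_nounroll(void)
{
    unsigned long long q = 0xFFFFFFFFFFFFFFFFULL;
    unsigned int r = div64_32_nounroll(&q, 1000u);
    assert(q == 0xFFFFFFFFFFFFFFFFULL / 1000u);
    assert(r == (unsigned int)(0xFFFFFFFFFFFFFFFFULL % 1000u));
}
```

If the output shrinks to roughly the gcc size with the pragmas in place, unrolling is a likely contributor. If it does not, another transformation is probably responsible.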

Update
I tested on an NXP Kinetis KV58 running at 240 MHz, with all code moved to I-TCM.
Results measured with an oscilloscope:

        run 1     run 2
iar:   27.2 ms   17.6 ms
clang: 29.6 ms   19.2 ms
gcc:   22.1 ms   14.0 ms

Run 1 used n *= 123 (see the code below).
Run 2 used n *= 127.

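I did not compile the code for clang and gcc myself; I simply copied and pasted the generated code from the website. I also had to change the GCC code because the compiler did not recognize two instructions (see also the comments on the question):
ldrd r0, [r0] (line 9) => ldrd r0, r1, [r0]
strd r6, [ip] (line 55) => strd r6, r7, [r12]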

void test()
{
    debug_pin(0, 0); // test pin 0: set LOW
    for (int i = 0; i < 1000; i++);

    debug_pin(0, 1); // test pin 0: set HIGH

    unsigned long long n = 54321, base = 123;
    for (int i = 0; i < 10000; i++)
    {
        base = div64_32_iar(&n, base);

        n += 65432;
        n *= 127; // 1st test was with 123
        base += 123;
    }
    debug_pin(0, 0);

    for (int i = 0; i < 1000; i++);

    debug_pin(0, 1);
    n = 54321, base = 123;
    for (int i = 0; i < 10000; i++)
    {
        base = div64_32_clang(&n, base);

        n += 65432;
        n *= 127; // 1st test was with 123
        base += 123;
    }
    debug_pin(0, 0);

    for (int i = 0; i < 1000; i++);

    debug_pin(0, 1);
    n = 54321, base = 123;
    for (int i = 0; i < 10000; i++)
    {
        base = div64_32_gcc(&n, base);

        n += 65432;
        n *= 127; // 1st test was with 123
        base += 123;
    }
    debug_pin(0, 0);
}
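
Both loops in the routine have data-dependent trip counts, so a change in the multiplier changes the amount of work per call, which may account for part of the difference between run 1 and run 2. The sketch below replays the benchmark's input sequence on a host and counts loop iterations. It is an illustration only: `div64_32_counted` and `count_benchmark_iterations` are hypothetical names, and an iteration count is only a rough proxy for cycles on the Cortex-M7.

```c
#include <assert.h>

/* Hypothetical instrumented copy of div64_32: same arithmetic,
 * plus a counter incremented once per loop iteration. */
unsigned int div64_32_counted(unsigned long long *n, unsigned int base,
                              unsigned long *iters)
{
    unsigned long long rem = *n;
    unsigned long long b = base;
    unsigned long long res = 0, d = 1;
    unsigned int high = rem >> 32;

    if (high >= base) {
        high /= base;
        res = (unsigned long long) high << 32;
        rem -= (unsigned long long) (high * base) << 32;
    }

    while ((long long)b > 0 && b < rem) {
        b = b + b;
        d = d + d;
        ++*iters;               /* one scale-up step */
    }

    do {
        if (rem >= b) {
            rem -= b;
            res += d;
        }
        b >>= 1;
        d >>= 1;
        ++*iters;               /* one shift-and-subtract step */
    } while (d);

    *n = res;
    return rem;
}

/* Replays the input sequence of test() with the given multiplier and
 * returns the total number of loop iterations over 10000 calls. */
unsigned long count_benchmark_iterations(unsigned int multiplier)
{
    unsigned long long n = 54321, base = 123;
    unsigned long iters = 0;

    for (int i = 0; i < 10000; i++) {
        assert((unsigned int)base != 0u);   /* the routine requires base != 0 */
        base = div64_32_counted(&n, (unsigned int)base, &iters);

        n += 65432;
        n *= multiplier;        /* 123 in run 1, 127 in run 2 */
        base += 123;
    }
    return iters;
}
```

Comparing `count_benchmark_iterations(123)` with `count_benchmark_iterations(127)` indicates how much of the gap between the two runs comes from the input data rather than from the compiler.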

0 answers:

No answers