Floating-point instructions in ARM assembly

Time: 2016-07-14 15:52:17

Tags: c performance assembly arm fpu

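I am trying to write an ARM benchmark that loops over the following instructions (in assembly), both individually and in combination:

  • Integer addition
  • Integer multiplication
  • Floating-point addition
  • Floating-point multiplication

Here is my code for the integer operations: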

int additions_int(int n) {

    int i, dummyValue = n;

    __asm (
        "MOV R0, #2\n"
        "MOV R1, #6\n"
    );

    for (i = 0; i < n/LOOP_STEP; i++) {

        __asm (
            "ADD R0, R0, R1\n"
            "ADD R0, R0, R1\n"
            "ADD R0, R0, R1\n"
            "ADD R0, R0, R1\n"
            "ADD R0, R0, R1\n"
            "ADD R0, R0, R1\n"
            "ADD R0, R0, R1\n"
            "ADD R0, R0, R1\n"
            "ADD R0, R0, R1\n"
            "ADD R0, R0, R1\n"
        );
    }

    return dummyValue;
}


int multiplications_int(int n) {

    int i, dummyValue=n;

    __asm (
        "MOV R0, #2\n"
        "MOV R1, #6\n"
    );

    for (i = 0; i < n/LOOP_STEP; i++) {

        __asm (

            "MUL R0, R0, R1\n"
            "MUL R0, R0, R1\n"
            "MUL R0, R0, R1\n"
            "MUL R0, R0, R1\n"
            "MUL R0, R0, R1\n"
            "MUL R0, R0, R1\n"
            "MUL R0, R0, R1\n"
            "MUL R0, R0, R1\n"
            "MUL R0, R0, R1\n"
            "MUL R0, R0, R1\n"

        );

    }

    return dummyValue;
}

The problem is with the floating-point operations. I checked this documentation and tried something like this:

float multiplications_fp(int n) {
    int i;
    float fn=n, dummyValue = fn;

    for (i = 0; i < n/LOOP_STEP; i++) {
        __asm (
            "VMUL.F32 R0, R0, R1\n"
            "VMUL.F32 R0, R0, R1\n"
            "VMUL.F32 R0, R0, R1\n"
            "VMUL.F32 R0, R0, R1\n"
            "VMUL.F32 R0, R0, R1\n"
            "VMUL.F32 R0, R0, R1\n"
            "VMUL.F32 R0, R0, R1\n"
            "VMUL.F32 R0, R0, R1\n"
            "VMUL.F32 R0, R0, R1\n"
            "VMUL.F32 R0, R0, R1\n"
        );
    }

    return dummyValue;
}


float additions_fp(int n) {
    int i;
    float fn=n, dummyValue = fn;

    for (i = 0; i < n/LOOP_STEP; i++) {
        __asm (
            "VADD.F32 R0, R0, R1\n" 
            "VADD.F32 R0, R0, R1\n" 
            "VADD.F32 R0, R0, R1\n" 
            "VADD.F32 R0, R0, R1\n" 
            "VADD.F32 R0, R0, R1\n"
            "VADD.F32 R0, R0, R1\n" 
            "VADD.F32 R0, R0, R1\n" 
            "VADD.F32 R0, R0, R1\n" 
            "VADD.F32 R0, R0, R1\n" 
            "VADD.F32 R0, R0, R1\n"  
        );
    }

    return dummyValue;
}

Compiled with:

arm-linux-gnueabi-gcc -static -march=armv7-a microbenchmark_arm.c -o microbenchmark_arm

I get these errors:

Error: selected processor does not support ARM mode `vmul.f32 R0,R0,R1'
Error: selected processor does not support ARM mode `vadd.f32 R0,R0,R1'

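Can anyone tell me what I am doing wrong?

Could someone show me an example of floating-point addition or multiplication for the ARM Cortex-A architecture?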

1 answer:

Answer 0: (score: 4)

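Floating-point instructions use their own register bank, separate from the general-purpose registers. Most floating-point instructions cannot take general-purpose registers as operands, so R0 and R1 are not valid operands for VMUL.F32 or VADD.F32. The floating-point bank is, however, the same one used by the NEON SIMD instructions.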

For single precision, you can use:

VMUL.F32 s0, s0, s1

For double precision, you can use:

VMUL.F64 d0, d0, d1
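
As an illustration of the single-precision form inside the benchmark from the question, here is a minimal sketch. It makes several assumptions that are not in the original post: LOOP_STEP is 10, the functions take the accumulator and the second operand as extra parameters, and the REP10 helper macro is hypothetical. Rather than naming s0 and s1 directly, it uses GCC's "t" constraint so the compiler picks the single-precision VFP registers itself. The assembler typically also has to be told that an FPU is present, for example with -mfpu=vfpv3 -mfloat-abi=softfp on a soft-float toolchain; the exact flags depend on your setup. On a target without a hardware FPU the code falls back to plain C so that it still builds.

```c
/* Possible build line (an assumption about the toolchain, adjust as needed):
 *   arm-linux-gnueabi-gcc -static -march=armv7-a -mfpu=vfpv3 -mfloat-abi=softfp \
 *       microbenchmark_arm.c -o microbenchmark_arm
 */

#define LOOP_STEP 10                     /* assumed: instructions per loop iteration */
#define REP10(s) s s s s s s s s s s     /* hypothetical helper: repeat a string 10 times */

/* Runs n/LOOP_STEP iterations, each multiplying acc by factor LOOP_STEP times. */
float multiplications_fp(int n, float acc, float factor)
{
    int i, k;

    (void)k; /* only used by the portable fallback */
    for (i = 0; i < n / LOOP_STEP; i++) {
#if defined(__arm__) && defined(__ARM_FP)
        /* "t" = single-precision VFP register (s0-s31); "+t" = read and written */
        __asm__ volatile (
            REP10("vmul.f32 %0, %0, %1\n\t")
            : "+t"(acc)
            : "t"(factor)
        );
#else
        /* Portable fallback for builds without a hardware FPU */
        for (k = 0; k < LOOP_STEP; k++)
            acc *= factor;
#endif
    }
    return acc;
}

/* Same structure, with VADD.F32 instead of VMUL.F32. */
float additions_fp(int n, float acc, float addend)
{
    int i, k;

    (void)k; /* only used by the portable fallback */
    for (i = 0; i < n / LOOP_STEP; i++) {
#if defined(__arm__) && defined(__ARM_FP)
        __asm__ volatile (
            REP10("vadd.f32 %0, %0, %1\n\t")
            : "+t"(acc)
            : "t"(addend)
        );
#else
        for (k = 0; k < LOOP_STEP; k++)
            acc += addend;
#endif
    }
    return acc;
}
```

Returning the accumulator and declaring it as an in/out operand tells the compiler which registers the assembly touches, so it neither discards the loop nor reuses those registers for something else. Calling multiplications_fp(n, 1.0f, 1.0f) keeps the value finite for any n.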

Note that you may need to enable the floating-point unit first if the operating system has not already done so.
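
Under Linux the kernel normally takes care of this, so the step matters mainly for bare-metal code. As a rough sketch of what enabling the unit involves on ARMv7-A: grant access to coprocessors CP10 and CP11 in CPACR, then set the EN bit in FPEXC. The function names and the BARE_METAL_PRIVILEGED macro below are hypothetical, the register-writing part only works in a privileged mode, and some systems need extra setup (for example secure-world configuration) that is not shown here.

```c
#include <stdint.h>

#define CPACR_CP10_CP11_FULL (0xFu << 20) /* CPACR bits [23:20]: full access to CP10 and CP11 */
#define FPEXC_EN             (1u << 30)   /* FPEXC.EN: enables the VFP/NEON unit */

/* Pure helper: returns the given CPACR value with VFP/NEON access granted. */
uint32_t cpacr_with_fpu_access(uint32_t cpacr)
{
    return cpacr | CPACR_CP10_CP11_FULL;
}

/* BARE_METAL_PRIVILEGED is a hypothetical opt-in macro: define it only when
 * this code runs in a privileged mode on bare metal. */
#if defined(__arm__) && defined(__ARM_FP) && defined(BARE_METAL_PRIVILEGED)
void enable_fpu(void)
{
    uint32_t cpacr;

    /* Read CPACR, grant CP10/CP11 access, write it back, then synchronize */
    __asm__ volatile ("mrc p15, 0, %0, c1, c0, 2" : "=r"(cpacr));
    cpacr = cpacr_with_fpu_access(cpacr);
    __asm__ volatile ("mcr p15, 0, %0, c1, c0, 2\n\tisb" : : "r"(cpacr) : "memory");

    /* Set FPEXC.EN to switch the unit on */
    __asm__ volatile ("vmsr fpexc, %0" : : "r"(FPEXC_EN));
}
#endif
```

The bit manipulation is kept in a separate helper so that it can be checked on any host, while the register writes are compiled only for an ARM target that opts in.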