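I'm new to ARM assembly, and I want to implement one of my C functions in inline assembly. The function is a multi-precision multiplication: it multiplies a 32-bit unsigned integer by a 256-bit unsigned integer and stores the result in a 288-bit unsigned integer type. I defined my data types as: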
typedef struct UN_256fe{
    uint32_t uint32[8];
}UN_256fe;
typedef struct UN_288bite{
    uint32_t uint32[9];
}UN_288bite;
Here is my function:
void multiply32x256(uint32_t A, UN_256fe* B, UN_288bite* res){
    uint32_t temp;
    asm ( "umull %0, %1, %9, %10;\n\t"
          "umull %18, %2, %9, %11;\n\t"
          "adds %1, %18, %1; \n\t"
          "umull %18, %3, %9, %12;\n\t"
          "adcs %2, %18, %2; \n\t"
          "umull %18, %4, %9, %13;\n\t"
          "adcs %3, %18, %3; \n\t"
          "umull %18, %5, %9, %14;\n\t"
          "adcs %4, %18, %4; \n\t"
          "umull %18, %6, %9, %15;\n\t"
          "adcs %5, %18, %5; \n\t"
          "umull %18, %7, %9, %16;\n\t"
          "adcs %6, %18, %6; \n\t"
          "umull %18, %8, %9, %17;\n\t"
          "adcs %7, %18, %7; \n\t"
          "adc %8, %8, 0 ; \n\t"
          : "=r"(res->uint32[8]), "=r"(res->uint32[7]), "=r"(res->uint32[6]), "=r"(res->uint32[5]), "=r"(res->uint32[4]),
            "=r"(res->uint32[3]), "=r"(res->uint32[2]), "=r"(res->uint32[1]), "=r"(res->uint32[0])
          : "r"(A), "r"(B->uint32[7]), "r"(B->uint32[6]), "r"(B->uint32[5]),
            "r"(B->uint32[4]), "r"(B->uint32[3]), "r"(B->uint32[2]), "r"(B->uint32[1]), "r"(B->uint32[0]), "r"(temp));
}
This looks fine to me. But when I debug the code, for example right after executing the first line, "umull %0, %1, %9, %10;\n\t", I get:
(gdb) p/x A //-->%9
$8 = 0x1
(gdb) p/x B->uint32[7] //-->%10
$9 = 0xffffff1
(gdb) p/x res->uint32[8] //-->%0
$10 = 0x1
(gdb) p/x res->uint32[7] //-->%1
$11 = 0x0
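For comparison, here is a minimal plain-C sketch of what I expect the inline assembly to compute. It assumes uint32[0] holds the most significant word, which is how the operands are ordered in the asm above. The name multiply32x256_ref is hypothetical and only used for illustration. With A = 0x1 and B->uint32[7] = 0xffffff1, I would expect res->uint32[8] to be 0xffffff1 and res->uint32[7] to be 0x0.

```c
#include <stdint.h>

typedef struct UN_256fe { uint32_t uint32[8]; } UN_256fe;
typedef struct UN_288bite { uint32_t uint32[9]; } UN_288bite;

/* Hypothetical reference version (not part of the original code).
 * Assumption: index 0 is the most significant 32-bit word. */
void multiply32x256_ref(uint32_t A, const UN_256fe *B, UN_288bite *res)
{
    uint32_t carry = 0;

    /* Walk from the least significant word (index 7) upwards. */
    for (int i = 7; i >= 0; i--) {
        /* 32x32 -> 64-bit product plus the incoming carry; this sum
         * cannot overflow 64 bits. */
        uint64_t t = (uint64_t)A * B->uint32[i] + carry;

        res->uint32[i + 1] = (uint32_t)t;  /* low half is the result word */
        carry = (uint32_t)(t >> 32);       /* high half carries onward    */
    }

    /* The final carry becomes the most significant result word. */
    res->uint32[0] = carry;
}
```

Running both versions on the same inputs and comparing the nine result words should show which word the assembly first gets wrong.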
It seems I have made some mistakes in my assembly instructions. Could anyone explain what is going wrong?