I'm seeing some strange behavior with this ARM inline assembly. I'm working on a HiSilicon 3531 board, and I compile my code with the following options:
-march=armv7-a -mcpu=cortex-a9 -mfloat-abi=softfp -mtune=cortex-a9 -mfpu=vfpv3-d16 -marm
Here is the code:
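#define N 134217728 /* buffer size in bytes (128 MiB), i.e. N/4 floats */
d = (float*)memalign(16, N); /* output buffer, 16-byte aligned */
x = (float*)memalign(16, N); /* input buffer, 16-byte aligned */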
for(i = 0; i < N/4; i++) x[i] = (float)rand()/(float)RAND_MAX;
float coeff[8] = {1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6};
for(i=0; i<N/4; i++)
{
asm volatile ("flds s0, %[mem]\n\t" : : [mem]"m" (coeff[0]) : "s0");
asm volatile("flds s1, %[mem]\n\t" : : [mem]"m" (x[i]) : "s1", "memory");
asm volatile("flds s2, %[mem]\n\t" : : [mem]"m" (coeff[1]) : "s2");
asm volatile("fmacs s2, s0, s1\n\t" : : : "s2");
asm volatile("flds s0, %[mem]\n\t" : : [mem]"m" (coeff[2]) : "s0");
asm volatile("fmacs s0, s2, s1\n\t" : : : "s0");
asm volatile("flds s2, %[mem]\n\t" : : [mem]"m" (coeff[3]) : "s2");
asm volatile("fmacs s2, s0, s1\n\t" : : : "s2");
asm volatile("flds s0, %[mem]\n\t" : : [mem]"m" (coeff[4]) : "s0");
asm volatile ("fmacs s0, s2, s1\n\t" : : : "s0");
asm volatile("flds s2, %[mem]\n\t" : : [mem]"m" (coeff[5]) : "s2");
asm volatile("fmacs s2, s0, s1\n\t" : : : "s2");
asm volatile("flds s0, %[mem]\n\t" : : [mem]"m" (coeff[6]) : "s0");
asm volatile("fmacs s0, s2, s1\n\t" : : : "s0");
asm volatile ("flds s2, %[mem]\n\t" : : [mem]"m" (coeff[7]) : "s2");
asm volatile("fmacs s2, s0, s1\n\t" : : : "s2");
asm volatile ("fsts s2, %[mem]\n\t" : [mem]"=m" (d[i]) : : "s2");
}
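The problem is that I get a segmentation fault on the two lines where I pass the x[i] and d[i] references. If, instead of passing array references to the inline assembly, I copy the values into temporary variables (b and c), the code works fine and the results are correct: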
for(i=0; i<N/4; i++)
{
asm volatile ("flds s0, %[mem]\n\t" : : [mem]"m" (coeff[0]) : "s0");
float b = x[i];
asm volatile("flds s1, %[mem]\n\t" : : [mem]"m" (b) : "s1", "memory");
asm volatile("flds s2, %[mem]\n\t" : : [mem]"m" (coeff[1]) : "s2");
asm volatile("fmacs s2, s0, s1\n\t" : : : "s2");
asm volatile("flds s0, %[mem]\n\t" : : [mem]"m" (coeff[2]) : "s0");
asm volatile("fmacs s0, s2, s1\n\t" : : : "s0");
asm volatile("flds s2, %[mem]\n\t" : : [mem]"m" (coeff[3]) : "s2");
asm volatile("fmacs s2, s0, s1\n\t" : : : "s2");
asm volatile("flds s0, %[mem]\n\t" : : [mem]"m" (coeff[4]) : "s0");
asm volatile ("fmacs s0, s2, s1\n\t" : : : "s0");
asm volatile("flds s2, %[mem]\n\t" : : [mem]"m" (coeff[5]) : "s2");
asm volatile("fmacs s2, s0, s1\n\t" : : : "s2");
asm volatile("flds s0, %[mem]\n\t" : : [mem]"m" (coeff[6]) : "s0");
asm volatile("fmacs s0, s2, s1\n\t" : : : "s0");
asm volatile ("flds s2, %[mem]\n\t" : : [mem]"m" (coeff[7]) : "s2");
asm volatile("fmacs s2, s0, s1\n\t" : : : "s2");
float c;
asm volatile ("fsts s2, %[mem]\n\t" : [mem]"=m" (c) : : "s2");
d[i] = c;
}
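For reference, the flds/fmacs chain above amounts to a Horner evaluation of a degree-7 polynomial in x[i], with coeff[0] as the leading coefficient. A minimal plain-C sketch of that computation, which could be used to check the assembly's output, might look like this. The name `poly_ref` is hypothetical and not part of my program, and depending on compiler flags the multiply-add may be rounded slightly differently from the VFP instruction, so small differences in the last bits are possible.

```c
#include <stddef.h>

/* Hypothetical reference helper (not in the original program).
   Mirrors the flds/fmacs chain: each step loads the next coefficient
   and adds (previous accumulator * x) to it. */
static float poly_ref(const float coeff[8], float x)
{
    float acc = coeff[0];            /* corresponds to: flds s0, coeff[0] */
    for (size_t k = 1; k < 8; k++)
        acc = coeff[k] + acc * x;    /* corresponds to: flds + fmacs      */
    return acc;                      /* value the asm stores with fsts    */
}
```

Comparing `poly_ref(coeff, x[i])` with `d[i]` for a handful of indices should show whether the assembly path produces the expected values.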
What is going on here? The syntax looks fine; I double-checked it against books and various websites. Can you help me? Thanks!
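=== EDIT 1 === As suggested, I tried putting all the assembly code in a single asm statement, but the behavior is exactly the same. I still have to copy the array values into temporaries to get correct results. Here is the godbolt link to the code: https://goo.gl/51a3QO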
Riccardo