Question

Given the following trigonometric identity to prove:

cos(fi+theta) = cos(theta)cos(fi)-sin(theta).sin(fi)

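The following program, written in NASM, is supposed to print 1 when the identity is verified and 0 otherwise. I always get 0 as output. As far as I can tell my logic is correct, so I suspect a problem with loading from memory or a stack overflow.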

[bits 32]

extern printf
extern _exit


section .data

hello: db "Value =  %d ", 10, 0
theta: dd 1.0
fi: dd 2.0
threshold: dd 1e-1
SECTION .text                   

global  main                ; "C" main program 

main:
start:


; compute cos(theta)cos(fi)-sin(theta).sin(fi)
fld dword [theta]
fcos
fld dword [fi]
fcos
fmul st0,st1
fstp dword [esp]
fld dword [theta]
fsin
fld dword [fi]
fsin
fmul st0,st1
fld dword [esp]
fmul st0,st1
fstp dword [esp] 

;compute cos(fi+theta)
fld dword [theta]
fld dword [fi]
fadd st0,st1
fcos

; compare
fld dword [esp]  
fsub st0, st1
fld dword [threshold]
fcomi st0, st1
ja .equal
mov eax, 0
jmp .exit

.equal:
    mov eax, 1
.exit:

    mov     dword[ esp ],       hello
    mov     dword [ esp + 4 ],   eax
    call   printf

    ; Call 'exit': exit( 0 );
    mov     dword[ esp ],       0
    call   _exit

Answer 1

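Testing one value doesn't prove an identity! If I evaluate x*x and x+x at x=2, does that mean I've proven that x^2 = x+x?

Testing with all possible floats (there are 2^32 of them, including NaNs) is a good way to check that a function you wrote works for all possible corner cases, but floating-point numbers are still not the same as mathematical real numbers, especially when there is significant error in the results (see below about fsin).

Trying something like this and finding that it works for all floats confirms that it's worth looking for a mathematical proof, not that you've already found one.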
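As a rough illustration of the difference between a spot check and a sweep, here is a minimal Python sketch. It uses doubles rather than the question's 32-bit floats, and the helper name `identity_error`, the sample range, and the sample count are assumptions made for this example only.

```python
import math
import random

def identity_error(theta, fi):
    """Absolute difference between the two sides of the identity, in doubles."""
    lhs = math.cos(fi + theta)
    rhs = math.cos(theta) * math.cos(fi) - math.sin(theta) * math.sin(fi)
    return abs(lhs - rhs)

# A single matching point proves nothing: x*x == x+x holds at x = 2 but not at x = 3.
assert 2 * 2 == 2 + 2
assert 3 * 3 != 3 + 3

# Sweep many sample angles and record the worst disagreement.
rng = random.Random(0)  # fixed seed so the run is repeatable
worst = max(
    identity_error(rng.uniform(-10.0, 10.0), rng.uniform(-10.0, 10.0))
    for _ in range(100_000)
)
print(f"worst absolute error over the sweep: {worst:.3e}")
```

Even a sweep like this only builds confidence; it is evidence that a proof is worth looking for, not a proof.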

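Also note that on Intel CPUs (and apparently all other x86 CPUs except the AMD K5), the fsin instruction is not accurate for inputs near Pi. According to Bruce Dawson's excellent blog post, in the worst case the error is "1.37 quintillion units in the last place, leaving fewer than four bits correct". This is due to range reduction with only a 66-bit Pi constant, and it can't be fixed for backwards-compatibility reasons. (That is, the exact behaviour of fsin is essentially part of the ISA, warts and all.)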
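The effect of a short Pi constant can be sketched with exact rational arithmetic. This is a simplified model and not a reproduction of what the hardware does: it assumes the only error source is reducing against Pi truncated to 2 integer bits plus 64 fractional bits, and the names `PI`, `PI_66`, `accurate`, and `modelled` are hypothetical. It looks at the input double(pi), which is not the worst-case input from the blog post.

```python
import math
from fractions import Fraction

# pi to 50 decimal places, held exactly as a rational number.
PI = Fraction("3.14159265358979323846264338327950288419716939937510")

# Assumed model of the hardware constant: pi truncated to 64 fractional bits
# (2 integer bits + 64 fractional bits = 66 bits in total).
PI_66 = Fraction(math.floor(PI * 2**64), 2**64)

x = Fraction(math.pi)   # the double nearest to pi, as an exact rational

# For x this close to pi, sin(x) = sin(pi - x) equals pi - x far beyond double precision.
accurate = PI - x       # roughly what a carefully reduced sine returns
modelled = PI_66 - x    # roughly what reduction against the short constant returns

rel_error = float(abs(modelled - accurate) / accurate)
ulps_off = float(abs(modelled - accurate)) / math.ulp(float(accurate))
print(f"relative error of the modelled result: {rel_error:.2e}")
print(f"error in double-precision ulps:        {ulps_off:.2e}")
```

Under these assumptions the result is wrong in roughly the fifth significant digit, which is enormous when measured in ulps. `math.ulp` requires Python 3.9 or later.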

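The standard add / sub / mul / div / sqrt operations all produce results with at most 0.5 ulp of error (i.e. even the last bit of the mantissa is correctly rounded).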
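One way to see what "correctly rounded" means is to compare each machine result with the exact rational result rounded once. The sketch below does this in Python for add, sub, mul, and div (sqrt is left out because its exact value is not rational). It assumes Python floats are IEEE 754 doubles evaluated without extra intermediate precision, which holds on typical x86-64 builds; the function name `count_misrounded` is hypothetical.

```python
import random
from fractions import Fraction

def count_misrounded(samples, seed=1):
    """Count basic operations whose result is not the double nearest to the exact value."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(samples):
        a = rng.uniform(-1e6, 1e6)
        b = rng.uniform(1e-3, 1e6)          # kept away from zero so a / b is defined
        fa, fb = Fraction(a), Fraction(b)   # exact rational values of the two doubles
        pairs = [
            (a + b, fa + fb),
            (a - b, fa - fb),
            (a * b, fa * fb),
            (a / b, fa / fb),
        ]
        # float(Fraction) rounds the exact value to the nearest double exactly once.
        bad += sum(1 for got, exact in pairs if got != float(exact))
    return bad

misrounded = count_misrounded(10_000)
print("misrounded results:", misrounded)
```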

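Bruce has a whole series of FP articles, indexed in his article about FP comparisons. I've added links to these articles to the x86 tag wiki, because they contain such good information. They aren't all about x87; most of them mention differences between x87 and SSE where it matters for current versions of MSVC and/or gcc.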

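Fun fact: a floating-point trig identity:

sin(double(pi)) != 0.0. You shouldn't expect it to be zero, because a double can't exactly represent the value of the irrational number pi. However,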

 pi ~= double(pi) + sin(double(pi))

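(if sin(double(pi)) is evaluated accurately, not with x87 fsin, and your addition has higher precision than double).

To quote Bruce Dawson again:

This works because the sine of a number very close to pi is almost equal to the error in the estimate of pi. This is just calculus 101, and is a variation on Newton's method, but I still find it charming.

For details, see Bruce's FP comparison article and search in it for sin(pi). He has a whole section on this floating-point trig identity.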
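The identity can be tried out with a short Python sketch that uses the `decimal` module for the higher-precision addition. It assumes the platform's `math.sin` is an accurate library routine rather than x87 fsin; the names `PI_REF`, `err_before`, and `err_after` are made up for this example.

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 50  # 50 significant digits, far more than a double carries

PI_REF = Decimal("3.14159265358979323846264338327950288419716939937510")

pi_double = Decimal(math.pi)          # exact decimal value of double(pi)
sin_pi = Decimal(math.sin(math.pi))   # exact decimal value of the library's result

improved = pi_double + sin_pi         # the addition is carried out with 50 digits

err_before = abs(PI_REF - pi_double)
err_after = abs(PI_REF - improved)
print("error of double(pi):         ", err_before)
print("error after adding sin(...): ", err_after)
```

The corrected value should agree with Pi to roughly twice as many digits as double(pi) does, because the remaining error is just the rounding of the sine itself.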

Answer 2

Edit: found it!
In answer to your question, your code has one wrong instruction. I've added comments:

fld dword [theta]  ; Load theta
fcos               ; Get its cos()
fld dword [fi]     ; Load fi
fcos               ; Get its cos()
fmul st0,st1       ; Multiply them
fstp dword [esp]   ; Store away on stack [Why?]

fld dword [theta]  ; Load theta
fsin               ; Get its sin()
fld dword [fi]     ; Load fi
fsin               ; Get its sin()
fmul st0,st1       ; Multiply them

fld dword [esp]    ; Get what was stored away
fmul st0,st1       ; Multiply them??? [ You meant fsub!]
fstp dword [esp]   ; Store it away

The second-to-last line should be fsub, not fmul.
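To see why the fmul version prints 0, the arithmetic of both variants can be mirrored in Python with the question's values (theta = 1.0, fi = 2.0, threshold = 1e-1). This is only a model in doubles, not a translation of the x87 code, and the variable names are chosen for this example.

```python
import math

theta, fi = 1.0, 2.0   # the values from the question's .data section
threshold = 1e-1       # the threshold from the question

cos_product = math.cos(theta) * math.cos(fi)
sin_product = math.sin(theta) * math.sin(fi)
target = math.cos(fi + theta)

buggy = cos_product * sin_product   # what the fmul version computes
fixed = cos_product - sin_product   # what the fsub version computes

buggy_ok = abs(buggy - target) < threshold
fixed_ok = abs(fixed - target) < threshold
print("fmul version passes:", buggy_ok)
print("fsub version passes:", fixed_ok)
```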

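General comments:
You have a symbol pi in your code, and it's set to only 3.14. Luckily, you don't use it in your code. Don't do that, though: the x87 knows all about pi via FLDPI.

Another thing you do correctly is compare against a threshold. Still, 1e-1 is a rather large delta...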
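To get a feel for how large 1e-1 is, the sketch below rounds the inputs and both sides of the identity to single precision and compares two tolerances. The value `tight = 1e-6` is a hypothetical choice for illustration, not a recommendation from the answer, and `to_float32` is a helper written for this example. The comparison also takes the absolute value of the difference, which the question's code does not do before fcomi.

```python
import math
import struct

def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

theta, fi = to_float32(1.0), to_float32(2.0)
lhs = to_float32(math.cos(fi + theta))
rhs = to_float32(math.cos(theta) * math.cos(fi) - math.sin(theta) * math.sin(fi))
diff = abs(lhs - rhs)

tight = 1e-6        # hypothetical tolerance: a few float32 ulps at magnitude 1
loose = 1e-1        # the threshold used in the question
wrong = rhs + 0.05  # a deliberately wrong result, off in the second decimal place

print("genuine difference:", diff)
print("wrong value accepted by loose threshold:", abs(lhs - wrong) < loose)
print("wrong value accepted by tight threshold:", abs(lhs - wrong) < tight)
```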

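You have a much more fundamental problem, one you hint at when you mention "stack overflow". Your code stores values on the (program) stack without first making room for them: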

fstp dword [esp]
...
fld dword [esp]
...
mov     dword[ esp ],       hello
mov     dword [ esp + 4 ],   eax
call   printf

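The last one is likely to crash your program: esp+4 may be beyond the top of the stack! If you want to use mov rather than push, you should start your program with sub esp,8 to first make room for two dwords.

Did you know that the x87 already has an internal stack of 8 floating-point registers? You can "stash away" values inside that stack while calculating other intermediate results, and then come back to them later. Storing them to memory (as in the first lines of code above) actually reduces the precision of the intermediate results.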
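The precision loss from spilling an intermediate to a dword can be approximated in Python by rounding that intermediate to single precision. This is only a model: Python doubles stand in for the x87's wider registers, and `store_as_dword` is a hypothetical helper that mimics an fstp dword followed by an fld dword.

```python
import math
import struct

def store_as_dword(x):
    """Model `fstp dword [mem]` then `fld dword [mem]`: round to single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

theta, fi = 1.0, 2.0
cos_product = math.cos(theta) * math.cos(fi)
sin_product = math.sin(theta) * math.sin(fi)
target = math.cos(fi + theta)

kept = cos_product - sin_product                      # intermediate stays in a register
spilled = store_as_dword(cos_product) - sin_product   # intermediate spilled to a dword

err_kept = abs(kept - target)
err_spilled = abs(spilled - target)
print(f"error with the intermediate kept in a register: {err_kept:.3e}")
print(f"error with the intermediate stored as a dword:  {err_spilled:.3e}")
```

The spilled variant still passes any reasonable threshold here, but its error is several orders of magnitude larger than when the value stays at full width.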
