Question

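I teach a computer architecture course and am preparing examples of floating-point processing on the Intel i7 for it. The example algorithm is the Madhava-Leibniz series, a particularly slow way to compute pi divided by 4. (I am well aware of better algorithms; this one is slow enough that performance differences are easy to see.)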
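
For reference, the series is usually written as shown here. The second form groups the terms in pairs; this grouping is my reading of how a loop that handles two terms per pass would consume them, not something stated in the post.

```latex
\frac{\pi}{4} \;=\; \sum_{k=0}^{\infty} \frac{(-1)^k}{2k+1}
\;=\; 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots
\;=\; 1 + \sum_{j=0}^{\infty} \left( -\frac{1}{4j+3} + \frac{1}{4j+5} \right)
```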

There are three examples: first in C++, next on the x87 floating-point unit, and finally using the AVX512 instruction set.

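For the x87 example, I would like to know whether the following code is reasonably efficient. (I know MMX/SSE/AVX instructions would be faster, and we will show those as well.) It is written as inline assembly in Visual Studio 2017.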

The C++ variables are: double term1 = 3.0, term2 = 5.0, workSum = 1.0, d4 = 4.0;

The code I would like your opinion on is:

    __asm {
        finit                   // initialize the x87 FPU
        fld1                    // starting value for the result (1.0)
        fld d4                  // term increment for each loop iteration (4.0)
        fld term2               // starting second term (5.0)
        fld term1               // starting first term (3.0)
        mov ecx, UINT_MAX       // loop limit
    LiebnizLoop:
        fld1
        fdiv ST(0), ST(1)       // 1 / term1
        fld ST(4)               // current result value
        fsub ST(0), ST(1)       // subtract 1 / term1
        fld1
        fdiv ST(0), ST(4)       // 1 / term2
        fadd ST(0), ST(1)       // add to current result value
        fstp ST(6)              // save back to the result slot and pop
        // Logic to increment terms and continue loop
        fstp ST(0)              // discard the intermediate sum
        fstp ST(0)              // discard 1 / term1
        fld ST(2)               // get increment (4.0)
        fadd ST(0), ST(1)       // add term1
        fst ST(1)               // save updated term1
        fld ST(2)               // get term2
        fadd ST(0), ST(4)       // add increment (4.0)
        fst ST(3)               // save updated term2
        fstp ST(0)              // discard the working copy of term2
        fstp ST(0)              // discard the working copy of term1
        loop LiebnizLoop        // decrement ecx and repeat while nonzero
        fstp term1              // save ending term1
        fstp ST(0)              // discard term2
        fstp ST(0)              // discard the increment
        fstp workSum            // save final result
    }

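This yields a difference of -0.000000000029105385 from the M_PI value in cmath (divided by 4), so I am confident the result is accurate to about ten decimal places. Thanks in advance for any thoughts you can offer!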

Madhava-Leibniz series in Intel x87

0 answers: