
Asked: 2011-10-11 09:39:49

Tags: performance


3 answers:

Answer 0 (score: 22)

From Agner Fog's instruction tables:

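On Core 2 65nm, FSQRT takes 9 to 69 clock cycles (cc), with nearly equal reciprocal throughput, depending on the value and the precision bits. For comparison, FDIV takes 9 to 38 cc (nearly equal reciprocal throughput), FMUL takes 5 cc (reciprocal throughput = 2), and FADD takes 3 cc (reciprocal throughput = 1). SSE performance is roughly the same, but it looks faster because it cannot do 80-bit math. SSE also has a very fast approximate reciprocal and approximate reciprocal sqrt.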

On Core 2 45nm, division and square root got faster: FSQRT takes 6 to 20 cc and FDIV takes 6 to 21 cc, while FADD and FMUL are unchanged. SSE performance is again roughly the same.

You can get the documents containing this information from his website.
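
The approximate reciprocal square root mentioned above is exposed in C++ through the `_mm_rsqrt_ss` intrinsic (the RSQRTSS instruction). The following is a minimal sketch, assuming an x86 or x86-64 target with SSE; the function names `approx_rsqrt`, `refined_rsqrt`, and `approx_sqrt` are illustrative and do not come from the answer. It shows the raw hardware estimate, one Newton-Raphson refinement step, and a square root derived from the result.

```cpp
#include <cassert>
#include <cmath>
#include <xmmintrin.h>  // SSE intrinsics (x86 / x86-64 only)

// Raw hardware estimate of 1/sqrt(x) via RSQRTSS.
// Intel documents a maximum relative error of 1.5 * 2^-12 for this estimate.
inline float approx_rsqrt(float x) {
    assert(x > 0.0f);  // this sketch only handles positive inputs
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}

// One Newton-Raphson step on f(y) = 1/y^2 - x:
//   y_next = y * (1.5 - 0.5 * x * y * y)
// which roughly doubles the number of correct bits.
inline float refined_rsqrt(float x) {
    const float y = approx_rsqrt(x);
    return y * (1.5f - 0.5f * x * y * y);
}

// sqrt(x) computed as x * (1/sqrt(x)); not correctly rounded like sqrtss.
inline float approx_sqrt(float x) {
    return x * refined_rsqrt(x);
}
```

Whether this beats a plain `std::sqrt` depends on the CPU and on how much precision the caller needs, so it is worth measuring before adopting it.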

Answer 1 (score: 11)


The following code was run on an Intel Core i3 under Windows 7 and compiled in Dev-C++ (which uses GCC). Your mileage may vary.

#include <cstdlib>
#include <iostream>
#include <cmath>
#include <ctime>   // std::clock, CLOCKS_PER_SEC

Output using -O2:

1 billion square roots running time: 14738ms

1 billion additions running time   : 3719ms

Press any key to continue . . .

Output without -O2:

10 million square roots running time: 870ms

10 million additions running time   : 66ms

Press any key to continue . . .


Square root is about 4 times slower than addition with -O2, or about 13 times slower without -O2.

int main(int argc, char *argv[]) {

    const int cycles = 100000;
    const int subcycles = 10000;

    double squares[cycles];

    // Fill the array with pseudo-random inputs.
    for ( int i = 0; i < cycles; ++i ) {
        squares[i] = std::rand();
    }

    std::clock_t start = std::clock();

    // cycles * subcycles = 1 billion square roots.
    for ( int i = 0; i < cycles; ++i ) {
        for ( int j = 0; j < subcycles; ++j ) {
            squares[i] = std::sqrt(squares[i]);
        }
    }

    double time_ms = ( ( std::clock() - start ) / (double) CLOCKS_PER_SEC ) * 1000;

    std::cout << "1 billion square roots running time: " << time_ms << "ms" << std::endl;

    start = std::clock();

    // cycles * subcycles = 1 billion additions.
    for ( int i = 0; i < cycles; ++i ) {
        for ( int j = 0; j < subcycles; ++j ) {
            squares[i] = squares[i] + squares[i];
        }
    }

    time_ms = ( ( std::clock() - start ) / (double) CLOCKS_PER_SEC ) * 1000;

    std::cout << "1 billion additions running time   : " << time_ms << "ms" << std::endl;

    return EXIT_SUCCESS;
}
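
One possible concern with the loops above is that the values saturate quickly: repeated square roots converge to 1, and repeated doubling overflows to infinity after roughly a thousand iterations, so most of the run operates on the same few values. The following is a hedged sketch of an alternative timing helper, not the answer author's method. The name `time_ms` and its parameters are hypothetical. It assumes C++11, uses `std::chrono::steady_clock`, leaves the inputs unmodified, and writes the accumulated result to a caller-supplied `sink` so that the compiler cannot discard the work.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <vector>

// Applies `op` to every element of `inputs`, `passes` times over, and
// returns the elapsed wall-clock time in milliseconds.
// The sum of all results is stored in `sink` so the loop has an
// observable effect and is not optimized away.
template <typename Op>
double time_ms(const std::vector<double>& inputs, int passes, Op op, double& sink) {
    assert(passes > 0);
    const auto start = std::chrono::steady_clock::now();
    double acc = 0.0;
    for (int p = 0; p < passes; ++p) {
        for (double x : inputs) {
            acc += op(x);  // inputs are read-only, so values never saturate
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    sink = acc;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A call such as `time_ms(values, 1000, [](double x) { return std::sqrt(x); }, sink)` can then be compared against `time_ms(values, 1000, [](double x) { return x + x; }, sink)`. Both measurements include the loop overhead and the accumulation into `acc`, so the ratio between them understates the true cost difference between the two operations.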

Answer 2 (score: 6)



Here is a great talk by Eric Brummer, a developer on the MSVC compiler: http://channel9.msdn.com/Events/Build/2013/4-329