Question

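I read that using a policy class for a function that will be called in a tight loop is much faster than using a polymorphic function. However, I set up this demo and the timings indicate exactly the opposite!? The policy version takes 2-3 times longer than the polymorphic version.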

#include <iostream>

#include <boost/timer.hpp>

// Policy version
template < typename operation_policy>
class DoOperationPolicy : public operation_policy
{
  using operation_policy::Operation;

public:
  void Run(const float a, const float b)
  {
    Operation(a,b);
  }
};

class OperationPolicy_Add
{
protected:
  float Operation(const float a, const float b)
  {
    return a + b;
  }
};

// Polymorphic version
class DoOperation
{
public:
  virtual float Run(const float a, const float b)= 0;
};

class OperationAdd : public DoOperation
{
public:
  float Run(const float a, const float b)
  {
    return a + b;
  }
};

int main()
{
  boost::timer timer;

  unsigned int numberOfIterations = 1e7;

  DoOperationPolicy<OperationPolicy_Add> policy_operation;
  for(unsigned int i = 0; i < numberOfIterations; ++i)
    {
    policy_operation.Run(1,2);
    }
  std::cout << timer.elapsed() << " seconds." << std::endl;
  timer.restart();

  DoOperation* polymorphic_operation = new OperationAdd;
  for(unsigned int i = 0; i < numberOfIterations; ++i)
    {
    polymorphic_operation->Run(1,2);
    }
  std::cout << timer.elapsed() << " seconds." << std::endl;

}

Is there something wrong with the demo? Or is it simply incorrect that policies should be faster?

Answer 1

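Your benchmark is meaningless (sorry).

Unfortunately, writing real benchmarks is hard, because compilers are very clever.

Things to look out for here:

Devirtualization: the polymorphic call is expected to be slower because it is supposed to be virtual, but here the compiler can see that polymorphic_operation is necessarily an OperationAdd, and so it can call OperationAdd::Run directly without any runtime dispatch.
Inlining: since the compiler has access to the method bodies, it can inline them and avoid the function calls altogether.
"Dead store elimination": values that are never used need not be stored, and the computations that produce them, as long as they have no side effects, can be skipped entirely.

Indeed, your whole benchmark code can be optimized down to: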

int main()
{
  boost::timer timer;

  std::cout << timer.elapsed() << " seconds." << std::endl;

  timer.restart();

  DoOperation* polymorphic_operation = new OperationAdd;

  std::cout << timer.elapsed() << " seconds." << std::endl;
}

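Which is when you realize that you are not timing what you wanted to...

To make your benchmark meaningful, you need to:

prevent devirtualization
force a side effect

To prevent devirtualization, just declare a DoOperation& Get() function, and then define it in another cpp file: DoOperation& Get() { static OperationAdd O; return O; }.

To force a side effect (only necessary if the methods are inlined): return the value, accumulate it, and then display it.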
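The "force a side effect" step can be looked at in isolation. The following is a minimal sketch, not part of the original program: run_discarding and run_accumulating are hypothetical helper names, and the policy struct mirrors the static-function form of OperationPolicy_Add. The first loop throws every result away, so an optimizing compiler is free to drop it; the second returns a value that the caller can print, which keeps the computation observable.

```cpp
// Policy with a static operation.
struct OperationPolicy_Add {
    static float Operation(const float a, const float b) { return a + b; }
};

// Discards every result. With optimization enabled the compiler may
// remove the whole loop, because nothing observable depends on it.
template <typename Policy>
void run_discarding(unsigned int iterations) {
    for (unsigned int i = 0; i < iterations; ++i) {
        Policy::Operation(1, 2);
    }
}

// Accumulates the results and hands the sum back to the caller.
// Printing the returned value makes the computation observable.
template <typename Policy>
float run_accumulating(unsigned int iterations) {
    float result = 0;
    for (unsigned int i = 0; i < iterations; ++i) {
        result += Policy::Operation(1, 2);
    }
    return result;
}
```

A caller would time run_accumulating<OperationPolicy_Add>(n) and print the returned sum next to the elapsed time. Note that the compiler may still simplify such a loop; accumulating only guarantees that the result itself cannot be thrown away.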

In action, using this program:

// test2.cpp
namespace so8746025 {

  class DoOperation
  {
  public:
    virtual float Run(const float a, const float b) = 0;
  };

  class OperationAdd : public DoOperation
  {
  public:
    float Run(const float a, const float b)
    {
      return a + b;
    }
  };

  class OperationAddOutOfLine: public DoOperation
  {
  public:
    float Run(const float a, const float b);
  };

  float OperationAddOutOfLine::Run(const float a, const float b)
  {
    return a + b;
  }

  DoOperation& GetInline() {
    static OperationAdd O;
    return O;
  }

  DoOperation& GetOutOfLine() {
    static OperationAddOutOfLine O;
    return O;
  }

} // namespace so8746025

// test.cpp
#include <iostream>

#include <boost/timer.hpp>

namespace so8746025 {

  // Policy version
  template < typename operation_policy>
  struct DoOperationPolicy
  {
    float Run(const float a, const float b)
    {
      return operation_policy::Operation(a,b);
    }
  };

  struct OperationPolicy_Add
  {
    static float Operation(const float a, const float b)
    {
      return a + b;
    }
  };

  // Polymorphic version
  class DoOperation
  {
  public:
    virtual float Run(const float a, const float b) = 0;
  };

  class OperationAdd : public DoOperation
  {
  public:
    float Run(const float a, const float b)
    {
      return a + b;
    }
  };

  class OperationAddOutOfLine: public DoOperation
  {
  public:
    float Run(const float a, const float b);
  };


  DoOperation& GetInline();
  DoOperation& GetOutOfLine();

} // namespace so8746025

using namespace so8746025;

int main()
{
  unsigned int numberOfIterations = 1e8;

  DoOperationPolicy<OperationPolicy_Add> policy;

  OperationAdd stackInline;
  DoOperation& virtualInline = GetInline();

  OperationAddOutOfLine stackOutOfLine;
  DoOperation& virtualOutOfLine = GetOutOfLine();


  boost::timer timer;

  float result = 0;

  for(unsigned int i = 0; i < numberOfIterations; ++i)  {
    result += policy.Run(1,2);
  }
  std::cout << "Policy: " << timer.elapsed() << " seconds (" << result << ")" << std::endl;


  timer.restart();
  result = 0;
  for(unsigned int i = 0; i < numberOfIterations; ++i)
  {
    result += stackInline.Run(1,2);
  }
  std::cout << "Stack Inline: " << timer.elapsed() << " seconds (" << result << ")" << std::endl;

  timer.restart();
  result = 0;
  for(unsigned int i = 0; i < numberOfIterations; ++i)
  {
    result += virtualInline.Run(1,2);
  }
  std::cout << "Virtual Inline: " << timer.elapsed() << " seconds (" << result << ")" << std::endl;

  timer.restart();
  result = 0;
  for(unsigned int i = 0; i < numberOfIterations; ++i)
  {
    result += stackOutOfLine.Run(1,2);
  }
  std::cout << "Stack Out Of Line: " << timer.elapsed() << " seconds (" << result << ")" << std::endl;

  timer.restart();
  result = 0;
  for(unsigned int i = 0; i < numberOfIterations; ++i)
  {
    result += virtualOutOfLine.Run(1,2);
  }
  std::cout << "Virtual Out Of Line: " << timer.elapsed() << " seconds (" << result << ")" << std::endl;

}

We get:

$ gcc --version
gcc (GCC) 4.3.2

$ ./testR
Policy: 0.17 seconds (6.71089e+07)
Stack Inline: 0.17 seconds (6.71089e+07)
Virtual Inline: 0.52 seconds (6.71089e+07)
Stack Out Of Line: 0.6 seconds (6.71089e+07)
Virtual Out Of Line: 0.59 seconds (6.71089e+07)

Note the subtle difference between devirtualization + inlining and the absence of devirtualization.

Answer 2

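FWIW, I changed it to:

be a policy, as opposed to a mixin
return the value
use a volatile to keep the compiler from optimizing the loop away and from applying unrelated optimizations to the loop (for example, reducing loads/stores through loop unrolling, and vectorization on targets that support it)
compare against a direct, static function call
use many more iterations
compile with gcc

The timings are: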

DoDirect: 3.4 seconds.
Policy: 3.41 seconds.
Polymorphic: 3.4 seconds.

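Ergo: there is no difference. Mainly because GCC is able to statically determine that the DoOperation* actually points to an OperationAdd, so there is no vtable lookup inside the loop :)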

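IMPORTANT

If you want to benchmark the real-life performance of this exact loop, rather than the function call overhead, remove the volatile. The timings now become: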

DoDirect: 6.71089e+07 in 1.12 seconds.
Policy: 6.71089e+07 in 1.15 seconds.
Polymorphic: 6.71089e+07 in 3.38 seconds.

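As you can see, without volatile the compiler is able to optimize away some load-store cycles; I assume it is doing loop unrolling + register allocation there (however, I have not inspected the machine code). The point is that the loop as a whole can be optimized much better with the 'policy' approach than with dynamic dispatch (i.e. the virtual method).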
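The two variants being compared here differ only in the accumulator. As a rough sketch (sum_volatile and sum_plain are hypothetical names introduced for illustration), the volatile version forces a load and a store of result on every iteration, while the plain version lets the compiler keep the sum in a register and restructure the loop:

```cpp
// Direct version: a plain static function call.
struct DoDirect {
    static float Run(const float a, const float b) { return a + b; }
};

// Each access to a volatile object is observable, so the compiler must
// load and store `result` on every iteration and cannot drop the loop.
float sum_volatile(unsigned long iterations) {
    volatile float result = 0;
    for (unsigned long i = 0; i < iterations; ++i) {
        result = result + DoDirect::Run(1, 2);
    }
    return result;
}

// Without volatile the sum may live in a register, and the loop may be
// unrolled or otherwise transformed by the optimizer.
float sum_plain(unsigned long iterations) {
    float result = 0;
    for (unsigned long i = 0; i < iterations; ++i) {
        result += DoDirect::Run(1, 2);
    }
    return result;
}
```

Both functions compute the same sum; only the amount of freedom left to the optimizer differs, which is what the two sets of timings above reflect.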

CODE

#include <iostream>

#include <boost/timer.hpp>

// Direct version
struct DoDirect {
    static float Run(const float a, const float b) { return a + b; }
};

// Policy version
template <typename operation_policy>
struct DoOperationPolicy {
    float Run(const float a, const float b) const {
        return operation_policy::Operation(a,b);
    }
};

struct OperationPolicy_Add {
    static float Operation(const float a, const float b) {
        return a + b;
    }
};

// Polymorphic version
struct DoOperation {
    virtual float Run(const float a, const float b) const = 0;
};

struct OperationAdd  : public DoOperation { 
    float Run(const float a, const float b) const { return a + b; } 
};

int main(int argc, const char *argv[])
{
    boost::timer timer;

    const unsigned long numberOfIterations = 1<<30ul;

    volatile float result = 0;
    for(unsigned long i = 0; i < numberOfIterations; ++i) {
        result += DoDirect::Run(1,2);
    }
    std::cout << "DoDirect: " << result << " in " << timer.elapsed() << " seconds." << std::endl;
    timer.restart();

    result = 0; // reset the accumulator, as is done before the polymorphic loop
    DoOperationPolicy<OperationPolicy_Add> policy_operation;
    for(unsigned long i = 0; i < numberOfIterations; ++i) {
        result += policy_operation.Run(1,2);
    }
    std::cout << "Policy: " << result << " in " << timer.elapsed() << " seconds." << std::endl;
    timer.restart();

    result = 0;
    DoOperation* polymorphic_operation = new OperationAdd;
    for(unsigned long i = 0; i < numberOfIterations; ++i) {
        result += polymorphic_operation->Run(1,2);
    }
    std::cout << "Polymorphic: " << result << " in " << timer.elapsed() << " seconds." << std::endl;

}

Answer 3

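Turn on optimization. The policy-based variant benefits greatly from it, because most of the intermediate steps are optimized out completely, whereas the polymorphic version cannot skip, for example, the dereferencing of the object.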
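One way to see why the policy variant leaves the optimizer so much room is that its call target is fixed at compile time. The sketch below is illustrative only: the constexpr qualifiers and the run_virtual helper are additions that are not in the original code. With them, the policy call can even be evaluated inside a static_assert, while the virtual call has to go through the object unless the compiler can prove its dynamic type.

```cpp
// Policy version: the operation is a static function chosen by the
// template argument, so the call target is known at compile time.
struct OperationPolicy_Add {
    static constexpr float Operation(float a, float b) { return a + b; }
};

template <typename operation_policy>
struct DoOperationPolicy {
    constexpr float Run(float a, float b) const {
        return operation_policy::Operation(a, b);
    }
};

// The whole call chain is resolved and folded by the compiler.
static_assert(DoOperationPolicy<OperationPolicy_Add>().Run(1, 2) == 3.0f,
              "policy call is resolved at compile time");

// Polymorphic version: the call target depends on the dynamic type.
struct DoOperation {
    virtual ~DoOperation() {}
    virtual float Run(float a, float b) const = 0;
};

struct OperationAdd : DoOperation {
    float Run(float a, float b) const override { return a + b; }
};

// Seen through a base reference, the call must dereference the object
// to find the function, unless the compiler can devirtualize it.
float run_virtual(const DoOperation& op) {
    return op.Run(1, 2);
}
```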

Answer 4

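You have to turn on optimization, and make sure that

both code parts actually do the same thing (which they currently do not: your policy variant does not return the result)
the result is used for something, so that the compiler does not throw away the code path altogether (just summing the results and printing the sum somewhere should be enough)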

Answer 5

I had to change your policy code to return the computed value:

float Run(const float a, const float b)
{
  return Operation(a,b);
}

Secondly, I had to store the returned value to ensure that the loop would not be optimized away:

int main()
{
  unsigned int numberOfIterations = 1e9;
  float answer = 0.0;

  boost::timer timer;
  DoOperationPolicy<OperationPolicy_Add> policy_operation;
  for(unsigned int i = 0; i < numberOfIterations; ++i)
  {
    answer += policy_operation.Run(1,2);
  }
  std::cout << "Policy got " << answer << " in " << timer.elapsed() << " seconds" << std::endl;

  answer = 0.0;
  timer.restart();
  DoOperation* polymorphic_operation = new OperationAdd;
  for(unsigned int i = 0; i < numberOfIterations; ++i)
  {
    answer += polymorphic_operation->Run(1,2);
  }
  std::cout << "Polymo got " << answer << " in " << timer.elapsed() << " seconds" << std::endl;

  return 0;
}

Without optimization on g++ 4.1.2:

Policy got 6.71089e+07 in 13.75 seconds
Polymo got 6.71089e+07 in 7.52 seconds

With -O3 on g++ 4.1.2:

Policy got 6.71089e+07 in 1.18 seconds
Polymo got 6.71089e+07 in 3.23 seconds

So the policy version is definitely faster once you turn on optimization.
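To repeat this kind of comparison without Boost, the timing can be wrapped in a small helper. This is only a sketch under the assumption that a C++11 compiler with std::chrono is available; Timed and time_loop are hypothetical names, not part of the code above. The helper accumulates the returned values so that the loop keeps an observable result, and reports the sum together with the elapsed time.

```cpp
#include <chrono>

struct OperationPolicy_Add {
    static float Operation(const float a, const float b) { return a + b; }
};

// Sum of all returned values plus the wall-clock time of the loop.
struct Timed {
    float result;
    double seconds;
};

// Calls `f` `iterations` times, accumulating what it returns so the
// loop has a result that the caller can print.
template <typename F>
Timed time_loop(unsigned int iterations, F f) {
    const auto start = std::chrono::steady_clock::now();
    float result = 0;
    for (unsigned int i = 0; i < iterations; ++i) {
        result += f();
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return Timed{result, elapsed.count()};
}
```

For example, time_loop(n, [] { return OperationPolicy_Add::Operation(1, 2); }) measures the policy variant, and a lambda that calls Run through a DoOperation pointer measures the polymorphic one. Building once without optimization and once with -O3 should then allow the same kind of comparison as the figures above.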

Policy vs. polymorphic speed
