Question

I have a generic function that computes an inner product:

template < typename _A >
inline typename _A::value_type innerProduct( const          _A&             A,
                                             const typename _A::index_type& row0,
                                             const typename _A::index_type& row1,
                                             const typename _A::index_type& col,   // not used in the body shown here
                                             const typename _A::size_type&  n )
{
    typedef typename _A::value_type value_type;
    typedef typename _A::index_type index_type;

    // Value-initialize the accumulator (zero for arithmetic types).
    value_type sum = value_type( );

    // Dot product of the first n entries of rows row0 and row1.
    for( index_type i = 0; i < n; ++i )
    {
        sum += A( row0, i ) * A( row1, i );
    }

    return sum;
}

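I call this function many times while computing a Cholesky decomposition. Because the function call overhead is significant (~11%!), it should be avoided. In my simple world, I assumed that the function is so small that the compiler would inline it. I checked the compiler options several times, and I think they are fine. I use, for example, /Ox /O2 /Ob2 /GL. I also checked that the function is visible to the compiler before the calling function. But the function is never inlined. The only thing that works is to define the function explicitly with the __forceinline keyword.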

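So what options do I have to tell the compiler to inline the function? And what criteria does the compiler use to decide whether a function is inlined?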

Answer 1

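Microsoft does not document the specific reasons why a function marked with the inline keyword might not be inlined. (The closest thing I could find is the documentation for compiler warning C4710.) The inline keyword is only a hint, and the compiler uses heuristics to decide whether inlining is worthwhile as an optimization. There are situations in which inlining hurts performance, for example if it increases register pressure.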
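As a rough way to see which calls the compiler declines to inline, warning C4710 can be switched on for a translation unit, since it is off by default in MSVC. The sketch below is only an illustration: square and sumOfSquares are hypothetical stand-ins for innerProduct and its caller, and the pragma is guarded so the file still builds with other compilers.

```cpp
#include <cassert>

#if defined(_MSC_VER)
// C4710 ("function not inlined") is off by default in MSVC; enable it here
// so the compiler reports calls to inline functions that it did not inline.
#pragma warning(default : 4710)
#endif

// Hypothetical small function standing in for innerProduct.
inline double square(double x)
{
    return x * x;
}

// Hypothetical caller. With C4710 enabled, MSVC emits the warning for this
// call if its heuristics decide not to inline square().
double sumOfSquares(const double* values, int n)
{
    assert(n >= 0);
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
    {
        sum += square(values[i]);
    }
    return sum;
}
```

The warning only reports that a function was not inlined; it does not state the reason behind the decision.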

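You have already found the solution to this problem: use the __forceinline keyword to tell the compiler that you know better. To make this conditional on the call site, create two versions of the innerProduct function, one that is __forceinline and one that is not. It looks like you could implement the latter by calling the former. Something like:

    __forceinline value_type innerProduct_forceinline(...)
    {
        ...
        return sum;
    }

    inline value_type innerProduct(...)
    {
        return innerProduct_forceinline(...);
    }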
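A slightly fuller sketch of the two-version idea might look like the following. Several things here are assumptions for illustration and not part of the original code: the FORCE_INLINE macro (which maps to __forceinline on MSVC and to the always_inline attribute on GCC and Clang), the minimal DenseMatrix class, and the template parameter name Matrix. The unused col parameter from the question is left out to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed portability macro: __forceinline is specific to MSVC.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline
#endif

// Hypothetical minimal row-major matrix providing the nested types and the
// operator() that innerProduct expects.
class DenseMatrix
{
public:
    typedef double      value_type;
    typedef std::size_t index_type;
    typedef std::size_t size_type;

    DenseMatrix(size_type rows, size_type cols)
        : cols_(cols), data_(rows * cols, 0.0) {}

    value_type& operator()(index_type r, index_type c)
    {
        assert(c < cols_ && r * cols_ + c < data_.size());
        return data_[r * cols_ + c];
    }

    const value_type& operator()(index_type r, index_type c) const
    {
        assert(c < cols_ && r * cols_ + c < data_.size());
        return data_[r * cols_ + c];
    }

private:
    size_type               cols_;
    std::vector<value_type> data_;
};

// Version for hot call sites: inlining is forced.
template <typename Matrix>
FORCE_INLINE typename Matrix::value_type
innerProduct_forceinline(const Matrix& A,
                         typename Matrix::index_type row0,
                         typename Matrix::index_type row1,
                         typename Matrix::size_type  n)
{
    typedef typename Matrix::value_type value_type;
    typedef typename Matrix::index_type index_type;

    value_type sum = value_type();
    for (index_type i = 0; i < n; ++i)
    {
        sum += A(row0, i) * A(row1, i);
    }
    return sum;
}

// Version for all other call sites: the compiler's heuristics decide whether
// this wrapper is inlined into its caller.
template <typename Matrix>
inline typename Matrix::value_type
innerProduct(const Matrix& A,
             typename Matrix::index_type row0,
             typename Matrix::index_type row1,
             typename Matrix::size_type  n)
{
    return innerProduct_forceinline(A, row0, row1, n);
}
```

The Cholesky inner loop would then call innerProduct_forceinline directly, while less performance-critical code calls innerProduct and leaves the inlining decision to the compiler.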

Visual Studio 2010 - function not inlined - why?
