Question

I recently noticed a performance hit because I declared a default constructor like this:

Foo() = default;

instead of

Foo() {}

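(FYI, I need to declare it explicitly because I also have a variadic constructor, which would otherwise stop the compiler from generating the default constructor.)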

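This seemed strange to me, because I thought these two lines were equivalent (well, as long as a default constructor is possible at all. If it isn't, the second line produces a compile error, while the first one implicitly deletes the default constructor. Not my case!)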

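OK, so I wrote a little test harness. The results vary a lot depending on the compiler, but with some settings I get consistent results in which one version is faster than the other: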

#include <chrono>

template <typename T>
double TimeDefaultConstructor (int n_iterations)
{
    auto start_time = std::chrono::system_clock::now();

    for (int i = 0; i < n_iterations; ++i)
        T t;

    auto end_time = std::chrono::system_clock::now();

    std::chrono::duration<double> elapsed_seconds = end_time - start_time;

    return elapsed_seconds.count();
}

template <typename T, typename S>
double CompareDefaultConstructors (int n_comparisons, int n_iterations)
{
    int n_comparisons_with_T_faster = 0;

    for (int i = 0; i < n_comparisons; ++i)
    {
        double time_for_T = TimeDefaultConstructor<T>(n_iterations);
        double time_for_S = TimeDefaultConstructor<S>(n_iterations);

        if (time_for_T < time_for_S)    
            ++n_comparisons_with_T_faster;  
    }

    return (double) n_comparisons_with_T_faster / n_comparisons;
}


#include <vector>

template <typename T>
struct Foo
{
    std::vector<T> data_;

    Foo() = default;
};

template <typename T>
struct Bar
{
    std::vector<T> data_;

    Bar() {};
};

#include <iostream>
#include <typeinfo>  // required for typeid

int main ()
{
    int n_comparisons = 10000;
    int n_iterations = 10000;

    typedef int T;

    double result = CompareDefaultConstructors<Foo<T>,Bar<T>> (n_comparisons, n_iterations);

    std::cout << "With " << n_comparisons << " comparisons of " << n_iterations
        << " iterations of the default constructor, Foo<" << typeid(T).name() << "> was faster than Bar<" << typeid(T).name() << "> "
        << result*100 << "% of the time" << std::endl;

    std::cout << "swapping orientation:" << std::endl;

    result = CompareDefaultConstructors<Bar<T>,Foo<T>> (n_comparisons, n_iterations);

    std::cout << "With " << n_comparisons << " comparisons of " << n_iterations
        << " iterations of the default constructor, Bar<" << typeid(T).name() << "> was faster than Foo<" << typeid(T).name() << "> "
        << result*100 << "% of the time" << std::endl;

    return 0;
}

Compiling the program above with g++ -std=c++11, I consistently get output similar to:

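With 10000 comparisons of 10000 iterations of the default constructor, Foo<i> was faster than Bar<i> 4.69% of the time
swapping orientation:
With 10000 comparisons of 10000 iterations of the default constructor, Bar<i> was faster than Foo<i> 96.23% of the time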

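Changing the compiler settings seems to change the results, sometimes flipping them completely. What I don't understand is why it matters at all.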

Answer 1

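This benchmark doesn't measure what it is supposed to measure. Replace Bar() {}; with Bar() = default;, so that Foo and Bar are identical, and you get the same kind of result: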

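With 10000 comparisons of 10000 iterations of the default constructor, Foo<i> was faster than Bar<i> 69.89% of the time
swapping orientation:
With 10000 comparisons of 10000 iterations of the default constructor, Bar<i> was faster than Foo<i> 29.9% of the time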

This is a vivid demonstration that what you are measuring is not the constructor but something else.

With -O1 optimization enabled, the for loop containing T t; degenerates into¹:

        test    ebx, ebx
        jle     .L3
        mov     eax, 0
.L4:
        add     eax, 1
        cmp     ebx, eax
        jne     .L4
.L3:

for both Foo and Bar. That is, into an empty for (int i = 0; i < n_iterations; ++i); loop.

With -O2 or -O3 enabled, the loop is optimized out completely.

Without optimization (-O0), you get the following assembly:

        mov     DWORD PTR [rbp-4], 0
.L35:
        mov     eax, DWORD PTR [rbp-4]
        cmp     eax, DWORD PTR [rbp-68]
        jge     .L34
        lea     rax, [rbp-64]
        mov     rdi, rax
        call    Foo<int>::Foo()
        lea     rax, [rbp-64]
        mov     rdi, rax
        call    Foo<int>::~Foo()
        add     DWORD PTR [rbp-4], 1
        jmp     .L35
.L34:

for Foo, and the same for Bar with Foo replaced by Bar.

Now let's look at the constructors themselves, Foo<int>::Foo() and Bar<int>::Bar(). The code generated for them is identical as well.

¹ GCC 8.3

Answer 2

Foo() = default; and Foo() {}; are different. The former is a trivial default constructor (strictly, only when every member and base is itself trivially default-constructible), while the latter is a custom version of the default constructor that does nothing beyond the default behavior.

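The difference is observable through type_traits. That in turn can affect which allocation/construction routines are selected during template resolution, so completely different code may end up being used.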

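While this may not matter much for a default constructor, it can make a big difference for copy constructors and copy assignment. Therefore Foo() = default; is preferred.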

Answer 3

I suspect the speed difference you think you're seeing is mostly a byproduct of timing variation rather than something real.

To see what actually gets generated, I simplified your code a bit, leaving just this:

#include <vector>

template <typename T>
struct Foo
{
    std::vector<T> data_;

    Foo() = default;
};

template <typename T>
struct Bar
{
    std::vector<T> data_;

    Bar() {};
};

int main() { 
    Foo<int> f;

    Bar<int> b;
}

Then I put it on Godbolt so the generated code is easy to inspect.

gcc 9.2 seems to generate identical code for both ctors. In both cases it is:

push    rbp
mov     rbp, rsp
sub     rsp, 16
mov     QWORD PTR [rbp-8], rdi
mov     rax, QWORD PTR [rbp-8]
mov     rdi, rax
call    std::vector<int, std::allocator<int> >::vector() [complete object constructor]
nop
leave
ret

Clang produces slightly different code, but (again) it is identical for the two classes:

push    rbp
mov     rbp, rsp
sub     rsp, 16
mov     qword ptr [rbp - 8], rdi
mov     rdi, qword ptr [rbp - 8]
call    std::vector<int, std::allocator<int> >::vector() [base object constructor]
add     rsp, 16
pop     rbp
ret

Intel icc is nearly the same, generating this code for both classes:

push      rbp                                           #8.5
mov       rbp, rsp                                      #8.5
sub       rsp, 16                                       #8.5
mov       QWORD PTR [-16+rbp], rdi                      #8.5
mov       rax, QWORD PTR [-16+rbp]                      #8.5
mov       rdi, rax                                      #8.5
call      std::vector<int, std::allocator<int> >::vector() [complete object constructor]                      #8.5
leave                                                   #8.5
ret

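I agree with the others that looking at performance with optimization disabled tells you very little. In this case, though, even with optimization disabled, these three compilers don't generate different code for constructing objects of the two classes. I wouldn't be terribly surprised if some compiler and/or optimization setting produced different results, but I'm afraid I'm not ambitious enough to spend much time hunting for one.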

Answer 4

Foo() = default; is a trivial constructor (provided all members and bases are trivially default-constructible).

Foo() {} is a user-provided (user-defined) constructor, and by definition a user-provided constructor is never trivial, even if it is empty.

See also: Trivial default constructor and std::is_trivial.

It is reasonable to expect that, with compiler optimizations enabled, a trivial constructor may be faster than a user-provided one.

Why does using the default constructor “{}” instead of “= default” cause a performance difference?

4 answers: