Question

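I hear a lot of people here say that C++ is as fast as or faster than C in everything, while also being cleaner and nicer.

While I don't dispute that C++ is very elegant and quite fast, I have not found a replacement for critical memory-access or processor-bound code.

Question: in terms of performance, is there a C++ equivalent of C-style arrays?

The example below is contrived, but I am interested in a solution for a real problem: I develop image-processing applications, and the amount of pixel processing there is huge.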

double t;

// C++ 
std::vector<int> v;
v.resize(1000000,1);
int i, j, count = 0, size = v.size();

t = (double)getTickCount();

for(j=0;j<1000;j++)
{
    count = 0;
    for(i=0;i<size;i++)
         count += v[i];     
}

t = ((double)getTickCount() - t)/getTickFrequency();
std::cout << "(C++) For loop time [s]: " << t/1.0 << std::endl;
std::cout << count << std::endl;

// C-style

#define ARR_SIZE 1000000

int* arr = (int*)malloc( ARR_SIZE * sizeof(int) );

int ci, cj, ccount = 0, csize = ARR_SIZE;

for(ci=0;ci<csize;ci++)
    arr[ci] = 1;

t = (double)getTickCount();

for(cj=0;cj<1000;cj++)
{
    ccount = 0;
    for(ci=0;ci<csize;ci++)
        ccount += arr[ci];      
}

free(arr);

t = ((double)getTickCount() - t)/getTickFrequency();
std::cout << "(C) For loop time [s]: " << t/1.0 << std::endl;
std::cout << ccount << std::endl;

The results are:

(C++) For loop time [s]: 0.329069

(C) For loop time [s]: 0.229961

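Note: getTickCount() comes from a third-party library. If you want to test, just replace it with your favorite clock measurement.

Update

I am using VS 2010, Release mode, everything else at the defaults.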

Answer 1

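Short answer: your benchmark is flawed.

Longer answer: you need to turn on full optimization to see the C++ performance benefit. Even then, your benchmark is still flawed.

Some observations:

If you turn on full optimization, a large chunk of the for loop will be removed. That makes your benchmark meaningless.
std::vector has dynamic reallocation overhead; try std::array. Specifically, Microsoft's STL enables checked iterators by default.
You have no barrier to prevent reordering between the C/C++ code under test and the benchmark (timing) code.
(Not really related) cout << ccount is locale-aware, printf is not; std::endl flushes the output, printf("\n") does not.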
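The first and third observations can be illustrated with a small timing harness. This is only a sketch under stated assumptions: it uses std::chrono from C++11 in place of the third-party getTickCount(), and the names sum_range and time_sum are hypothetical, not taken from the question. Writing each result to a volatile variable keeps the result observable, so the optimizer cannot drop the computation entirely. It does not guarantee that every pass is recomputed, since a compiler may still hoist loop-invariant work.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Sums n ints through a raw pointer, so the same code path serves both
// std::vector storage (v.data()) and a C-style array.
long long sum_range(const int* data, std::size_t n) {
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += data[i];
    return total;
}

// Hypothetical helper: calls sum_range `repeats` times and returns the
// elapsed time in seconds. The sum of the last pass is returned through
// last_sum so the caller can print or check it.
double time_sum(const int* data, std::size_t n, int repeats, long long& last_sum) {
    assert(repeats > 0);
    // Stores to a volatile object must be kept by the compiler, which
    // prevents the summation from being removed as dead code.
    volatile long long sink = 0;
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r)
        sink = sum_range(data, n);
    auto stop = std::chrono::steady_clock::now();
    last_sum = sink;
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling time_sum(v.data(), v.size(), 1000, result) for the vector and time_sum(arr, ARR_SIZE, 1000, result) for the malloc'ed array runs exactly the same loop over both, which separates the cost of the container from the cost of the loop.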

The "traditional" code used to show the C++ advantage is C qsort() vs. C++ std::sort(). That is where code inlining shines.
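A minimal sketch of that comparison, assuming C++11; the wrapper names sort_with_qsort and sort_with_std_sort are hypothetical. qsort() receives the comparison as a function pointer and element size at run time, while std::sort() is a template instantiated for int, so the comparison can be inlined. No timings are claimed here; measure on your own compiler.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Comparison callback for qsort: invoked through a function pointer,
// with elements passed as untyped pointers.
int compare_ints(const void* a, const void* b) {
    int x = *static_cast<const int*>(a);
    int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);  // avoids the overflow of x - y
}

// C-style sort: element size and comparison are only known at run time.
void sort_with_qsort(std::vector<int>& v) {
    if (!v.empty())
        std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
    assert(std::is_sorted(v.begin(), v.end()));
}

// C++-style sort: the element type and operator< are known at compile
// time, which gives the compiler the chance to inline the comparison.
void sort_with_std_sort(std::vector<int>& v) {
    std::sort(v.begin(), v.end());
}
```

Both functions leave the vector in ascending order, so they can be dropped into the same timing loop for a like-for-like measurement.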

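If you want a "real-life" application example, search for a raytracer or some matrix multiplication code. Choose a compiler that does automatic vectorization.

Update: using the LLVM online demo, we can see that the whole loop is reordered. The benchmark (printing) code is moved to the start, and the first loop jumps to the loop end point for better branch prediction:

(this is the C++ code)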

######### jump to the loop end
    jg  .LBB0_11
.LBB0_3:                                # %..split_crit_edge
.Ltmp2:
# print the benchmark result
    movl    $0, 12(%esp)
    movl    $25, 8(%esp)
    movl    $.L.str, 4(%esp)
    movl    std::cout, (%esp)
    calll   std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
.Ltmp3:
# BB#4:                                 # %_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc.exit
.Ltmp4:
    movl    std::cout, (%esp)
    calll   std::basic_ostream<char, std::char_traits<char> >& std::basic_ostream<char, std::char_traits<char> >::_M_insert<double>(double)
.Ltmp5:
# BB#5:                                 # %_ZNSolsEd.exit
    movl    %eax, %ecx
    movl    %ecx, 28(%esp)          # 4-byte Spill
    movl    (%ecx), %eax
    movl    -24(%eax), %eax
    movl    240(%eax,%ecx), %ebp
    testl   %ebp, %ebp
    jne .LBB0_7
# BB#6:
.Ltmp52:
    calll   std::__throw_bad_cast()
.Ltmp53:
.LBB0_7:                                # %.noexc41
    cmpb    $0, 28(%ebp)
    je  .LBB0_15
# BB#8:
    movb    39(%ebp), %al
    jmp .LBB0_21
    .align  16, 0x90
.LBB0_9:                                #   Parent Loop BB0_11 Depth=1
                                        # =>  This Inner Loop Header: Depth=2
    addl    (%edi,%edx,4), %ebx
    addl    $1, %edx
    adcl    $0, %esi
    cmpl    %ecx, %edx
    jne .LBB0_9
# BB#10:                                #   in Loop: Header=BB0_11 Depth=1
    incl    %eax
    cmpl    $1000, %eax             # imm = 0x3E8
######### jump back to the print benchmark code
    je  .LBB0_3

My test code:

std::vector<int> v;
v.resize(1000000,1);
int i, j, count = 0, size = v.size();

for(j=0;j<1000;j++)
{
    count = 0;
    for(i=0;i<size;i++)
         count += v[i];     
}

std::cout << "(C++) For loop time [s]: " << t/1.0 << std::endl;
std::cout << count << std::endl;

Answer 2

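Question: in terms of performance, is there a C++ equivalent of C-style arrays?

Answer: write C++ code! Know your language, know your standard library, and use it. The standard algorithms are correct, readable, and fast (their implementers know best how to make them fast on the current compiler).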

void testC()
{
    // unchanged
}

void testCpp()
{
    // unchanged initialization

    for(j=0;j<1000;j++)
    {
        // how a C++ programmer accumulates:
        count = std::accumulate(begin(v), end(v), 0);    
    }

    // unchanged output
}

int main()
{
    testC();
    testCpp();
}

Output:

(C) For loop time [ms]: 434.373
1000000
(C++) For loop time [ms]: 419.79
1000000

Compiled with g++ -O3 -std=c++0x, version 4.6.3, on Ubuntu.

With your original code, my output is similar to yours. user1202136 gives a good answer about the difference...
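For readers who want to try the accumulate variant without reconstructing the elided parts, a self-contained sketch could look like this. The function name accumulate_passes is hypothetical, and the timing and output code is left out on purpose.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: repeats the accumulation `passes` times, like the
// outer benchmark loop, and returns the sum from the last pass.
int accumulate_passes(const std::vector<int>& v, int passes) {
    assert(passes > 0);
    int count = 0;
    for (int j = 0; j < passes; ++j)
        count = std::accumulate(v.begin(), v.end(), 0);
    return count;
}
```

With a vector of 1000000 ones, as in the question, the returned value is 1000000 regardless of the number of passes.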

Answer 3

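This seems to be a compiler issue. For the C array, the compiler detects the pattern, uses auto-vectorization, and emits SSE instructions. For the vector, it seems to lack the necessary intelligence.

If I force the compiler not to use SSE, the results are very similar (tested with g++ -mno-mmx -mno-sse -msoft-float -O3):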

(C++) For loop time [us]: 604610
1000000
(C) For loop time [us]: 601493
1000000

Here is the code that produced this output. It is basically the code from your question, but without any floating point.

#include <iostream>
#include <vector>
#include <sys/time.h>

using namespace std;

long getTickCount()
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1000000 + tv.tv_usec;
}

int main() {
long t;

// C++ 
std::vector<int> v;
v.resize(1000000,1);
int i, j, count = 0, size = v.size();

t = getTickCount();

for(j=0;j<1000;j++)
{
    count = 0;
    for(i=0;i<size;i++)
         count += v[i];     
}

t = getTickCount() - t;
std::cout << "(C++) For loop time [us]: " << t << std::endl;
std::cout << count << std::endl;

// C-style

#define ARR_SIZE 1000000

int* arr = new int[ARR_SIZE];

int ci, cj, ccount = 0, csize = ARR_SIZE;

for(ci=0;ci<csize;ci++)
    arr[ci] = 1;

t = getTickCount();

for(cj=0;cj<1000;cj++)
{
    ccount = 0;
    for(ci=0;ci<csize;ci++)
        ccount += arr[ci];      
}

delete[] arr;  // memory from new[] must be released with delete[]

t = getTickCount() - t;
std::cout << "(C) For loop time [us]: " << t << std::endl;
std::cout << ccount << std::endl;
}

Answer 4

The C++ equivalent of a dynamically sized array is std::vector. The C++ equivalent of a fixed-size array is std::array, or std::tr1::array before C++11.
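As an illustration of the two containers side by side, here is a short sketch assuming C++11. The helper names (sum_elements, make_dynamic, make_fixed) and the fixed size of 8 are made up for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Works for any container with begin()/end(), so the same summation
// code serves std::vector and std::array.
template <typename Container>
long long sum_elements(const Container& c) {
    return std::accumulate(c.begin(), c.end(), 0LL);
}

// Dynamically sized: the element count is chosen at run time and the
// storage lives on the heap.
std::vector<int> make_dynamic(std::size_t n) {
    std::vector<int> v(n, 1);
    assert(v.size() == n);
    return v;
}

// Fixed size: the element count is part of the type and the storage is
// embedded in the object itself, with no heap allocation.
std::array<int, 8> make_fixed() {
    std::array<int, 8> a;
    a.fill(1);
    return a;
}
```

Because std::array stores its elements inside the object, a very large one (such as the 1000000 ints from the question) is better placed in static storage or on the heap than declared as a local variable.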

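If your vector code does no resizing, it is hard to see how it could be significantly slower than using a dynamically allocated C array, provided you compile with some optimization enabled.

Note: running the code as posted, compiled with gcc 4.4.3 on x86, with compiler options

g++ -Wall -Wextra -pedantic-errors -O2 -std=c++0x

the results are repeatably close to

(C++) For loop time [us]: 507888

1000000

(C) For loop time [us]: 496659

1000000

So, after a small number of trials, the std::vector variant appears to be about 2% slower. I would consider this comparable performance.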

Answer 5

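What you are pointing out is that accessing an object always carries a little overhead, so accessing a vector is not faster than accessing a good old array.

But even if using an array is "C-stylish", it is still C++, so it won't be a problem.

Then, as @juanchopanza said, there is std::array in C++11, which may be more efficient than std::vector, but it is specific to fixed-size arrays.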

Answer 6

Usually the compiler does all the optimization... you just need to choose a good compiler.

C++ equivalent of C-style arrays