Question

我有一个c ++代码，它运行大约200个ASCII文件，进行一些基本的数据处理，并输出一个带有（基本上）所有数据的ASCII文件。

程序一开始运行得非常快，然后在一半时间内急剧减速，或许会逐渐减慢，然后以相当缓慢的速度进行其余操作。即它在大约5秒钟内通过前80个文件，在大约50秒内完成约200个文件。每个文件基本相同。

我正在寻找有关如何追踪问题或内存泄漏的建议。

更多细节：起初我会在程序开头fopen（FILE * outputFile，“w”），最后是fclose（）。前~40个文件大约需要4秒钟;然后大约1.5分钟，约200个文件。

我想也许输出文件堵塞了内存，所以我每次迭代都将代码更改为fopen（outputFile，“a”）（即每次我打开一个新文件），每次关闭输入时fclose（）文件......如上所述，这将性能提高到约50秒。

这种“修复”有助于显着提升，但并非完全有用，这似乎很奇怪。

另外，我没有动态分配任何内存（没有调用'new'或'delete'或'free'或者其他什么）....所以我甚至不确定我怎么能拥有< / em>内存泄漏。

任何帮助将不胜感激！谢谢！

代码：

vector<string> dirCon; // Uses boost::filesystem to store every file in directory bool retVal = FileSystem::getDirectoryContents(HOME_DIR+HISTORY_DIR, &dirCon, 2); int counter = 0; for(int i = 0; i < dirCon.size(); i++) { // Create output file FILE *outFile; string outputFileName = HOME_DIR ... ; // open file as append "a" bool ifRet = initFile(outFile, outputFileName.c_str(), "a"); if(!ifRet) { fprintf(stderr, "ERROR ... "); return false; } // Get the topmost directory name size_t loc = dirCon.at(i).find_last_of("/"); string dirName = dirCon.at(i).substr(loc+1, (dirCon.at(i).size()-(loc+1))); // Get the top directory content vector<string> subDirCon; bool subRetVal = FileSystem::getDirectoryContents(dirCon.at(i), &subDirCon); if(!subRetVal) { fprintf(stderr, "ERROR\n"); return false; } // Go through each file in directory, look for the one that matches for(int j = 0; j < subDirCon.size(); j++) { // Get filename loc = subDirCon.at(j).find_last_of("/"); string fileName = subDirCon.at(j).substr(loc+1, (subDirCon.at(j).size()-(loc+1))); // If filename matches desired station, process and store if( fileName == string(dirName ...) ) { // Open File FILE *inFile; if(!initFile(inFile, subDirCon.at(j).c_str(), "r")) { fprintf(stderr, "ERROR: ... !\n"); break; } // Parse file line-by-line char str[TB_CHARLIMIT_LARGE]; const char *delim = ","; while(true) { vector<string> splitString; fgets(str, TB_CHARLIMIT_LARGE, inFile); if(feof(inFile)) { break; } // break at end of file removeEndLine(str); // If non-comment line, parse if(str[0] != COMCHAR){ string strString(str); // remove end line char strString.erase(std::remove(strString.begin(), strString.end(), '\n'), strString.end()); strcpy(str, strString.c_str()); char *temp = strtok(str,delim); char *lastTemp; while(temp != NULL) { splitString.push_back(string(temp)); temp = strtok(NULL,delim); } if(splitString.size() > 0) { DateTime dtTemp(splitString.at(0)); goodLines++; /* ... process splitString, use dtTemp ... */ // Output to file fprintf(outFile, "%s\n", strFromStrVec(splitString).c_str()); } } } //while fclose(inFile); } } //j cout << "GoodLines = " << goodLines << endl; fclose(outFile); } // i bool getDirectoryContents(const string dirName, vector<string> *conts) { path p(dirName); try { // Confirm Exists if(!exists(p)) { fprintf(stderr, "ERROR: '%s' does not exist!\n", dirName.c_str()); return false; } // Confirm Directory if(!is_directory(p)) { return false; } conts->clear(); // Store paths to sort later typedef vector<path> vec; vec v; copy(directory_iterator(p), directory_iterator(), back_inserter(v)); sort(v.begin(), v.end()); for(vec::const_iterator it(v.begin()), it_end(v.end()); it != it_end; ++it) { conts->push_back(it->string()); } } catch(const filesystem_error& ex) { fprintf(stderr, "ERROR: '%s'!\n", ex.what()); return false; } return true; }

Answer 1

如果没有更多信息，我猜你正在处理的是Schlemiel the Painter的算法：(Original) (Wikipedia)。他们非常容易陷入字符串处理。我举个例子。

我想读取文件中的每一行，以某种方式处理每一行，通过一些中间处理运行它。然后我想收集结果，并将其写回磁盘。这是一种方法。我犯了一个很容易错过的巨大错误：

// proc.cpp
class Foo
{
  public:
  std::string chew_on(std::string const& line_to_chew_on) {...}
  ...
};

Foo processor;
std::string buffer;

// Read/process
FILE *input=fopen(..., "r");
char linebuffer[1000+1];
for (char *line=fgets(linebuffer, 1000, input); line; 
     line=fgets(linebuffer, 1000, input) ) 
{
    buffer=buffer+processor.chew_on(line);  //(1)
}
fclose(input);

// Write
FILE *output=fopen(...,"w");
fwrite(buffer.data(), 1, buffer.size(), output);
fclose(output);

乍一看很容易错过的问题是，每次运行时间(1)时，都会复制buffer的全部内容。如果有1000行，每行100个字符，你最终花费时间复制100 + 200 + 300 + 400 + .... + 100,000 = 5,050,000字节副本来运行它。增加到10,000行？ 500500000。油漆可以越走越远。

在这个特定的例子中，修复很容易。第(1)行应为：

    buffer.append(processor.chew_on(line)); // (2)

或等效地:(感谢Matthieu M。）：

    buffer += processor.chew_on(line);

这可以提供帮助，因为（通常）std::string不需要制作buffer的完整副本来执行append功能，而在(1)中，我们我坚持要复制一份。

更一般地说，假设（a）您正在进行的处理保持状态，（b）您经常引用其全部或大部分，以及（c）该状态随时间增长。然后有一个很好的机会，你已经编写了一个Θ（n ²）时间算法，它将准确地说明你正在谈论的行为类型。

<小时/> 修改

当然，股票回答“为什么我的代码慢？”是“运行个人资料”。有很多工具和技术可以做到这一点。一些选项包括：

callgrind / kcachegrind（由David Schwartz建议）

Random Pausing（由Mike Dunlavey建议）

GNU分析器，gprof

GNU测试覆盖率分析器gcov

oprofile

他们都得到了自己的优势。 “随机暂停”可能是最简单的实现，但很难解释结果。 'gprof'和'gcov'在多线程程序上基本没用。 Callgrind很彻底，但速度很慢，有时可以在多线程程序上玩奇怪的技巧。 oprofile很快，可以很好地使用多线程程序，但可能很难使用，而且可能会遗漏。

但是，如果您正在尝试分析单个线程程序，并且正在使用GNU工具链进行开发，那么gprof可能是一个很好的选择。以上是proc.cpp。为了演示，我将描述一个未经优化的运行。首先，我重建我的程序以进行性能分析（将-pg添加到编译和链接步骤）：

$ g++ -O0 -g -pg -o proc.o -c proc.cpp
$ g++ -pg -o proc proc.o

我运行程序一次以创建分析信息：

./proc

除了执行通常执行的操作外，此运行还将在当前目录中创建名为“gmon.out”的文件。现在，我运行gprof来解释结果：

$ gprof ./proc
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
100.50      0.01     0.01   234937     0.00     0.00  std::basic_string<...> std::operator+<...>(...)
  0.00      0.01     0.00   234937     0.00     0.00  Foo::chew_on(std::string const&)
  0.00      0.01     0.00        1     0.00    10.05  do_processing(std::string const&, std::string const&)
...

确实，我的课程时间的100.5％用于std::string operator+。好吧，好吧，直到一些抽样误差。（我在VM中运行它...似乎gprof捕获的时间是关闭的。我的程序运行时间超过0.01累积秒......）

对于我的一个非常简单的例子，gcov不那么有启发性。但这就是它发生的事情。首先，编译并运行gcov：

$ g++ -O0 -fprofile-arcs -ftest-coverage -o proc proc.cpp
$ ./proc
$ gcov ./proc
...

这会在当前目录中创建一系列以.gcno，.gcda，.gcov结尾的文件。 .gcov中的文件告诉我们在运行期间每行代码执行了多少次。因此，在我的示例中，我的proc.cpp.gcov最终看起来像这样：

        -:    0:Source:proc.cpp
        -:    0:Graph:proc.gcno
        -:    0:Data:proc.gcda
        -:    0:Runs:1
        -:    0:Programs:1
        -:    1:#include 
        -:    2:#include 
        -:    4:class Foo
        -:    5:{
        -:    6:  public:
   234937:    7:  std::string chew_on(std::string const& line_to_chew_on) {return line_to_chew_on;}
        -:    8:};
        -:    9:
        -:   10:
        -:   11:
        1:   12:int do_processing(std::string const& infile, std::string const& outfile)
        -:   13:{
        -:   14:  Foo processor;
        2:   15:  std::string buffer;
        -:   16:
        -:   17:  // Read/process
        1:   18:  FILE *input=fopen(infile.c_str(), "r");
        -:   19:  char linebuffer[1000+1];
   234938:   20:  for (char *line=fgets(linebuffer, 1000, input); line; 
        -:   21:       line=fgets(linebuffer, 1000, input) ) 
        -:   22:    {
   234937:   23:      buffer=buffer+processor.chew_on(line);  //(1)
        -:   24:    }
        1:   25:  fclose(input);
        -:   26:
        -:   27:  // Write
        1:   28:  FILE *output=fopen(outfile.c_str(),"w");
        1:   29:  fwrite(buffer.data(), 1, buffer.size(), output);
        1:   30:  fclose(output);
        1:   31:}
        -:   32:
        1:   33:int main()
        -:   34:{
        1:   35:  do_processing("/usr/share/dict/words","outfile");
        -:   36:}

因此，我必须得出结论，第23行的std :: string :: operator +（执行234,937次）是导致程序运行缓慢的原因。

另外，callgrind / kcachegrind使用多线程程序，可以提供更多信息。对于这个程序，我运行：

g++ -O0 -o proc proc.cpp
valgrind --tool=callgrind ./proc  # this takes forever to run
kcachegrind callgrind.out.*

我找到了以下输出，表明真正吃掉我的周期的是很多很多内存副本（占__memcpy_ssse3_back的执行时间的99.4％），我可以看到它们都发生在23行以下在我的来源： kcachegrind screenshot

Answer 2

使用callgrind分析您的代码，这是valgrind套件的一部分。您可以使用kcachegrind以图形方式浏览结果。（尽管它的名字，它也适用于callgrind输出。）它是免费的，会给你很棒的细节。

您还可以在外部打开和关闭数据收集。所以从它开始，等到你的程序变慢，在问题时间打开它，然后关闭它。你会看到CPU的去向。如果有必要的话，只有当它快速并且比较时才能反向观察。

通常，这个问题会像拇指一样突出。

Answer 3

你能分享你的节目吗？

需要注意的一件事是，您是否正在使用不随着元素数量的增加而扩展的数据结构。

e.g。使用列表来容纳一百万个元素，遍历/搜索（O（n））的速度非常慢，而不是使用二叉搜索树（nlog（n））或散列（O（1））。

2。您应该查看是否在每个周期结束时（/ burn / run）保留数据。理想情况下，您应该在每次运行结束时释放所有资源。

3。听起来可能有手柄泄漏？

Answer 4

这是在黑暗中完全拍摄的。你有：

bool getDirectoryContents(const string dirName, vector<string> *conts) {
    ...
    copy(directory_iterator(p), directory_iterator(), back_inserter(v));

如果您改为：

，性能会如何变化

bool getDirectoryContents(const string dirName, vector<string> *conts) {
    ...
    // note: preincrementing the iterator
    for (directory_iterator it((p)); it!=directory_iterator(); ++it) {
       v.push_back(*it);
    }

我的想法是std::copy被指定使用后增量。 boost::filesystem::directory_iterator是 InputIterator ：它不应该真正支持postincrement。 boost::filesystem::directory_iterator可能不乐意后期增加。

内存/速度问题的一般策略

4 个答案: