Question

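I profiled a computationally heavy C++ program on Linux using cachegrind. Surprisingly, the bottleneck of my program turned out not to be any sorting or computational method... it is reading the input.

Here is a screenshot of cachegrind, in case I'm misinterpreting the profiler results (see scanf()):

[Screenshot: Profiler Results]

I hope I'm right in saying that scanf() is taking 80.92% of my running time.

I read the input using cin >> int_variable_here, like so: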

std::ios_base::sync_with_stdio (false); // Supposedly makes I/O faster
cin >> NumberOfCities;
cin >> NumberOfOldRoads;
Roads = new Road[NumberOfOldRoads];

for (int i = 0; i < NumberOfOldRoads; i++)
{
    int cityA, cityB, length;    

    cin >> cityA;
    //scanf("%d", &cityA);    // scanf() and cin are both too slow
    cin >> cityB;
    //scanf("%d", &cityB);
    cin >> length;
    //scanf("%d", &length);

    Roads[i] = Road(cityA, cityB, length);
}

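If you don't see any problem with this input-reading code, could you suggest a faster way to read the input? I'm thinking of trying getline() (working on it while I wait for responses). My guess is that getline() may run faster because it has to do less conversion and it parses the stream fewer times (just a guess, though I'd eventually have to parse the strings as integers too).

By "too slow" I mean that this is part of a larger homework assignment that gets timed out after a certain period (I believe it is 90 seconds). I'm pretty confident the bottleneck is here, because I purposely commented out a major portion of the rest of my program and it still timed out. I don't know what test cases the instructor runs through my program, but it must be a huge input file. So, what can I use to read the input fastest?

The input format is strict: 3 integers separated by one space on each line, for many lines:

Sample input:

I need to make a Road out of the integers on each line.

Also note that the input is redirected to my program on standard input (myprogram < whatever_test_case.txt). I'm not reading a specific file. I just read from cin.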

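Update

Using Slava's method:

Input reading seems to take less time, but the program still times out (possibly no longer because of input reading). Slava's method is implemented in the Road() ctor (2 down from main). It now takes 22% of the time as opposed to 80%. I'm thinking of optimizing SortRoadsComparator(), since it is called 26,000,000 times.

[Screenshot: profiler results]

Comparator code: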

// The complexity is sort of required for the whole min() max(), based off assignment instructions
bool SortRoadsComparator(const Road& a, const Road& b)
{
    if (a.Length > b.Length) 
        return false;

    else if (b.Length > a.Length) 
        return true;

    else
    {
        // Non-determinism case
        return ( (min(a.CityA, a.CityB) < min(b.CityA, b.CityB)) ||
            (
            (min(a.CityA, a.CityB) == min(b.CityA, b.CityB)) && max(a.CityA, a.CityB) < max(b.CityA, b.CityB)                                     
            )
            );
    }
}

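Using enhzflep's method

[Screenshot: profiler results]

Consider solved

I'm going to consider this problem solved, because the bottleneck is no longer in reading the input. Slava's method was the fastest for me.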

Answer 1

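Streams are well known to be very slow. That is not a big surprise: they need to handle localization, conditions, etc. One possible solution is to read the file line by line with std::getline(std::cin, str) and convert each string to numbers with something like this: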

std::vector<int> getNumbers( const std::string &str )
{
   std::vector<int> res;
   int value = 0;
   bool gotValue = false;
   for( std::size_t i = 0; i < str.length(); ++i ) {
      if( str[i] == ' ' ) {
         if( gotValue ) res.push_back( value );
         value = 0;
         gotValue = false;
         continue;
      }
      value = value * 10 + str[i] - '0';
      gotValue = true;
   }
   if( gotValue ) res.push_back( value );
   return res;
}

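I did not test this code; I wrote it to show the idea. I assume you do not expect anything in the input except spaces and digits, so it does not validate the input.

To optimize the sorting, first check whether you really need to sort the whole sequence. For the comparator, I would write getMin() and getMax() methods and store those values in the object (rather than calculating them every time):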

bool SortRoadsComparator(const Road& a, const Road& b)
{
    if( a.Length != b.Length ) return a.Length < b.Length;
    if( a.getMin() != b.getMin() ) return a.getMin() < b.getMin();
    return a.getMax() < b.getMax();
}

That is, if I understand correctly how your current comparator works.
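To make the caching idea concrete, here is a minimal sketch of what such a class could look like. The question does not show the real Road class, so the layout below (public `CityA`, `CityB`, `Length` plus two private cached fields) is an assumption, and the comparator is repeated only so the example compiles on its own.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical version of the question's Road: min/max of the two cities
// are computed once in the constructor instead of on every comparison.
class Road {
public:
    Road() : Road(0, 0, 0) {}  // keeps `new Road[n]` from the question usable
    Road(int cityA, int cityB, int length)
        : CityA(cityA), CityB(cityB), Length(length),
          minCity_(std::min(cityA, cityB)),
          maxCity_(std::max(cityA, cityB)) {}

    int getMin() const { return minCity_; }
    int getMax() const { return maxCity_; }

    int CityA, CityB, Length;

private:
    int minCity_, maxCity_;
};

// Orders by length, then by smaller city, then by larger city.
bool SortRoadsComparator(const Road& a, const Road& b) {
    if (a.Length != b.Length) return a.Length < b.Length;
    if (a.getMin() != b.getMin()) return a.getMin() < b.getMin();
    return a.getMax() < b.getMax();
}
```

With this layout each comparison is at most three integer comparisons, at the cost of two extra ints per Road. Note that the cached values go stale if `CityA` or `CityB` is modified after construction.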

Answer 2

As Slava says, streams (i.e. cin) are absolute pigs in terms of both performance and executable size.

Consider the following two approaches:

start = clock();
std::ios_base::sync_with_stdio (false); // Supposedly makes I/O faster
cin >> NumberOfCities >> NumberOfOldRoads;
Roads = new Road[NumberOfOldRoads];
for (int i = 0; i < NumberOfOldRoads; i++)
{
    int cityA, cityB, length;
    cin >> cityA >> cityB >> length;
    Roads[i] = Road(cityA, cityB, length);
}
stop = clock();
printf ("time: %ld\n", (long)(stop - start));

and

start = clock();
fp = stdin;
fscanf(fp, "%d\n%d\n", &NumberOfCities, &NumberOfOldRoads);
Roads = new Road[NumberOfOldRoads];
for (int i = 0; i < NumberOfOldRoads; i++)
{
    int cityA, cityB, length;
    fscanf(fp, "%d %d %d\n", &cityA, &cityB, &length);
    Roads[i] = Road(cityA, cityB, length);
}
stop = clock();
printf ("time: %ld\n", (long)(stop - start));

Running each 5 times (with an input file of 1,000,000 entries plus the first 2 'control' lines) gives these results:

Using cin without the direction to not sync with stdio: 8291, 8501, 8720, 8918, 7164 (avg 8318.8)
Using cin with the direction to not sync with stdio: 4875, 4674, 4921, 4782, 5171 (avg 4884.6)
Using fscanf: 1681, 1676, 1536, 1644, 1675 (avg 1642.4)

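So, clearly, the sync_with_stdio(false) direction does help. One can also see that fscanf beats the pants off both cin approaches. In fact, the fscanf approach is nearly 3 times faster than the better of the cin approaches and about 5 times faster than cin when it is not told to avoid syncing with stdio.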

Answer 3

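#include <cstdio>   // getchar_unlocked() (POSIX), EOF

// Reads the next integer from stdin into x; the input is not validated.
inline void S( int &x ) {
    int ch = getchar_unlocked();
    int sign = 1;
    x = 0;
    // Skip everything that cannot start a number.
    while( (ch < '0' || ch > '9') && ch != '-' && ch != EOF ) ch = getchar_unlocked();
    if( ch == EOF ) return;   // no more input; x stays 0
    if( ch == '-' ) {
        sign = -1;
        ch = getchar_unlocked();
    }
    do
        x = (x << 3) + (x << 1) + ch - '0';   // x * 10 + digit
    while( (ch = getchar_unlocked()) >= '0' && ch <= '9' );
    x *= sign;
}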

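You can use this function for any type of numeric input; just change the parameter type. It will run considerably faster than standard scanf.

If you want to save even more time, the best thing would be to use fread() and fwrite(), but in that case you have to manipulate the input yourself. To save time, use fread() to read a large chunk of data from the standard input stream in one call. That reduces the number of I/O calls, so you will see a big difference in time.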
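The fread() suggestion could look roughly like the sketch below: pull the input into memory in large chunks, then parse the integers out of the buffer by hand. The names `slurp` and `parseInts` and the 64 KiB chunk size are assumptions made for this illustration, and there is no overflow checking.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Reads everything remaining on `fp` into memory using large fread() calls.
std::vector<char> slurp(std::FILE* fp) {
    std::vector<char> data;
    char chunk[1 << 16];  // 64 KiB per call; the size is an arbitrary choice
    std::size_t n;
    while ((n = std::fread(chunk, 1, sizeof chunk, fp)) > 0)
        data.insert(data.end(), chunk, chunk + n);
    return data;
}

// Extracts every integer (with optional leading '-') from the buffer.
// Anything that is not a digit or '-' acts as a separator.
std::vector<int> parseInts(const std::vector<char>& data) {
    std::vector<int> out;
    const std::size_t n = data.size();
    std::size_t i = 0;
    while (i < n) {
        // Skip ahead to the next character that can start a number.
        while (i < n && data[i] != '-' && (data[i] < '0' || data[i] > '9')) ++i;
        if (i == n) break;
        int sign = 1;
        if (data[i] == '-') { sign = -1; ++i; }
        if (i == n || data[i] < '0' || data[i] > '9') continue;  // stray '-'
        int value = 0;
        while (i < n && data[i] >= '0' && data[i] <= '9') {
            value = value * 10 + (data[i] - '0');
            ++i;
        }
        out.push_back(sign * value);
    }
    return out;
}
```

For the question's format, `parseInts(slurp(stdin))` would return the two header values first, followed by one (cityA, cityB, length) triple per road. Holding the whole input in memory is a trade-off; a fixed-size buffer that is refilled as it drains would use less memory at the cost of more bookkeeping.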

C++: fastest way to read standard input with cin?
