Question

Are there good ways to find out how a function performs in C? I would like to compare my own function against a library function.

Answer 1

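You need high-resolution timers.

On Linux, gettimeofday() is a decent choice; it gives you microsecond resolution. On Windows, QueryPerformanceCounter() is typical. Make sure you run your function many times to get stable readings.

Quick sample, for Linux: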

#include <stdio.h>
#include <sys/time.h>

struct timeval t0, t1;
unsigned int i;

gettimeofday(&t0, NULL);
for(i = 0; i < 100000; i++)
  function_to_measure();
gettimeofday(&t1, NULL);
printf("Did %u calls in %.2g seconds\n", i, t1.tv_sec - t0.tv_sec + 1E-6 * (t1.tv_usec - t0.tv_usec));

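You would of course adjust the count (100,000) to match the performance of the function. It is best if the function really takes a while to run; otherwise the loop and/or the function call overhead might dominate.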
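Since the question is about comparing your own function with a library one, the loop above can be wrapped in a small helper and run on both. This is only a sketch: my_strlen and time_calls are hypothetical names chosen for illustration, and strlen() stands in for whichever library function you are comparing against.

```c
#include <stddef.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical hand-written function to compare against strlen(). */
size_t my_strlen(const char *s)
{
    const char *p = s;
    while (*p)
        p++;
    return (size_t)(p - s);
}

/* Wall-clock seconds taken by `calls` invocations of fn(arg). */
double time_calls(size_t (*fn)(const char *), const char *arg,
                  unsigned int calls)
{
    struct timeval t0, t1;
    volatile size_t sink = 0;   /* keeps each result "used" */
    unsigned int i;

    gettimeofday(&t0, NULL);
    for (i = 0; i < calls; i++)
        sink += fn(arg);
    gettimeofday(&t1, NULL);
    (void)sink;

    return (double)(t1.tv_sec - t0.tv_sec)
         + 1e-6 * (double)(t1.tv_usec - t0.tv_usec);
}
```

Calling time_calls(my_strlen, text, 1000000) and time_calls(strlen, text, 1000000) on the same input and comparing the two results gives a rough ratio. Both go through the same function pointer, so the call overhead is similar on each side.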

Answer 2

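The open-source Callgrind profiler (for Linux) is a really awesome way to measure performance. Coupled with KCacheGrind, you get really great visualizations of where your time is spent.

Callgrind is part of Valgrind.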

Answer 3

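Store the system time before you enter the function. Store the system time again after the function returns. Subtract the first from the second, and compare the results for the two implementations.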
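A minimal sketch of that before/after pattern, assuming a POSIX system where clock_gettime() with CLOCK_MONOTONIC is available (the answer itself does not name a clock). The names elapsed_seconds, time_one_call and sample_work are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes clock_gettime() in strict -std= modes */
#include <time.h>

/* Hypothetical workload standing in for the function under test. */
void sample_work(void)
{
    volatile unsigned long acc = 0;
    unsigned long i;
    for (i = 0; i < 100000UL; i++)
        acc += i;
}

/* Seconds between two timespec readings. */
double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + 1e-9 * (double)(end.tv_nsec - start.tv_nsec);
}

/* Time a single call of fn() with the monotonic clock. */
double time_one_call(void (*fn)(void))
{
    struct timespec before, after;

    clock_gettime(CLOCK_MONOTONIC, &before);  /* time before entering */
    fn();
    clock_gettime(CLOCK_MONOTONIC, &after);   /* time after returning */
    return elapsed_seconds(before, after);
}
```

A monotonic clock is used here because it is not affected by adjustments to the system's wall-clock time while the measurement is running.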

Answer 4

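Run it (them) several million times (each) and measure the time it takes. The one that completes faster is the better performer.

gprof can help :)

Here is the output from gprof when I ran a program of mine for 10 seconds (function names changed):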

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 60.29      8.68     8.68 115471546     0.00     0.00  workalot
 39.22     14.32     5.64       46   122.70   311.32  work_b
  0.49     14.39     0.07                             inlined
  0.07     14.40     0.01       46     0.22     0.22  work_c
  0.00     14.40     0.00      460     0.00     0.00  find_minimum
  0.00     14.40     0.00      460     0.00     0.00  feedback
  0.00     14.40     0.00       46     0.00     0.00  work_a

Answer 5

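Fred, I notice that you said in a comment that you are on OS X. The best way to get very accurate timings of small-scale functions on OS X is with the mach_absolute_time() function. You can use it as follows: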

#include <mach/mach_time.h>
#include <stdint.h>

int loopCount;

uint64_t startTime = mach_absolute_time( );
for (loopCount = 0; loopCount < iterations; ++loopCount) {
    functionBeingTimed( );
}
uint64_t endTime = mach_absolute_time( );
double averageTime = (double)(endTime-startTime) / iterations;

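This gives you the average timing across iterations calls to the function. It can be affected somewhat by things outside your process that are happening on the system. Thus, you may instead want to take the fastest time: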

#include <mach/mach_time.h>
#include <stdint.h>

int loopCount;

double bestTime = __builtin_inf();
for (loopCount = 0; loopCount < iterations; ++loopCount) {
    uint64_t startTime = mach_absolute_time( );
    functionBeingTimed( );
    uint64_t endTime = mach_absolute_time( );
    /* assign to the outer bestTime; redeclaring it here would shadow it */
    bestTime = __builtin_fmin(bestTime, (double)(endTime-startTime));
}

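This can have its own problems, especially if the function being timed is very fast. You need to think about what you are really trying to measure and pick an approach that is scientifically justified (good experimental design is hard). I often use a hybrid of these two approaches as a first attempt at measuring a new task (the minimum of averages taken over many calls).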
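The hybrid approach (a minimum of averages over many calls) might look roughly like the following. To keep the sketch buildable outside OS X it assumes POSIX clock_gettime() rather than mach_absolute_time(); now_ns, min_of_averages and sample_work are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes clock_gettime() in strict -std= modes */
#include <stdint.h>
#include <time.h>

/* Hypothetical workload standing in for functionBeingTimed(). */
void sample_work(void)
{
    volatile unsigned long acc = 0;
    unsigned long i;
    for (i = 0; i < 1000UL; i++)
        acc += i;
}

/* Current monotonic time in nanoseconds. */
uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

/* Smallest per-call average (in ns) seen across `batches` batches. */
double min_of_averages(void (*fn)(void), int batches, int calls_per_batch)
{
    double best = -1.0;   /* negative means "no batch measured yet" */
    int b, i;

    for (b = 0; b < batches; ++b) {
        uint64_t start = now_ns();
        for (i = 0; i < calls_per_batch; ++i)
            fn();
        double avg = (double)(now_ns() - start) / calls_per_batch;
        if (best < 0.0 || avg < best)
            best = avg;
    }
    return best;
}
```

Averaging inside each batch amortizes the timer overhead for fast functions, while taking the minimum across batches discards batches that were disturbed by other activity on the system.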

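Note also that in the code samples above, the timings are in "mach time units". If you just want to compare algorithms, this is usually fine. For some other purposes, you may want to convert them to nanoseconds or cycles. To do this, you can use the following functions: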

#include <mach/mach_time.h>
#include <sys/sysctl.h>
#include <stdint.h>

double ticksToNanoseconds(double ticks) {
    static double nanosecondsPerTick = 0.0;
    // The first time the function is called
    // ask the system how to convert mach
    // time units to nanoseconds
    if (0.0 == nanosecondsPerTick) {
        mach_timebase_info_data_t timebase;
        // to be completely pedantic, check the return code of this call:
        mach_timebase_info(&timebase);
        nanosecondsPerTick = (double)timebase.numer / timebase.denom;
    }
    return ticks * nanosecondsPerTick;
}

double nanosecondsToCycles(double nanoseconds) {
    static double cyclesPerNanosecond = 0.0;
    // The first time the function is called
    // ask the system what the CPU frequency is
    if (0.0 == cyclesPerNanosecond) {
        uint64_t freq;
        size_t freqSize = sizeof(freq);
        // Again, check the return code for correctness =)
        sysctlbyname("hw.cpufrequency", &freq, &freqSize, NULL, 0L );
        cyclesPerNanosecond = (double)freq * 1e-9;
    }
    return nanoseconds * cyclesPerNanosecond;
}

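Be aware that the conversion to nanoseconds will always be sound, but the conversion to cycles can go wrong in various ways, because modern CPUs do not run at one fixed speed. Nonetheless, it generally works pretty well.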

Answer 6

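All of these other answers use some variant of gettimeofday() for timing. This is pretty crude, since you usually need to run the code under test many times to get reproducible results. Putting it in a tight loop changes the state of both the code and data caches, so these results may not be indicative of real performance.

A much better alternative is to actually use the CPU cycle counter. On x86, you can do this with the rdtsc instruction. This is from x264: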

static inline uint32_t read_time(void)
{
    uint32_t a = 0;
#if defined(__GNUC__) && (defined(ARCH_X86) || defined(ARCH_X86_64))
    asm volatile( "rdtsc" :"=a"(a) ::"edx" );
#elif defined(ARCH_PPC)
    asm volatile( "mftb %0" : "=r" (a) );
#elif defined(ARCH_ARM)     // ARMv7 only
    asm volatile( "mrc p15, 0, %0, c9, c13, 0" : "=r"(a) );
#endif
    return a;
}

For more on profiling using various hardware counters, see PAPI. For some purposes, simulators (like Callgrind) and interrupt-based profilers (like Oprofile) are useful.
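One thing to keep in mind with the x264 snippet is that it returns only the low 32 bits of the counter, which wrap after 2^32 ticks (roughly 1.4 seconds at 3 GHz). A sketch that keeps all 64 bits on x86 is shown below. It is not from x264: read_time64 is a hypothetical name, it tests the compiler's predefined macros instead of x264's ARCH_* build macros, and the non-x86 fallback to clock_gettime() is an assumption added so the block builds elsewhere.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes clock_gettime() for the fallback */
#include <stdint.h>
#include <time.h>

/* 64-bit tick reading: TSC on x86 with GCC/Clang, monotonic ns elsewhere. */
uint64_t read_time64(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t lo, hi;
    /* rdtsc leaves the low half in EAX and the high half in EDX */
    __asm__ volatile ("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
#endif
}
```

Subtracting two readings taken around the code under test gives an elapsed tick count. On x86 the unit is TSC ticks, which on many current CPUs advance at a constant rate rather than tracking the actual core clock.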

Answer 7

Hi, I will give you an example and explain it:

#include <stdio.h>
#include <time.h>

int main(void)
{

    clock_t start_clk = clock();

    /*
        put any code here
    */

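    /* %Lg is the conversion for long double; %lg would expect a double */
    printf("Processor time used by program: %Lg sec.\n",
           (clock() - start_clk) / (long double) CLOCKS_PER_SEC);

    return 0;
}

Output as originally posted (with %lg): Processor time used by program: 4.94066e-324 sec. That value is not a real measurement: passing a long double to %lg is undefined behavior, which is why the code above uses %Lg.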

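In time.h:

It declares clock_t, which is an arithmetic time value (you can do math on it, as I do in the example). Basically, put any code where the comment is.

CLOCKS_PER_SEC is a macro declared in time.h; use it as the denominator to convert the value into seconds.

The cast to long double is essential for two reasons:

We do not know what type clock_t actually is, but we want to print it (what conversion would you put in printf?).
long double is a very precise type that can represent really small values.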
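The same clock() pattern can be packaged as a helper that times a function over several repetitions. This is a sketch only: cpu_seconds and sample_work are hypothetical names, and a plain double is used for the result on the assumption that its precision is enough for the comparison.

```c
#include <time.h>

/* Hypothetical workload standing in for the code under test. */
void sample_work(void)
{
    volatile unsigned long acc = 0;
    unsigned long i;
    for (i = 0; i < 100000UL; i++)
        acc += i;
}

/* Processor time, in seconds, used by `reps` calls of fn(). */
double cpu_seconds(void (*fn)(void), long reps)
{
    clock_t start = clock();
    long i;

    for (i = 0; i < reps; i++)
        fn();

    /* Convert to double before dividing so the fraction is not lost. */
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Note that clock() reports processor time used by the program, not wall-clock time, so time spent sleeping or blocked is not counted.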

Answer 8

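Check out HighResTimer for a high-performance timer.

You will likely find that storing the time before and after is not accurate enough, and that it will probably result in 0 unless you have a longer-running function.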
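One common way around a reading of 0 is to repeat the call until the measured interval is comfortably larger than the timer's resolution. The sketch below doubles the repetition count until a run lasts at least min_seconds; it uses standard clock() rather than HighResTimer, and seconds_per_call and sample_work are hypothetical names.

```c
#include <time.h>

/* Hypothetical short function that is too fast to time with one call. */
void sample_work(void)
{
    volatile unsigned long acc = 0;
    unsigned long i;
    for (i = 0; i < 1000UL; i++)
        acc += i;
}

/* Double the repetition count until a run takes at least min_seconds,
   then return the average processor time per call in seconds. */
double seconds_per_call(void (*fn)(void), double min_seconds)
{
    unsigned long reps = 1;

    for (;;) {
        clock_t start = clock();
        unsigned long i;

        for (i = 0; i < reps; i++)
            fn();

        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        /* Stop once the run is long enough, or at a safety cap on reps. */
        if (elapsed >= min_seconds || reps > (1UL << 30))
            return elapsed / (double)reps;
        reps *= 2;
    }
}
```

For example, seconds_per_call(sample_work, 0.1) keeps doubling until a single run takes at least a tenth of a second of processor time.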

Answer 9

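Check out RDTSC, but it is better to do it as follows.

0 - Call the system's sleep or yield function so that when it returns, you have a new time slice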

1 - RDTSC

2 - Call your function

3 - RDTSC
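Steps 0 to 3 can be sketched as follows, assuming GCC or Clang on a POSIX system. sched_yield() only approximates step 0, since it does not guarantee a fresh time slice. ticks_now, ticks_for_one_call and sample_work are hypothetical names, and the clock_gettime() fallback for non-x86 targets is an assumption added so the block builds elsewhere.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes clock_gettime() for the fallback */
#include <sched.h>
#include <stdint.h>
#include <time.h>
#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>           /* __rdtsc() on GCC and Clang */
#endif

/* Hypothetical workload standing in for "your function". */
void sample_work(void)
{
    volatile unsigned long acc = 0;
    unsigned long i;
    for (i = 0; i < 100000UL; i++)
        acc += i;
}

/* TSC ticks on x86, monotonic nanoseconds elsewhere. */
uint64_t ticks_now(void)
{
#if defined(__i386__) || defined(__x86_64__)
    return __rdtsc();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
#endif
}

uint64_t ticks_for_one_call(void (*fn)(void))
{
    uint64_t start, end;

    sched_yield();        /* step 0: give up the CPU before measuring */
    start = ticks_now();  /* step 1 */
    fn();                 /* step 2 */
    end = ticks_now();    /* step 3 */
    return end - start;
}
```

Yielding first makes it less likely, though not impossible, that the scheduler preempts the process in the middle of the measured call.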

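If your function is a long-running one, you have to use some sort of profiling tool like gprof (it is very easy to use) and Intel's VTune (which I have not used for a long time). After seeing Art's answer, I changed my mind from gprof to Callgrind. In the past I used only Valgrind's Memcheck tool, and it was a magnificent tool. I have not used Callgrind before, but I am sure it is better than gprof...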

Answer 10

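Store a timestamp before entering the function

Store a timestamp after exiting the function

Compare the timestamps

Make sure to use a significant sample, as the time resolution might skew your results. This is especially true for short-duration functions. Use high-resolution timers (microsecond resolution is available on most platforms).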

Answer 11

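As the simplest and most portable approach, you can use the standard function time(), which returns the current number of seconds since the Epoch.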


#include <stdio.h>
#include <time.h>

time_t starttime, endtime;
int i;  /* loop counter used below */

starttime = time(NULL);
for (i = 0; i < 1000000; i++)
{
    testfunc();
}
endtime = time(NULL);

printf("Time in seconds is %d\n", (int)(endtime-starttime));

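Adjust the number of iterations to your needs. If one function call takes 5 seconds, you will need a very large cup of coffee for 1,000,000 iterations... When the difference is less than 1 second, even for a large number of iterations, you should 1) ask yourself whether it matters, and if it does, 2) check whether your favorite compiler already has built-in profiling support.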

How do I test the performance of a C function?
