Question

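I need to design some experiments to compare the performance of various pthread constructs (mutexes, semaphores, read-write locks) against the corresponding serial programs. The main problem is deciding how to measure the execution time of the code under analysis.

I have read about some C functions such as clock(), gettimeofday(), etc. As far as I understand, clock() can be used to get the number of CPU cycles a program actually used (by subtracting the value it returns at the start of the code we want to time from the value it returns at the end), while gettimeofday() gives the wall-clock time taken to execute the program.

The problem is that total CPU cycles do not seem like a good criterion to me, because they sum the CPU time spent on all threads running in parallel (so I think clock() is not suitable). Wall-clock time is not good either, because other processes may be running in the background, so the result ends up depending on how the threads get scheduled (so, in my view, gettimeofday() is not ideal either).

The other functions I know of most likely behave like one of the two above. So I would like to know whether there is some function I can use for this analysis, or whether I am wrong somewhere in the conclusions above.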

Answer 1

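I am not sure that summing an array is a good test. You do not need any mutex to sum an array with several threads, since each thread only has to sum its own dedicated part of the array, and the work is mostly memory access with very little CPU computation. Example (the values of SZ and NTHREADS are given at compile time); the time measured is real (monotonic) time: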

#include <time.h>
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

static int Arr[SZ];

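/* Sums the slice of Arr starting at index *a (SZ/NTHREADS elements,
   assuming SZ is a multiple of NTHREADS) and stores the partial sum
   back into *a. */
void * thSum(void * a)
{
  int s = 0, i;
  int sup = *((int *) a) + SZ/NTHREADS;

  for (i = *((int *) a); i != sup; ++i)
    s += Arr[i];

  *((int *) a) = s;
  return NULL; /* a function returning void * must return a value */
}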

int main()
{
  int i;

  for (i = 0; i != SZ; ++i)
    Arr[i] = rand();

  struct timespec t0, t1;

  clock_gettime(CLOCK_MONOTONIC, &t0);

  int s = 0;

  for (i = 0; i != SZ; ++i)
    s += Arr[i];

  clock_gettime(CLOCK_MONOTONIC, &t1);
  printf("mono thread : %d %lf\n", s,
         (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec)/1000000000.0);

  clock_gettime(CLOCK_MONOTONIC, &t0);

  int n[NTHREADS];
  pthread_t ths[NTHREADS];

  for (i = 0; i != NTHREADS; ++i) {
    n[i] = SZ / NTHREADS * i;
    if (pthread_create(&ths[i], NULL, thSum, &n[i])) {
      printf("cannot create thread %d\n", i);
      return -1;
    }
  }

  int s2 = 0;

  for (i = 0; i != NTHREADS; ++i) {
    pthread_join(ths[i], NULL);
    s2 += n[i];
  }

  clock_gettime(CLOCK_MONOTONIC, &t1);
  printf("%d threads : %d %lf\n", NTHREADS, s2,
         (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec)/1000000000.0);
}

Compilation and execution:

(array of 100,000,000 elements)

/tmp % gcc -DSZ=100000000 -DNTHREADS=2 -O3 s.c -lpthread -lrt
/tmp % ./a.out
mono thread : 563608529 0.035217
2 threads : 563608529 0.020407
/tmp % ./a.out
mono thread : 563608529 0.034991
2 threads : 563608529 0.022659
/tmp % gcc -DSZ=100000000 -DNTHREADS=4 -O3 s.c -lpthread -lrt
/tmp % ./a.out
mono thread : 563608529 0.035212
4 threads : 563608529 0.014234
/tmp % ./a.out
mono thread : 563608529 0.035184
4 threads : 563608529 0.014163
/tmp % gcc -DSZ=100000000 -DNTHREADS=8 -O3 s.c -lpthread -lrt
/tmp % ./a.out
mono thread : 563608529 0.035229
8 threads : 563608529 0.014971
/tmp % ./a.out
mono thread : 563608529 0.035142
8 threads : 563608529 0.016248

(array of 1,000,000,000 elements)

/tmp % gcc -DSZ=1000000000 -DNTHREADS=2 -O3 s.c -lpthread -lrt
/tmp % ./a.out
mono thread : -1471389927 0.343761
2 threads : -1471389927 0.197303
/tmp % ./a.out
mono thread : -1471389927 0.346682
2 threads : -1471389927 0.197669
/tmp % gcc -DSZ=1000000000 -DNTHREADS=4 -O3 s.c -lpthread -lrt
/tmp % ./a.out
mono thread : -1471389927 0.346859
4 threads : -1471389927 0.130639
/tmp % ./a.out
mono thread : -1471389927 0.346506
4 threads : -1471389927 0.130751
/tmp % gcc -DSZ=1000000000 -DNTHREADS=8 -O3 s.c -lpthread -lrt
/tmp % ./a.out
mono thread : -1471389927 0.346954
8 threads : -1471389927 0.123572
/tmp % ./a.out
mono thread : -1471389927 0.349652
8 threads : -1471389927 0.127059

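As you can see, the execution time is not divided by the number of threads: going from 4 to 8 threads brings almost no further gain. The bottleneck is probably memory access.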
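For a test that actually exercises a pthread construct, a contended shared counter may be closer to what the question asks for. The following is only a minimal sketch under my own assumptions: WORKERS, ITERS, locked_inc and time_mutex_counter are hypothetical names and values, not something from the question, and the timing uses the same CLOCK_MONOTONIC approach as the program above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

enum { WORKERS = 4, ITERS = 1000000 };  /* hypothetical sizes, tune freely */

static long counter;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Each worker performs ITERS increments of the shared counter under the mutex. */
static void *locked_inc(void *arg)
{
  int i;

  (void) arg;
  for (i = 0; i != ITERS; ++i) {
    pthread_mutex_lock(&lock);
    ++counter;
    pthread_mutex_unlock(&lock);
  }
  return NULL;
}

/* Returns the monotonic wall-clock seconds taken by WORKERS contending
   threads, or -1.0 if a thread could not be created. */
double time_mutex_counter(void)
{
  pthread_t ths[WORKERS];
  struct timespec t0, t1;
  int started = 0, i;

  counter = 0;
  clock_gettime(CLOCK_MONOTONIC, &t0);
  while (started != WORKERS &&
         pthread_create(&ths[started], NULL, locked_inc, NULL) == 0)
    ++started;
  for (i = 0; i != started; ++i)
    pthread_join(ths[i], NULL);
  clock_gettime(CLOCK_MONOTONIC, &t1);

  if (started != WORKERS)
    return -1.0;
  assert(counter == (long) WORKERS * ITERS);  /* no increment was lost */
  return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1000000000.0;
}
```

Calling time_mutex_counter() from main and printing the result gives one data point. Replacing the mutex in locked_inc with a semaphore or a read-write lock, and repeating each run several times, should give comparable numbers for the other constructs.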

Answer 2

From the Linux clock_gettime man page:

   CLOCK_PROCESS_CPUTIME_ID (since Linux 2.6.12)
          Per-process CPU-time clock (measures CPU time consumed by all
          threads in the process).

   CLOCK_THREAD_CPUTIME_ID (since Linux 2.6.12)
          Thread-specific CPU-time clock.

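I believed clock() was implemented somewhere as clock_gettime(CLOCK_PROCESS_CPUTIME_ID, ...), but I see that glibc implemented it using times() (that holds for glibc 2.17 and earlier; since glibc 2.18 it is built on clock_gettime with CLOCK_PROCESS_CPUTIME_ID).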
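To see why a per-process CPU clock sums the time of all threads, the two clocks can be read around the same multi-threaded region. This is a rough sketch under my own assumptions: spin, wall_vs_cpu, MAX_THREADS and SPIN_ITERS are hypothetical names and values chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

enum { MAX_THREADS = 16, SPIN_ITERS = 20000000 };  /* hypothetical values */

/* Pure CPU work; volatile keeps the loop from being optimized away. */
static void *spin(void *arg)
{
  volatile unsigned long x = 0;
  long i;

  (void) arg;
  for (i = 0; i != SPIN_ITERS; ++i)
    x += (unsigned long) i;
  return NULL;
}

static double diff_s(const struct timespec *a, const struct timespec *b)
{
  return (b->tv_sec - a->tv_sec) + (b->tv_nsec - a->tv_nsec) / 1000000000.0;
}

/* Runs nthreads spinning threads and reports the elapsed monotonic time
   in *wall and the process CPU time in *cpu. Returns 0 on success. */
int wall_vs_cpu(int nthreads, double *wall, double *cpu)
{
  pthread_t ths[MAX_THREADS];
  struct timespec w0, w1, c0, c1;
  int started = 0, i;

  assert(wall != NULL && cpu != NULL);
  if (nthreads < 1 || nthreads > MAX_THREADS)
    return -1;

  clock_gettime(CLOCK_MONOTONIC, &w0);
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c0);
  while (started != nthreads &&
         pthread_create(&ths[started], NULL, spin, NULL) == 0)
    ++started;
  for (i = 0; i != started; ++i)
    pthread_join(ths[i], NULL);
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c1);
  clock_gettime(CLOCK_MONOTONIC, &w1);

  *wall = diff_s(&w0, &w1);
  *cpu = diff_s(&c0, &c1);
  return started == nthreads ? 0 : -1;
}
```

On a machine with several free cores, the CPU figure would be expected to come out larger than the wall figure, roughly in proportion to the number of threads that really ran in parallel. On a single core the two should be close.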

So if you want to measure thread-specific CPU time, you can use clock_gettime(CLOCK_THREAD_CPUTIME_ID, ...) on GNU/Linux systems.
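As an illustration, each thread can read its own CPU clock before and after its work. This is a sketch under my own assumptions: struct work, busy and measure_two_threads are hypothetical names, and the busy loop only stands in for the code you really want to time.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical per-thread record: input workload, output CPU time. */
struct work {
  long iters;
  double cpu_seconds;
};

static void *busy(void *arg)
{
  struct work *w = arg;
  struct timespec t0, t1;
  volatile unsigned long x = 0;
  long i;

  /* CLOCK_THREAD_CPUTIME_ID counts only the CPU time of the calling thread. */
  clock_gettime(CLOCK_THREAD_CPUTIME_ID, &t0);
  for (i = 0; i != w->iters; ++i)
    x += (unsigned long) i;
  clock_gettime(CLOCK_THREAD_CPUTIME_ID, &t1);

  w->cpu_seconds = (t1.tv_sec - t0.tv_sec)
                 + (t1.tv_nsec - t0.tv_nsec) / 1000000000.0;
  return NULL;
}

/* Runs one thread per entry of w[] and fills in each cpu_seconds field.
   Returns 0 on success, -1 if a thread could not be created. */
int measure_two_threads(struct work w[2])
{
  pthread_t ths[2];
  int started = 0, i;

  assert(w != NULL);
  while (started != 2 &&
         pthread_create(&ths[started], NULL, busy, &w[started]) == 0)
    ++started;
  for (i = 0; i != started; ++i)
    pthread_join(ths[i], NULL);
  return started == 2 ? 0 : -1;
}
```

Because each value is CPU time of one thread, time the thread spends blocked (for example waiting on a mutex) is not counted. That is useful for isolating computation, but it also means lock waiting has to be measured separately with a monotonic clock.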

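Do not use gettimeofday or clock_gettime(CLOCK_REALTIME, ...) to measure the execution of a program. Do not even think about it. gettimeofday is the "wall clock": it is the time you could display on the wall of your room, and it can jump whenever the system clock is adjusted. If you want to measure elapsed time, forget gettimeofday and use a monotonic clock such as CLOCK_MONOTONIC.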

If you want, you can also stay fully POSIX-compatible by calling pthread_getcpuclockid inside your thread and passing the clock_id value it returns to clock_gettime.
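A possible shape for that, as a sketch only: worker and worker_cpu_seconds are hypothetical names of mine, and the loop is a placeholder for the real work. The clock ID is obtained inside the thread and used while the thread is still alive.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Measures its own CPU time through the clock ID returned by
   pthread_getcpuclockid and stores it in *arg (-1.0 on failure). */
static void *worker(void *arg)
{
  double *cpu_seconds = arg;
  clockid_t cid;
  struct timespec t0, t1;
  volatile unsigned long x = 0;
  long i;

  if (pthread_getcpuclockid(pthread_self(), &cid) != 0) {
    *cpu_seconds = -1.0;
    return NULL;
  }

  clock_gettime(cid, &t0);
  for (i = 0; i != 10000000; ++i)   /* placeholder workload */
    x += (unsigned long) i;
  clock_gettime(cid, &t1);

  *cpu_seconds = (t1.tv_sec - t0.tv_sec)
               + (t1.tv_nsec - t0.tv_nsec) / 1000000000.0;
  return NULL;
}

/* Starts one worker and returns the CPU seconds it measured,
   or -1.0 if the thread or its clock could not be set up. */
double worker_cpu_seconds(void)
{
  pthread_t th;
  double cpu_seconds = -1.0;

  if (pthread_create(&th, NULL, worker, &cpu_seconds) != 0)
    return -1.0;
  pthread_join(th, NULL);
  assert(cpu_seconds == -1.0 || cpu_seconds >= 0.0);
  return cpu_seconds;
}
```

pthread_getcpuclockid also accepts the ID of another thread, so a supervising thread could sample a worker's CPU clock from outside, provided the worker has not yet terminated.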
