Getting the timing of specific parts of code in a C loop

Question

Problem description

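Below, I have a program that performs two simple operations, an addition and a multiplication. I then accumulate the results of these two operations in two variables called total1 and total2. Computationally, total2 takes longer to execute. As the code is currently written, I am timing the whole simulation of both operations together.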

Problem

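Is it possible to time the final results of total1 and total2 separately? I ask because I would like to get the specific time for total1 and for total2 individually.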

Purpose of the task

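I am fully aware that long long is expensive in terms of memory and is not the most memory-efficient choice. The sole purpose of this code and question is timing, not code optimization.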

C code

#include <stdio.h>
#include <time.h>

int main()
{

     long long total1 = 0, total2 = 0, i = 0;
     double simulation_time = 0;

     clock_t Start = clock();

     do
     {
          total1 += i + i; 
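          /* Note: with a 64-bit long long this overflows once i exceeds roughly 55,000;
             signed overflow is undefined behavior in C. */
          total2 += i * i * i * i;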

          i++;

     } while (i < 1000000000);

     clock_t End = clock();

     /* long long must be printed with %lld, not %u */
     printf("Total 1 = %lld \n", total1);
     printf("Total 2 = %lld \n", total2);

     simulation_time = (double)(End - Start) / CLOCKS_PER_SEC;
     printf("Runtime of Whole Simulation using clock_t: %f\n", simulation_time);


     return 0;
}

Answer 1

I'm not sure I understand your question, but to time each operation separately, you just need to make two separate loops.

#include <stdio.h>
#include <time.h>

int main()
{
    long long total1 = 0, total2 = 0, i = 0, j = 1000000000;
    double simulation_time1, simulation_time2;
    clock_t Start, End;

    /* addition */
    Start = clock();
    do
    {
         total1 += i + i;
         i++;
    } while (i < j);
    End = clock();
    simulation_time1 = (double)(End - Start) / CLOCKS_PER_SEC;

    /* multiplication */
    i = 0; /* reset the counter; otherwise this loop would run only once */
    Start = clock();
    do
    {
         total2 += i * i * i * i;
         i++;
    } while (i < j);
    End = clock();
    simulation_time2 = (double)(End - Start) / CLOCKS_PER_SEC;

    /* long long must be printed with %lld, not %u */
    printf("Total 1 = %lld \n", total1);
    printf("Total 2 = %lld \n", total2);
    printf("Runtime of Whole Simulation: %f\n"
        "Runtime of Addition:         %f\n"
        "Runtime of Multiplication:   %f\n",
        simulation_time1 + simulation_time2,
        simulation_time1, simulation_time2);

    return 0;
}
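
The two-loop idea above can also be packaged so that both operations are timed through the same code path. What follows is only a sketch, not part of the original answer: time_call, add_loop, mul_loop and ITERATIONS are names made up for illustration, the iteration count is reduced from the question's one billion so that it finishes quickly, and the totals are unsigned so that overflow wraps around instead of being undefined. Keep in mind that clock() reports processor time rather than wall-clock time, and that an optimizing compiler may still simplify either loop.

```c
#include <time.h>

#define ITERATIONS 1000000ULL /* assumption: reduced from the question's 1000000000 */

/* Unsigned totals: wraparound on overflow is well defined for unsigned types. */
unsigned long long total1 = 0, total2 = 0;

/* Accumulates i + i, as in the addition loop above. */
void add_loop(void)
{
    for (unsigned long long i = 0; i < ITERATIONS; i++)
        total1 += i + i;
}

/* Accumulates i * i * i * i, as in the multiplication loop above. */
void mul_loop(void)
{
    for (unsigned long long i = 0; i < ITERATIONS; i++)
        total2 += i * i * i * i;
}

/* Hypothetical helper: runs work() once and returns the processor time
   it used, in seconds. */
double time_call(void (*work)(void))
{
    clock_t start = clock();
    work();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_call(add_loop) and time_call(mul_loop) from main gives one figure per operation, and their sum corresponds to the runtime of the whole simulation.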

Answer 2

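You have two operations that you want to time separately. The first is the accumulation of i+i, and the second is the accumulation of i*i*i*i.

I'm going to assume that you're using GCC at -O2 on x86-64.

If we comment out total2, the assembly generated for computing total1 is: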

  movabs rdx, 999999999000000000

Clever compiler! It performs the whole calculation at compile time. So the time it takes is essentially zero.
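
If the goal is to actually time the addition loop rather than have it folded into a constant, one common trick is to make the accumulator volatile, which obliges the compiler to perform every update at run time. This is a sketch under that assumption and not something from the original answer; sum_additions is a hypothetical name. The volatile accesses add memory traffic of their own, so the time measured this way is not the cost of the bare additions.

```c
/* Hypothetical helper: sums i + i for i in [0, n).
   The volatile accumulator forces a real load and store on every iteration,
   so the compiler cannot replace the loop with a precomputed constant. */
long long sum_additions(long long n)
{
    volatile long long total = 0;
    for (long long i = 0; i < n; i++)
        total += i + i;
    return total;
}
```

The result is n * (n - 1), so for n = 1000000000 it equals 999999999000000000, the same constant the compiler produced in the movabs above.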

If we instead comment out total1, the assembly for the loop computing total2 is:

.L2:
  mov rdx, rax
  imul rdx, rax       ; i squared
  add rax, 1
  imul rdx, rdx       ; i squared squared
  add rsi, rdx        ; accumulate
  cmp rax, 1000000000 ; loop condition
  jne .L2

Rather than trying to microbenchmark each line of the code, we can refer to Agner Fog's instruction tables: http://www.agner.org/optimize/instruction_tables.pdf

Assuming you're on Intel Haswell, and doing a bit of port allocation by hand, the tables tell us:

.L2:                  ; ports  cycles  latency
  mov rdx, rax        ; p0     0.25    1
  imul rdx, rax       ; p1     1       3
  add rax, 1          ; p0     0.25    1
  imul rdx, rdx       ; p1     1       3
  add rsi, rdx        ; p0     0.25    1
  cmp rax, 1000000000 ; p5     0.25    1
  jne .L2             ; p6     1-2

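Some of these instructions can overlap, so each iteration takes roughly 3-4 core cycles. On a 3-4 GHz processor, executing the loop one billion times will take about 1 second.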
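
The estimate above is simply iterations times cycles per iteration, divided by the clock frequency. A small sketch of that arithmetic follows; estimate_seconds is a hypothetical helper name, and the inputs are the rough figures quoted in this answer rather than measurements.

```c
/* Hypothetical helper: estimated run time in seconds, computed as
   iterations * cycles_per_iteration / clock_hz. */
double estimate_seconds(double iterations, double cycles_per_iteration, double clock_hz)
{
    return iterations * cycles_per_iteration / clock_hz;
}
```

For one billion iterations, 3 cycles at 4 GHz gives 0.75 s and 4 cycles at 3 GHz gives about 1.33 s, which brackets the figure of about 1 second.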
