Question

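I am trying to understand and use existing utilities or code snippets that can measure CPU utilization/performance, in terms of power consumption and CPU cycles, in kernel space.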

I have two functions that do the same job:

convert an IP address to a string.

char* inet_ntoa(struct in_addr in, char* buf, size_t* rlen)
{

        int i;
        char* bp;

        bp = buf;
        for (i = 0;i < 4; i++ ) {
                unsigned int o, n;
                o = ((unsigned char*)&in)[i];
                n = o;
                if ( n >= 200 ) {
                        *bp++ = '2';
                        n -= 200;
                }else if ( n >= 100 ) {
                        *bp++ = '1';
                        n -= 100;
                }
                if ( o >= 10 ) {
                        int i;
                        for ( i = 0; n >= 10; i++ ) {
                                n -= 10;
                        }
                        *bp++ = i + '0';
                }
                *bp++ = n + '0';
                *bp++ = '.';
        }
        *--bp = 0;
        if ( rlen ) {
                *rlen = bp - buf;
        }

        return buf;
}

and

char *inet_ntoa (struct in_addr in)
    {
      unsigned char *bytes = (unsigned char *) &in;
      __snprintf (buffer, sizeof (buffer), "%d.%d.%d.%d",
              bytes[0], bytes[1], bytes[2], bytes[3]);

      return buffer;
    }

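The latter function is from glibc. The former is my own.

Both functions will be called in kernel space. How can I measure their performance in order to compare them?

My machine runs Ubuntu 14.04 on x86 (i686), Linux kernel 3.13.

I installed perf from the kernel source tree (linux/tools).

My module is running. How do I hook up perf to measure the performance of my functions?

Please advise.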

Answer 1

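This paper by Intel engineers may be of interest. It explains how to time code accurately using the CPU's timers.

Don't forget to time a wide range of inputs. You may also want to consider potential differences in robustness (how the code behaves when given bad input).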
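
The point about inputs can be checked in user space before anything is loaded into the kernel. The following is a minimal sketch under stated assumptions, not the asker's module code: `format_ipv4_fast`, `format_ipv4_ref`, and `count_mismatches` are hypothetical names, and the hand-written routine takes a plain 4-byte array instead of `struct in_addr` so that it builds without network headers. It sweeps every byte value through each octet position and counts disagreements between the two formatters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hand-rolled formatter using the same digit logic as the question's
 * first function. buf must hold at least 16 bytes
 * ("255.255.255.255" plus the terminator). */
static char *format_ipv4_fast(const unsigned char octets[4], char *buf, size_t *rlen)
{
    char *bp = buf;
    int i;

    for (i = 0; i < 4; i++) {
        unsigned int o = octets[i];
        unsigned int n = o;

        if (n >= 200) {
            *bp++ = '2';
            n -= 200;
        } else if (n >= 100) {
            *bp++ = '1';
            n -= 100;
        }
        if (o >= 10) {
            int tens = 0;
            while (n >= 10) {
                n -= 10;
                tens++;
            }
            *bp++ = (char)(tens + '0');
        }
        *bp++ = (char)(n + '0');
        *bp++ = '.';
    }
    *--bp = '\0';               /* overwrite the trailing dot */
    if (rlen)
        *rlen = (size_t)(bp - buf);
    return buf;
}

/* Reference formatter built on snprintf, as in the glibc-style version. */
static char *format_ipv4_ref(const unsigned char octets[4], char *buf, size_t size)
{
    assert(size >= 16);         /* room for the longest address */
    snprintf(buf, size, "%d.%d.%d.%d",
             octets[0], octets[1], octets[2], octets[3]);
    return buf;
}

/* Sweep every byte value through each of the four positions and count
 * how often the two formatters disagree on the text or its length. */
static int count_mismatches(void)
{
    int mismatches = 0;
    int pos, value;

    for (pos = 0; pos < 4; pos++) {
        for (value = 0; value <= 255; value++) {
            unsigned char octets[4] = { 10, 99, 100, 255 };
            char fast[16], ref[16];
            size_t len = 0;

            octets[pos] = (unsigned char)value;
            format_ipv4_fast(octets, fast, &len);
            format_ipv4_ref(octets, ref, sizeof(ref));
            if (strcmp(fast, ref) != 0 || len != strlen(ref))
                mismatches++;
        }
    }
    return mismatches;
}
```

If `count_mismatches()` returns 0, the two formatters agree on all swept inputs, and any timing difference measured later is at least comparing equivalent output.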

Answer 2

I have been doing the same thing recently, and I am using these tools:

Perf https://perf.wiki.kernel.org/index.php/Main_Page

ARM DS5 http://ds.arm.com/

PowerMonitor https://www.msoon.com/LabEquipment/PowerMonitor/

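You can get an evaluation version of ARM DS5 and try it out. It profiles code application-wise, and you can view "online" data.

PowerMonitor is a licensed product.

Perf is a very handy tool that lets you profile different events. The kernel must be configured for perf events (CONFIG_PERF_EVENTS).

Some profiling options need to be turned on; they can be enabled when the kernel is compiled.

For Perf-related information, you can find guidance here: https://perf.wiki.kernel.org/index.php/Tutorial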

Answer 3

You should take a look at oprofile. Sample output (from http://homepages.cwi.nl/~aeb/linux/profile.html):

# oprofpp -l -i /foo/vmlinux | tail
c012ca30 488      1.86174     kmem_cache_free
c010e280 496      1.89226     mask_and_ack_8259A
c010a61a 506      1.93041     restore_all
c0119220 603      2.30047     do_softirq
c0110b30 663      2.52938     delay_tsc
c012c7c0 703      2.68198     kmem_cache_alloc
c02146c0 786      2.99863     __copy_to_user_ll
c0169b70 809      3.08637     ext3_readdir
c01476f0 854      3.25805     link_path_walk
c016fcd0 1446     5.51656     ext3_find_entry

Answer 4

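The other answers give good advice for precise measurement. If you are only looking for coarse differences, it may be easier to use what is in timekeeping.h, such as do_gettimeofday: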

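#include <linux/ktime.h>    // pulls in do_gettimeofday on kernels that still provide it
#include <linux/math64.h>   // div_u64

uint64_t time_one_function(void (*func)(void))
{
    const int NUM_ITERATIONS = 5000;
    struct timeval before, after;
    uint64_t diff_microseconds;
    int i;  // declared up front: a 3.13 kernel builds as gnu89, which rejects "for (int i ...)"

    do_gettimeofday(&before);
    for (i = 0; i < NUM_ITERATIONS; i++)
    {
        func();
    }
    do_gettimeofday(&after);

    // Time it took to do all iterations in microseconds
    diff_microseconds = (after.tv_sec - before.tv_sec) * 1000000ULL + (after.tv_usec - before.tv_usec);

    // Return roughly the time in nanoseconds for a single call.
    // div_u64 is used because a plain 64-bit '/' does not link in a 32-bit (i686) kernel.
    return div_u64(diff_microseconds * 1000, NUM_ITERATIONS);
}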

This gives a rough per-call time in nanoseconds for a single function; then just call it on each of the two functions.
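
Because time_one_function takes a `void (*)(void)` while both converters take arguments, each converter needs a small argument-free wrapper before it can be timed. The sketch below shows that wiring in user space, so it can be tried without loading a module. It is an illustration under assumptions rather than the kernel code itself: `gettimeofday` stands in for `do_gettimeofday`, the names `time_one_function_ns`, `wrap_snprintf`, and `wrap_manual` are hypothetical, and `wrap_manual` uses a compact division-based converter instead of the question's subtraction loops.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>

/* Fixed test input and scratch buffer shared by the wrappers. */
static const unsigned char test_octets[4] = { 192, 168, 0, 1 };
static char scratch[16];
/* Writing to a volatile sink keeps the compiler from discarding the calls. */
static volatile char sink;

/* Wrapper around an snprintf-based conversion (glibc style). */
static void wrap_snprintf(void)
{
    snprintf(scratch, sizeof(scratch), "%d.%d.%d.%d",
             test_octets[0], test_octets[1], test_octets[2], test_octets[3]);
    sink = scratch[0];
}

/* Wrapper around a compact hand-written conversion. */
static void wrap_manual(void)
{
    char *bp = scratch;
    int i;

    for (i = 0; i < 4; i++) {
        unsigned int n = test_octets[i];

        if (n >= 100)
            *bp++ = (char)('0' + n / 100);
        if (n >= 10)
            *bp++ = (char)('0' + (n / 10) % 10);
        *bp++ = (char)('0' + n % 10);
        *bp++ = '.';
    }
    *--bp = '\0';               /* overwrite the trailing dot */
    sink = scratch[0];
}

/* User-space counterpart of time_one_function: rough nanoseconds per call. */
static uint64_t time_one_function_ns(void (*func)(void), int iterations)
{
    struct timeval before, after;
    int64_t diff_microseconds;
    int i;

    assert(iterations > 0);     /* avoid dividing by zero below */

    gettimeofday(&before, NULL);
    for (i = 0; i < iterations; i++)
        func();
    gettimeofday(&after, NULL);

    /* Total elapsed wall-clock time in microseconds. */
    diff_microseconds = (int64_t)(after.tv_sec - before.tv_sec) * 1000000LL
                      + (int64_t)(after.tv_usec - before.tv_usec);

    return (uint64_t)(diff_microseconds * 1000) / (uint64_t)iterations;
}
```

Calling `time_one_function_ns(wrap_manual, 5000)` and `time_one_function_ns(wrap_snprintf, 5000)` yields two per-call figures to compare. In the module, wrappers of the same shape would be passed to time_one_function instead. The figures are wall-clock and coarse, so repeating the measurement a few times is advisable.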
