Question

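I am developing a program that copies strings, and I compared its performance against glibc. I downloaded the glibc source code with the following command: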

apt-get source glibc

I am comparing the following two:

/glibc-2.19/string/strcpy.c
#include <string.h> and calling strcpy()

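I expected the two to perform about the same. However, the results are completely different.

I tried several gcc optimization levels (-O1, -O2, -O3), but the results were similar.

Is there some magic that makes it faster? I would like to know the reason.

Here is the code: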

// test for performance.

/******************************************************************************/

#include <stdio.h>
#include <time.h>
#include <string.h>
#include <stddef.h>


/******************************************************************************/
char *
strcpy_glibc (char *dest, const char *src)
{
  char c;
  char *s = (char *) src;
  const ptrdiff_t off = dest - s - 1;

  do
    {
      c = *s++;
      s[off] = c;
    }
  while (c != '\0');

  return dest;
}

/******************************************************************************/
void test(int iLoop, int iLen,
    char *szFuncName, char*(*func)(char *s1, const char *s2))
{
    time_t          tm1, tm2;
    int             i;
    char   s1[512];
    char   s2[512];

    // initialize the test string.
    for(i = 0; i < iLen; i++) {
        s1[i] = '@';
    }
    s1[iLen] = '\0';

    /**************************************************************************/
    printf("test(): %s() started, iLoop = %d, iLen = %d.\n",
        szFuncName, iLoop, iLen);

    tm1 = time(NULL);

    for(i = 0; i < iLoop; i++) {
        func(s2, s1);
        func(s1, s2);
        func(s2, s1);
        func(s1, s2);
        func(s2, s1);

        func(s1, s2);
        func(s2, s1);
        func(s1, s2);
        func(s2, s1);
        func(s1, s2);
    }

    tm2 = time(NULL);

    printf("test(): %s() terminated in %d [sec].\n", szFuncName, (int)(tm2 - tm1));
    printf("test(): %s() answer s1[0] = %c.\n", szFuncName, s1[0]);
}

/******************************************************************************/
int main(int argc, char *argv[])
{
    printf("main(): Started.\n");

    test(100000000, 511, "strcpy_glibc", strcpy_glibc);
    test(100000000, 511, "strcpy", strcpy);
    test(100000000, 511, "strcpy_glibc", strcpy_glibc);
    test(100000000, 511, "strcpy", strcpy);

    printf("main(): Terminated.\n");
    return 0;
}

/******************************************************************************/
/* EOF */

Here are the results:

************************$ ./strcpy_test_3
main(): Started.
test(): strcpy_glibc() started, iLoop = 100000000, iLen = 511.
test(): strcpy_glibc() terminated in 238 [sec].
test(): strcpy_glibc() answer s1[0] = @.
test(): strcpy() started, iLoop = 100000000, iLen = 511.
test(): strcpy() terminated in 56 [sec].
test(): strcpy() answer s1[0] = @.
test(): strcpy_glibc() started, iLoop = 100000000, iLen = 511.
test(): strcpy_glibc() terminated in 238 [sec].
test(): strcpy_glibc() answer s1[0] = @.
test(): strcpy() started, iLoop = 100000000, iLen = 511.
test(): strcpy() terminated in 55 [sec].
test(): strcpy() answer s1[0] = @.
main(): Terminated.
************************$

Hmm, this means strcpy() is about 4 times faster than strcpy_glibc(), even though the code is the same.

I'm confused...

Answer 1

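You cannot simply copy libc code into your application and expect it to perform as well as the library. libc and the OS contain a lot of platform-specific code and internal knowledge, so a performance difference is to be expected.

Try this: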
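
To make the "platform-specific code" point concrete: on x86-64, the strcpy() a program calls is typically not the generic loop in string/strcpy.c but an architecture-specific assembly routine from glibc's sysdeps tree, which moves many bytes per instruction. The sketch below is not glibc's code. It only illustrates the general idea in C by copying eight bytes at a time and using a well-known bit trick to detect the terminator. The names strcpy_wordwise and has_zero_byte are made up for this illustration, and the code assumes that reading the rest of an aligned 8-byte word past the terminator is harmless, which holds on mainstream hardware but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any of the eight bytes in w is 0x00 (classic bit trick). */
static int has_zero_byte(uint64_t w)
{
    return ((w - UINT64_C(0x0101010101010101)) & ~w
            & UINT64_C(0x8080808080808080)) != 0;
}

/* Hypothetical illustration, NOT glibc code: copy src to dest one
 * 8-byte word at a time. Buffers must not overlap. */
char *strcpy_wordwise(char *dest, const char *src)
{
    char *d = dest;

    assert(dest != NULL && src != NULL);

    /* Copy byte by byte until src is 8-byte aligned, so that every word
     * load below stays inside a single aligned 8-byte block. */
    while (((uintptr_t)src & 7u) != 0) {
        if ((*d++ = *src++) == '\0')
            return dest;
    }

    for (;;) {
        uint64_t w;

        /* ASSUMPTION: this aligned load may read bytes located after the
         * terminator; see the note above the code. */
        memcpy(&w, src, sizeof w);
        if (has_zero_byte(w))
            break;              /* terminator is somewhere in this word */
        memcpy(d, &w, sizeof w);
        src += sizeof w;
        d += sizeof w;
    }

    /* Copy the last few bytes, including the terminator, one at a time. */
    while ((*d++ = *src++) != '\0')
        ;
    return dest;
}
```

For non-overlapping buffers, strcpy_wordwise(dst, src) produces the same result as strcpy(dst, src), so it can be passed to the test() function from the question in place of strcpy_glibc to see how much of the gap this one technique accounts for on a given machine.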

static __inline__ __attribute__((always_inline))
char * strcpy_glibc(char * __restrict to, const char * __restrict from)
{
    char *save = to;

    for (; (*to = *from); ++from, ++to);
    return(save);
}

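Instead of calling through a function pointer, try using an inline function in your application, if it is not called frequently. This should give better performance, but note that this code does not handle corner cases or perform any checks.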
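
To compare a direct call against the function-pointer version, it helps to time with finer resolution than time(), which only counts whole seconds. The following is a minimal sketch, assuming a POSIX system with clock_gettime() and GCC or Clang (for the inline-asm barrier). bench_strcpy_direct and elapsed_sec are hypothetical names, not part of the original test program.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: seconds elapsed between two clock readings. */
static double elapsed_sec(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Hypothetical benchmark: copies a string of `len` characters back and
 * forth with direct strcpy() calls (no function pointer) and returns the
 * elapsed wall-clock time in seconds. Two copies are made per iteration. */
double bench_strcpy_direct(long loops, size_t len)
{
    char s1[512], s2[512];
    char *p1 = s1, *p2 = s2;
    struct timespec t0, t1;
    long i;

    assert(len < sizeof s1);
    memset(s1, '@', len);
    s1[len] = '\0';

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < loops; i++) {
        strcpy(p2, p1);
        /* Compiler barrier (GCC/Clang extension): forces the copy to
         * really happen instead of being optimized away. */
        __asm__ volatile("" : : "r"(p2) : "memory");
        strcpy(p1, p2);
        __asm__ volatile("" : : "r"(p1) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return elapsed_sec(t0, t1);
}
```

For example, bench_strcpy_direct(500000000L, 511) performs the same total of 1,000,000,000 copies as one test() run in the question; replacing strcpy in the loop with the inline function above gives the corresponding figure for that version.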
