Question

Suppose we have some structured data wrapped in another struct so that it can form a circular linked list.

typedef struct Data 
{
    int x;
    int y;
} Data;

typedef struct DataNode 
{
    struct DataNode *next;
    struct Data *data;
} DataNode;

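Assuming the circular linked list is constructed correctly and *head points to a member of the list, are there any downsides (performance or otherwise) to chaining -> operators, especially in a loop?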

DataNode * findPrevMatching(int x, int y)
{
    // Chained arrow operators in a loop
    while (!(head->next->data->x == x && head->next->data->y == y))  
        head = head->next;

    return head;
}

Would it make any difference if I created local variables so that there are no chained arrows?

DataNode * findPrevMatching(int x, int y)
{   
    DataNode *next = head->next;
    Data *data = next->data;

    while (!(data->x == x && data->y == y))
    {
        // Assign head->next to head
        head = head->next;

        // Assign each local variable, using the new head
        next = head->next;
        data = next->data;
    }

    return head;
}

Answer 1

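In one of my comments, I noted that I had not done any measurements, and that only measurements were likely to show anything useful. I have created a header and 4 variant implementations of the code, as shown: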

node.h

typedef struct Data 
{
    int x;
    int y;
} Data;

typedef struct DataNode 
{
    struct DataNode *next;
    struct Data *data;
} DataNode;

extern DataNode *head;

extern DataNode *findPrevMatching(int x, int y);

node1.c

#include "node.h"

DataNode * findPrevMatching(int x, int y)
{
    // Chained arrow operators in a loop
    while (!(head->next->data->x == x && head->next->data->y == y))  
        head = head->next;

    return head;
}

node2.c

#include "node.h"

DataNode * findPrevMatching(int x, int y)
{   
    DataNode *next = head->next;
    Data *data = next->data;
    int thisX = data->x;
    int thisY = data->y;

    while (!(thisX == x && thisY == y))
    {
        // Assign head->next to head
        head = head->next;

        // Assign each local variable, using the new head
        next = head->next;
        data = next->data;
        thisX = data->x;
        thisY = data->y;
    }

    return head;
}

node3.c

#include "node.h"

DataNode * findPrevMatching(int x, int y)
{   
    DataNode *next = head->next;
    Data *data = next->data;

    while (!(data->x == x && data->y == y))
    {
        head = head->next;
        next = head->next;
        data = next->data;
    }

    return head;
}

node4.c

#include "node.h"

DataNode * findPrevMatching(int x, int y)
{
    DataNode *next = head->next;

    while (!(next->data->x == x && next->data->y == y))
    {
        head = head->next;
        next = head->next;
    }

    return head;
}

Code size under different optimization regimes

The code was compiled repeatedly using this command line, but with different values in place of -O0:

gcc -O0 -g -std=c11 -Wall -Wextra -Werror -c node1.c
gcc -O0 -g -std=c11 -Wall -Wextra -Werror -c node2.c
gcc -O0 -g -std=c11 -Wall -Wextra -Werror -c node3.c
gcc -O0 -g -std=c11 -Wall -Wextra -Werror -c node4.c

GCC 6.2.0 on Mac OS X 10.11.6

The sizes are those reported by the size command on Mac OS X 10.11.6 with GCC 6.2.0.

OFLAGS=-O0
__TEXT  __DATA  __OBJC  others  dec     hex
176     0       0       1007    1183    49f     node1.o
239     0       0       1226    1465    5b9     node2.o
208     0       0       1146    1354    54a     node3.o
192     0       0       1061    1253    4e5     node4.o

OFLAGS=-O1
__TEXT  __DATA  __OBJC  others  dec     hex
95      0       0       872     967     3c7     node1.o
125     0       0       1335    1460    5b4     node2.o
118     0       0       1182    1300    514     node3.o
114     0       0       993     1107    453     node4.o

OFLAGS=-O2
__TEXT  __DATA  __OBJC  others  dec     hex
126     0       0       848     974     3ce     node1.o
111     0       0       1410    1521    5f1     node2.o
121     0       0       1135    1256    4e8     node3.o
121     0       0       1005    1126    466     node4.o

OFLAGS=-O3
__TEXT  __DATA  __OBJC  others  dec     hex
126     0       0       848     974     3ce     node1.o
111     0       0       1410    1521    5f1     node2.o
128     0       0       1111    1239    4d7     node3.o
126     0       0       937     1063    427     node4.o

OFLAGS=-Os
__TEXT  __DATA  __OBJC  others  dec     hex
101     0       0       848     949     3b5     node1.o
135     0       0       1293    1428    594     node2.o
112     0       0       1133    1245    4dd     node3.o
107     0       0       1003    1110    456     node4.o

Clearly, no optimization (-O0) ends up with much bigger code than any optimization level. On this code, the smallest object size comes from node1.c at -O1. At -O2 and -O3, the code in node2.c is the smallest. With -Os and -O1, the first code is the smallest.

clang from XCode 8.0

The clang from XCode version 8.0 reports:

Apple LLVM version 8.0.0 (clang-800.0.38)
Target: x86_64-apple-darwin15.6.0

The sizes reported were:

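Conclusion

There is absolutely no substitute for experimentation! You can look at the assembler and decide which code you think is best.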
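
Before comparing sizes or timings, it may be worth confirming that the variants really behave the same. The following is a minimal sketch and not part of the measurements above. It assumes the search functions take the start node as a parameter instead of using the global head, and the names find_prev_chained, find_prev_locals, build_ring and variants_agree are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Data { int x; int y; } Data;
typedef struct DataNode { struct DataNode *next; struct Data *data; } DataNode;

/* Chained-arrow variant; the start node is a parameter (an assumption
   made here so that both variants can be called on the same list). */
DataNode *find_prev_chained(DataNode *start, int x, int y)
{
    while (!(start->next->data->x == x && start->next->data->y == y))
        start = start->next;
    return start;
}

/* Local-variable variant, same contract as above. */
DataNode *find_prev_locals(DataNode *start, int x, int y)
{
    DataNode *next = start->next;
    Data *data = next->data;

    while (!(data->x == x && data->y == y)) {
        start = start->next;
        next = start->next;
        data = next->data;
    }
    return start;
}

enum { RING_SIZE = 8 };
static Data ring_data[RING_SIZE];
static DataNode ring_nodes[RING_SIZE];

/* Build a ring where node i holds (i, 10*i) and links to node (i+1) % RING_SIZE. */
DataNode *build_ring(void)
{
    for (int i = 0; i < RING_SIZE; i++) {
        ring_data[i].x = i;
        ring_data[i].y = 10 * i;
        ring_nodes[i].data = &ring_data[i];
        ring_nodes[i].next = &ring_nodes[(i + 1) % RING_SIZE];
    }
    return &ring_nodes[0];
}

/* Returns 1 if both variants return the same predecessor for every stored pair.
   Only pairs that exist are searched for, because neither variant terminates
   when no node matches. */
int variants_agree(void)
{
    DataNode *start = build_ring();

    for (int i = 0; i < RING_SIZE; i++) {
        DataNode *a = find_prev_chained(start, i, 10 * i);
        DataNode *b = find_prev_locals(start, i, 10 * i);
        if (a != b || a->next != &ring_nodes[i])
            return 0;
    }
    return 1;
}
```

Calling variants_agree() from a small test program and checking that it returns 1 gives some confidence that any size or speed difference is not caused by a behavioral difference.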

Answer 2

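If you can create an array of nodes, make the previous and next pointers indices into that array, and allocate all nodes out of that pool, this might have performance advantages on a given architecture. A contiguous array is more likely to be in cache; you can tell the OS when you will and won't use any of the nodes in that block, with functions such as posix_madvise() or PrefetchVirtualMemory(); you might be able to use indices smaller than a pointer and get smaller nodes; and your CPU might support indirect addressing that makes looking up array elements as efficient as following pointers.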
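
The pool idea can be sketched as follows. This is only an illustration under stated assumptions: the names PoolNode, pool, pool_init_ring and pool_find_prev are hypothetical, only a next index is kept (no previous index), the x and y fields are stored inline in the node instead of behind a separate Data pointer, and a 16-bit index is assumed to be wide enough for the pool.

```c
#include <assert.h>
#include <stdint.h>

enum { POOL_SIZE = 8 };

typedef struct PoolNode {
    uint16_t next;   /* index of the next node in pool[], not a pointer */
    int x;
    int y;
} PoolNode;

/* All nodes live in one contiguous array. */
static PoolNode pool[POOL_SIZE];

/* Link every pool entry into a single ring; node i stores (i, 10*i). */
void pool_init_ring(void)
{
    for (uint16_t i = 0; i < POOL_SIZE; i++) {
        pool[i].next = (uint16_t)((i + 1) % POOL_SIZE);
        pool[i].x = i;
        pool[i].y = 10 * i;
    }
}

/* Return the index of the node that precedes the first match, starting
   the walk at `start`. Like the pointer version in the question, this
   does not terminate if no node matches. */
uint16_t pool_find_prev(uint16_t start, int x, int y)
{
    uint16_t cur = start;

    for (;;) {
        const PoolNode *n = &pool[pool[cur].next];
        if (n->x == x && n->y == y)
            return cur;
        cur = pool[cur].next;
    }
}
```

Whether this layout is actually faster than the pointer-based list depends on the architecture and access pattern, so it needs to be measured like any other variant.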

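The worst thing about code that chases pointers to pointers to pointers is the chain of cache misses (or, in practice, page faults).

To really answer this question, you probably want to profile: find out where the program spends all its time, focus on that, and then profile again to see how much time you saved.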
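
As a very rough stand-in for profiling, a traversal can be timed with the standard clock() function. This is a sketch only: the names bench_find_prev and time_traversals and the sizes N_NODES and N_REPEATS are hypothetical, the list is small enough to sit in cache, and an optimizing compiler may simplify the repeated calls, so a real profiler on the real program will tell you more.

```c
#include <assert.h>
#include <time.h>

typedef struct Data { int x; int y; } Data;
typedef struct DataNode { struct DataNode *next; struct Data *data; } DataNode;

enum { N_NODES = 1024, N_REPEATS = 10000 };
static Data bench_data[N_NODES];
static DataNode bench_nodes[N_NODES];

/* Chained-arrow search, with the start node passed in as a parameter. */
DataNode *bench_find_prev(DataNode *start, int x, int y)
{
    while (!(start->next->data->x == x && start->next->data->y == y))
        start = start->next;
    return start;
}

/* Build a ring of N_NODES nodes, run N_REPEATS worst-case searches, and
   return the CPU time used in seconds. The last search result is stored
   in *found so the caller can check it. */
double time_traversals(DataNode **found)
{
    for (int i = 0; i < N_NODES; i++) {
        bench_data[i].x = i;
        bench_data[i].y = -i;
        bench_nodes[i].data = &bench_data[i];
        bench_nodes[i].next = &bench_nodes[(i + 1) % N_NODES];
    }

    DataNode *result = &bench_nodes[0];
    clock_t begin = clock();
    for (int r = 0; r < N_REPEATS; r++) {
        /* Node 0 holds (0, 0), so starting at node 0 walks the whole ring. */
        result = bench_find_prev(&bench_nodes[0], 0, 0);
    }
    clock_t end = clock();

    *found = result;
    return (double)(end - begin) / CLOCKS_PER_SEC;
}
```

Running the same harness against each variant, before and after a change, is the "profile again" step described above.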

Answer 3

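Using several field-dereference operators in a row (e.g. ptr->fld->otherfld->anotherfld) is certainly possible, provided that all the pointers involved are valid. Otherwise it is the dreaded undefined behavior: in practice you will probably get a segmentation fault if you are lucky, but much worse things could happen.

Performance-wise, it could have some small issues. First, the compiler might not always be able to optimize properly by keeping some intermediate values in registers (e.g. with ptr->fld->otherfield->anotherfield->bluefield = ptr->fld->otherfield->redfield + 2; the value of ptr->fld->otherfield would probably be kept in a register and fetched once, but in principle the compiler does not guarantee it). Second, you might run into CPU cache issues and cache misses. Read the answers section of http://norvig.com/21-days.html (the whole page is also worth reading).

In practice, recent compilers are quite good at optimizing if you ask them to (e.g. by compiling with gcc -O2), so you mostly don't need to introduce additional local variables (except for readability, which is a very good reason to introduce them). Also, read more about GCC.

However, don't micro-optimize your code by hand without benchmarking first.

If you use GCC, you can compile your code with gcc -O2 -S -fverbose-asm and look at the generated assembler code.

Are there any downsides to using several arrow operators (->) chained together?

In practice, not much, if you ask your compiler to optimize (and, of course, assuming that all the pointers involved are valid). However, the readability of the source code should always be a concern.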
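
To make the register example concrete, here is a small sketch of the same assignment written both ways. The struct names Inner, Other, Fld and Top and the function names are hypothetical; only the field names come from the expression above. Both functions compute the same result; the second simply names the shared prefix, which is the readability argument for a local variable.

```c
#include <assert.h>

/* Hypothetical types matching the field names used in the expression. */
typedef struct Inner { int bluefield; } Inner;
typedef struct Other { Inner *anotherfield; int redfield; } Other;
typedef struct Fld   { Other *otherfield; } Fld;
typedef struct Top   { Fld *fld; } Top;

/* The chained form: ptr->fld->otherfield appears twice in the source. */
void update_chained(Top *ptr)
{
    ptr->fld->otherfield->anotherfield->bluefield =
        ptr->fld->otherfield->redfield + 2;
}

/* The shared prefix is loaded once into a local and reused. */
void update_hoisted(Top *ptr)
{
    Other *other = ptr->fld->otherfield;

    other->anotherfield->bluefield = other->redfield + 2;
}
```

Compiling both with optimization and comparing the generated assembler is a direct way to see whether the compiler already performs this hoisting for you.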

Answer 4

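Generally, I don't go beyond three levels (two -> operators). I agree that too many of these chained together make the code hard to read. I'm also familiar with this idea from Ruby, where too many methods chained together can be a real pain.

I never go beyond one -> [and I've been writing C for 35 years]. There is almost always a way to avoid multiple levels.

In your example, because head is [AFAICT] global, it would be faster if head could be held in a function-scope variable for the duration (e.g. local = head at function start and head = local at the end), because of aliasing considerations. Otherwise, head must be fetched from and stored to memory on every iteration, because the compiler can't ignore the possibility that head is updated in some way it can't see or anticipate (more on that below).

Also, with more split-out code, it is easier to add conditionally compiled debug statements and assert checks. And, perhaps more importantly, you can add line-by-line comments that show intent [as you've done]. The more complex the expression, the harder it is to do that.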

typedef struct Data {
    int x;
    int y;
} Data;

typedef struct DataNode {
    struct DataNode *next;
    struct Data *data;
} DataNode;

DataNode *head;

#ifdef DEBUG
#include <stdio.h>
// Standard C99 variadic macro; expands to printf only in DEBUG builds
#define dbgprt(...)         printf(__VA_ARGS__)
#else
#define dbgprt(...)         /**/
#endif

DataNode *
findPrevMatching3(int x, int y)
{
    DataNode *hd = head;
    DataNode *next = hd->next;
    Data *data = next->data;
    int thisX;
    int thisY;

    while (1) {
        thisX = data->x;
        thisY = data->y;
        dbgprt("findPrevMatching3: thisX=%d thisY=%d\n",thisX,thisY);

        // Stop once both fields match (the same test as the original while condition)
        if (thisX == x && thisY == y)
            break;

        // Assign head->next to head
        hd = hd->next;

        // Assign each local variable, using the new head
        next = hd->next;
        data = next->data;
    }

    head = hd;

    return hd;
}

Without -DDEBUG, one might think the above is less efficient [because of the separate if/break test inside the loop] than the original:

while (!(thisX == x && thisY == y))

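But, once again, the optimizer generates similar code (i.e. 7 insts/loop). It produces a function that is 0x33 bytes in size [on x86 with -O2].

Here is a slightly simplified version. To me, it is the simplest and easiest version to understand. It is also 7 insts/loop, but the code size drops to 0x2B bytes. So, ironically, it is also the most compact and fastest version.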

typedef struct Data {
    int x;
    int y;
} Data;

typedef struct DataNode {
    struct DataNode *next;
    struct Data *data;
} DataNode;

DataNode *head;

#ifdef DEBUG
#include <stdio.h>
// Standard C99 variadic macro; expands to printf only in DEBUG builds
#define dbgprt(...)         printf(__VA_ARGS__)
#else
#define dbgprt(...)         /**/
#endif

DataNode *
findPrevMatching3(int x, int y)
{
    DataNode *hd = head;
    DataNode *next;
    Data *data;
    int thisX;
    int thisY;

    while (1) {
        next = hd->next;
        data = next->data;

        thisX = data->x;
        thisY = data->y;
        dbgprt("findPrevMatching3: thisX=%d thisY=%d\n",thisX,thisY);

        // Stop once both fields match (the same test as the original while condition)
        if (thisX == x && thisY == y)
            break;

        // Assign head->next to head
        hd = hd->next;
    }

    head = hd;

    return hd;
}

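For more on what I mean by "aliasing considerations", see my answer: Is accessing statically or dynamically allocated memory faster?

Historical note: In "The Elements of Programming Style", by Brian Kernighan [co-author of "The C Programming Language"] and P.J. Plauger, they say: "Make it right before you make it faster."

Here, we've shown that when you make it right, you can also make it faster.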
