Question

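In a library (FFTW, which computes discrete Fourier transforms) I came across a header file containing the following comment and some #defines. The comment describes a programming trick, but I can't work out what exactly the trick is. Can someone explain it?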

/* hackery to prevent the compiler from ``optimizing'' induction
   variables in codelet loops.  The problem is that for each K and for
   each expression of the form P[I + STRIDE * K] in a loop, most
   compilers will try to lift an induction variable PK := &P[I + STRIDE * K].
   For large values of K this behavior overflows the
   register set, which is likely worse than doing the index computation
   in the first place.

   If we guess that there are more than
   ESTIMATED_AVAILABLE_INDEX_REGISTERS such pointers, we deliberately confuse
   the compiler by setting STRIDE ^= ZERO, where ZERO is a value guaranteed to
   be 0, but the compiler does not know this.

   16 registers ought to be enough for anybody, or so the amd64 and ARM ISA's
   seem to imply.
*/

#define ESTIMATED_AVAILABLE_INDEX_REGISTERS 16
#define MAKE_VOLATILE_STRIDE(nptr, x)                   \
     (nptr <= ESTIMATED_AVAILABLE_INDEX_REGISTERS ?     \
        0 :                                             \
      ((x) = (x) ^ X(an_INT_guaranteed_to_be_zero)))
#endif /* PRECOMPUTE_ARRAY_INDICES */

Answer 1

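Optimization: instead of recomputing the array index on every loop iteration, some compilers anticipate the next addresses and keep them in registers, because the index expression is predictable (this is the induction-variable "lifting" the comment refers to).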
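To make the optimization concrete, here is a minimal sketch (hypothetical function names, not FFTW code). It shows a loop written with indexed accesses next to a hand-written model of roughly what a strength-reducing compiler may produce: one pointer per distinct K, initialized before the loop and advanced on each iteration. Real compilers make this choice on their own. The second function only models the idea.

```c
#include <assert.h>
#include <stddef.h>

/* As written in source: the address of p[i + stride * K] is formed
   from i, stride and K on every iteration. */
double sum_indexed(const double *p, ptrdiff_t n, ptrdiff_t stride)
{
    double s = 0.0;
    assert(p != NULL);
    for (ptrdiff_t i = 0; i < n; ++i)
        s += p[i + stride * 1] + p[i + stride * 2];
    return s;
}

/* Model of the optimized form: one induction pointer per distinct K,
   computed once before the loop and bumped by one element per iteration.
   Each pointer occupies a register for the whole loop. */
double sum_strength_reduced(const double *p, ptrdiff_t n, ptrdiff_t stride)
{
    const double *p1 = p + stride * 1;  /* pointer for K = 1 */
    const double *p2 = p + stride * 2;  /* pointer for K = 2 */
    double s = 0.0;
    assert(p != NULL);
    for (ptrdiff_t i = 0; i < n; ++i) {
        s += *p1 + *p2;
        ++p1;
        ++p2;
    }
    return s;
}
```

With `double a[10] = {1, 2, ..., 10}`, both `sum_indexed(a, 4, 3)` and `sum_strength_reduced(a, 4, 3)` read a[3..6] and a[6..9] and return 56.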

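Problem: some index expressions (such as I + STRIDE * K) can make the compiler use a large number of registers this way, one per distinct K. If that number exceeds the number of available registers, some register values are spilled to stack memory, including other variables the loop may be using.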
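The following hypothetical loop (not FFTW code) touches 18 distinct offsets `stride * K`. Each offset is a candidate induction pointer, so there are already more candidates than the 16 general-purpose registers of amd64, even before the loop counter, the bound and the stride itself are counted. Whether a particular compiler actually lifts all of them and spills depends on the compiler and its flags.

```c
#include <assert.h>
#include <stddef.h>

/* One term per distinct K; each is a candidate for its own
   induction pointer p + stride * K. */
#define TERM(k) p[i + stride * (k)]

double sum18(const double *p, ptrdiff_t n, ptrdiff_t stride)
{
    double s = 0.0;
    assert(p != NULL && n >= 0);
    for (ptrdiff_t i = 0; i < n; ++i)
        s += TERM(0)  + TERM(1)  + TERM(2)  + TERM(3)  + TERM(4)  + TERM(5)
           + TERM(6)  + TERM(7)  + TERM(8)  + TERM(9)  + TERM(10) + TERM(11)
           + TERM(12) + TERM(13) + TERM(14) + TERM(15) + TERM(16) + TERM(17);
    return s;
}

#undef TERM
```

With an array of 36 ones, `sum18(a, 2, 2)` reads every element exactly once (indices 0 to 35) and returns 36.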

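Trick: to force the compiler not to use this optimization, an external integer is used. Adding or XOR-ing this zero and storing the result in x is a no-op that "taints" the stride, and consequently the index expression, making it unpredictable to the optimizer's analysis. The compiler can no longer infer how this variable behaves, even though we know it behaves exactly like zero. A relevant excerpt from ifftw.h, the file this is derived from: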
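Here is a minimal sketch of the trick, under stated assumptions. `an_int_guaranteed_to_be_zero` is a hypothetical stand-in for FFTW's `X(an_INT_guaranteed_to_be_zero)`. It is defined here as a non-const global with external linkage, so without link-time optimization the compiler cannot prove it is 0. FFTW declares it `extern const` and defines it elsewhere. The sketch re-applies the XOR on every iteration, in the loop's increment expression. That is what makes `stride * K` look loop-variant. A single XOR before the loop would leave `stride` loop-invariant and the lifting still possible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for X(an_INT_guaranteed_to_be_zero): always 0 at
   run time, but a non-const external global, so its value is opaque to
   the optimizer unless link-time optimization sees every definition. */
ptrdiff_t an_int_guaranteed_to_be_zero = 0;

double sum_tainted(const double *p, ptrdiff_t n, ptrdiff_t stride)
{
    double s = 0.0;
    assert(p != NULL);
    /* stride ^= 0 changes nothing at run time, but the compiler must treat
       stride as possibly different on each iteration, so p + stride * K is
       no longer a simple induction pointer it can keep in a register. */
    for (ptrdiff_t i = 0; i < n;
         ++i, stride ^= an_int_guaranteed_to_be_zero)
        s += p[i + stride * 1] + p[i + stride * 2];
    return s;
}
```

With `double a[10] = {1, 2, ..., 10}`, `sum_tainted(a, 4, 3)` returns 56, the same value as the loop without the XOR. Only the generated code is meant to differ.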

extern const INT X(an_INT_guaranteed_to_be_zero);

#ifdef PRECOMPUTE_ARRAY_INDICES
...
#define MAKE_VOLATILE_STRIDE(nptr, x) (x) = (x) + X(an_INT_guaranteed_to_be_zero)

#else
...
#define ESTIMATED_AVAILABLE_INDEX_REGISTERS 16
#define MAKE_VOLATILE_STRIDE(nptr, x)                   \
     (nptr <= ESTIMATED_AVAILABLE_INDEX_REGISTERS ?     \
        0 :                                             \
      ((x) = (x) ^ X(an_INT_guaranteed_to_be_zero)))
#endif /* PRECOMPUTE_ARRAY_INDICES */

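The header either avoids the optimization outright (the PRECOMPUTE_ARRAY_INDICES branch always taints the stride), or allows it only when the index pointers can fit in the guessed number of available registers. It allows the optimization by making the macro evaluate to the constant 0, so the stride is never touched.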
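The following sketch compiles the quoted non-`PRECOMPUTE_ARRAY_INDICES` macro on its own, so the two branches can be seen in isolation. `INT`, `X()` and `make_volatile_stride_demo` are simplified hypothetical stand-ins, not FFTW's internal definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for FFTW's INT type and X() name-prefixing macro. */
typedef ptrdiff_t INT;
#define X(name) my_ ## name
INT X(an_INT_guaranteed_to_be_zero) = 0;

/* Non-PRECOMPUTE_ARRAY_INDICES variant, as quoted in the answer. */
#define ESTIMATED_AVAILABLE_INDEX_REGISTERS 16
#define MAKE_VOLATILE_STRIDE(nptr, x)                   \
     (nptr <= ESTIMATED_AVAILABLE_INDEX_REGISTERS ?     \
        0 :                                             \
      ((x) = (x) ^ X(an_INT_guaranteed_to_be_zero)))

/* Returns the value of the macro expression for a given pointer count.
   *stride keeps its numeric value in both branches. */
INT make_volatile_stride_demo(int nptr, INT *stride)
{
    assert(stride != NULL);
    return MAKE_VOLATILE_STRIDE(nptr, *stride);
}
```

With `nptr = 8` the expression is the constant 0 and the stride is never written. With `nptr = 32` the stride is reassigned to `stride ^ 0`, which leaves its value unchanged but hides that fact from the optimizer. In FFTW's generated code `nptr` is presumably a literal, so the branch is chosen at compile time.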

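Some etymology: the macro MAKE_VOLATILE_STRIDE takes its name from the volatile keyword, which indicates that a value may change between different accesses even if it does not appear to be modified. The keyword prevents an optimizing compiler from optimizing away subsequent reads or writes, and thus from incorrectly reusing a stale value or omitting writes. (Wikipedia)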
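For comparison, here is a minimal sketch (hypothetical function name) of what declaring the stride itself `volatile` would mean. Every read of a volatile object is an access the compiler must actually perform. The stride is therefore typically reloaded from memory for each element access instead of staying in a register. The XOR trick, by contrast, adds one cheap operation per iteration and leaves `stride` an ordinary variable.

```c
#include <assert.h>
#include <stddef.h>

double sum_volatile_stride(const double *p, ptrdiff_t n, ptrdiff_t stride)
{
    /* Each use of vs below is a separate volatile read that the
       compiler may not merge, cache, or hoist out of the loop. */
    volatile ptrdiff_t vs = stride;
    double s = 0.0;
    assert(p != NULL);
    for (ptrdiff_t i = 0; i < n; ++i)
        s += p[i + vs * 1] + p[i + vs * 2];
    return s;
}
```

With `double a[10] = {1, 2, ..., 10}`, `sum_volatile_stride(a, 4, 3)` returns 56, like the non-volatile loop.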

Why the volatile keyword would not be sufficient, as opposed to XOR-ing with an extern value, I don't know.
