I have two similar recursive functions:
void dfs(const std::vector<std::vector<int64_t>>& tree, int64_t position) {
    for (int64_t to : tree[position]) {
        dfs(tree, to);
    }
    if (position % 2 == 1) {
        for (int64_t to : tree[position]) {
            // intentionally empty (minimized example)
        }
    } else {
        for (int64_t to : tree[position]) {
            // intentionally empty (minimized example)
        }
    }
}
The same code without range-based for loops:
void dfs(const std::vector<std::vector<int64_t>>& tree, int64_t position) {
    for (size_t i = 0; i < tree[position].size(); ++i) {
        int64_t to = tree[position][i];
        dfs(tree, to);
    }
    if (position % 2 == 1) {
        for (size_t i = 0; i < tree[position].size(); ++i) {
            int64_t to = tree[position][i];
        }
    } else {
        for (size_t i = 0; i < tree[position].size(); ++i) {
            int64_t to = tree[position][i];
        }
    }
}
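For reference, my understanding of the C++11 rule is that each range-based for loop is expanded into an iterator loop with three hidden locals (a reference to the range plus a begin and an end iterator) in addition to `to`. The sketch below spells that expansion out by hand. The function names `sum_children_range_for`, `sum_children_expanded` and `check_equivalent` are made up for illustration and are not part of my program, and the loop bodies sum the elements only so that the two forms can be compared. A few hidden iterators per loop would not seem to account for 800 bytes on their own.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Range-based form, as used in the first version of dfs.
int64_t sum_children_range_for(const std::vector<int64_t>& children) {
    int64_t sum = 0;
    for (int64_t to : children) {
        sum += to;
    }
    return sum;
}

// Hand-written expansion following the C++11 definition of range-based for.
// The names range, begin and end are illustrative; the compiler uses
// unnamed variables in their place.
int64_t sum_children_expanded(const std::vector<int64_t>& children) {
    int64_t sum = 0;
    {
        auto&& range = children;
        for (auto begin = range.begin(), end = range.end(); begin != end; ++begin) {
            int64_t to = *begin;
            sum += to;
        }
    }
    return sum;
}

// Sanity check: both forms visit the same elements.
void check_equivalent(const std::vector<int64_t>& children) {
    assert(sum_children_range_for(children) == sum_children_expanded(children));
}
```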
The full code is here.
Unfortunately, the first one segfaults, but only on OS X with clang (3.5, 3.8, 3.9). I checked the stack frame size with gdb:
Stack level 1, frame at 0x7fff5fbfebe0:
rip = 0x100000a8c in dfs(std::__1::vector<std::__1::vector<long long, std::__1::allocator<long long> >, std::__1::allocator<std::__1::vector<long long, std::__1::allocator<long long> > > > const&, long long); saved rip = 0x100000a8c
called by frame at 0x7fff5fbfef00, caller of frame at 0x7fff5fbfe8c0
Arglist at 0x7fff5fbfebd0, args:
Locals at 0x7fff5fbfebd0, Previous frame's sp is 0x7fff5fbfebe0
Saved registers:
rbp at 0x7fff5fbfebd0, rip at 0x7fff5fbfebd8
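So the frame size of dfs in the first version appears to be: 0x7fff5fbfef00 - 0x7fff5fbfebe0 = 0x320 = 800 bytes.
Why does the first version use a stack frame of nearly 1 KB? The second version uses only 0x7fff5fbff240 - 0x7fff5fbff140 = 0x100 = 256 bytes: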
Stack level 1, frame at 0x7fff5fbff140:
rip = 0x100000d4d in dfs(std::__1::vector<std::__1::vector<long long, std::__1::allocator<long long> >, std::__1::allocator<std::__1::vector<long long, std::__1::allocator<long long> > > > const&, long long); saved rip = 0x100000d4d
called by frame at 0x7fff5fbff240, caller of frame at 0x7fff5fbff040
Arglist at 0x7fff5fbff130, args:
Locals at 0x7fff5fbff130, Previous frame's sp is 0x7fff5fbff140
Saved registers:
rbp at 0x7fff5fbff130, rip at 0x7fff5fbff138
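The two subtractions can be double-checked at compile time. This is only a sketch of the arithmetic: `frame_size` is a hypothetical helper name, and the addresses are the "frame at" and "called by frame at" values copied from the gdb output in this post.

```cpp
#include <cstdint>

// Frame size = address of the caller's frame minus address of this frame
// (the stack grows toward lower addresses on x86-64).
constexpr std::uint64_t frame_size(std::uint64_t caller_frame, std::uint64_t frame) {
    return caller_frame - frame;
}

// Range-based version: 0x320 = 800 bytes.
static_assert(frame_size(0x7fff5fbfef00, 0x7fff5fbfebe0) == 0x320,
              "range-based version");
// Index-based version: 0x100 = 256 bytes.
static_assert(frame_size(0x7fff5fbff240, 0x7fff5fbff140) == 0x100,
              "index-based version");
```

Both assertions hold, so the range-based version uses a bit more than three times as much stack per recursion level as the index-based one.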
PS: Compile flags: clang++ -O0 -std=c++11 ./file.cpp
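PPS: I tried to minimize the code as much as possible. For example, if I remove the empty range-based loop from one of the branches, everything goes back to normal.
PPPS: If I compile with g++-5, the frame size of dfs is only 0x50 = 128 bytes. On Linux everything looks fine as well.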
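To compare frame sizes across compilers without a debugger, one option is to record the address of a local variable at two consecutive recursion levels and take the difference. The sketch below does this for a stand-in function. `probe` and `approx_frame_size` are hypothetical names, not part of my program, and I am assuming the same trick could be applied inside dfs. The result depends on compiler, flags and platform, so it is only useful for relative comparisons.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: stores the address of a local variable for each
// recursion level. noinline keeps every level in its own stack frame.
#if defined(__GNUC__) || defined(__clang__)
__attribute__((noinline))
#endif
void probe(int depth, std::uintptr_t* addresses) {
    assert(depth >= 0 && addresses != nullptr);
    volatile char marker = 0;  // lives in this call's frame
    addresses[depth] = reinterpret_cast<std::uintptr_t>(&marker);
    if (depth > 0) {
        probe(depth - 1, addresses);
    }
    marker = 1;  // used after the call, so it stays live across the recursion
}

// Distance in bytes between two consecutive frames of probe().
std::uintptr_t approx_frame_size() {
    std::uintptr_t addresses[2] = {0, 0};
    probe(1, addresses);
    const std::uintptr_t outer = addresses[1];
    const std::uintptr_t inner = addresses[0];
    return outer > inner ? outer - inner : inner - outer;
}
```

Printing `approx_frame_size()` from builds made with clang and g++ would give the per-level stack cost of `probe` for each compiler.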