Question

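I have an application with multiple worker threads, one per core. On a modern 8-core machine that means 8 such threads. The application also loads many plugins, which have worker threads of their own. Because the application works with huge blocks of memory (photos of, for example, 200 MB), I have a problem with memory fragmentation. The cause is that every thread reserves {$MAXSTACKSIZE ...} of address space. That does not consume physical memory, but it does consume address space. I reduced MAXSTACKSIZE from 1 MB to 128 KB and it seems to work, but I don't know whether I am close to the limit. Is there any way to measure how much stack is really used?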

Answer 1

Use this to compute the amount of memory committed for the current thread's stack:

function CommittedStackSize: Cardinal;
asm
  mov eax,[fs:$4] // base of the stack, from the Thread Environment Block (TEB)
  mov edx,[fs:$8] // address of lowest committed stack page
                  // this gets lower as you use more stack
  sub eax,edx
end;

I don't have any other ideas.
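Because the function reads the calling thread's own block through FS, it has to be called on the thread being measured. Below is a minimal sketch of how that might look, assuming a 32-bit Delphi XE2 or later console project (older versions would use `Classes` instead of `System.Classes`). `TProbeThread` and `Recurse` are hypothetical names invented for illustration; `Recurse` only stands in for a real workload.

```delphi
program StackProbe;
{$APPTYPE CONSOLE}

uses
  System.Classes;

// Win32-only function from the answer above, repeated so the sketch compiles.
function CommittedStackSize: Cardinal;
asm
  mov eax,[fs:$4] // base of the stack
  mov edx,[fs:$8] // lowest committed stack page
  sub eax,edx
end;

type
  // Hypothetical worker thread, for illustration only.
  TProbeThread = class(TThread)
  private
    FStackUsed: Cardinal;
    procedure Recurse(Depth: Integer);
  protected
    procedure Execute; override;
  public
    property StackUsed: Cardinal read FStackUsed;
  end;

procedure TProbeThread.Recurse(Depth: Integer);
var
  Buffer: array[0..1023] of Byte; // roughly 1 KB of stack per call
begin
  Buffer[0] := Byte(Depth);       // touch the buffer so its page gets committed
  if Depth > 0 then
    Recurse(Depth - 1);
end;

procedure TProbeThread.Execute;
begin
  Recurse(32); // stand-in for the real work
  // Measure on the worker thread itself, after the work is done.
  FStackUsed := CommittedStackSize;
end;

var
  T: TProbeThread;
begin
  T := TProbeThread.Create(False);
  try
    T.WaitFor;
    Writeln('Committed stack: ', T.StackUsed div 1024, ' KB');
  finally
    T.Free;
  end;
end.
```

The value is a multiple of the page size, so it gives an upper bound on the stack the thread has touched so far rather than an exact byte count.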

Answer 2

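For completeness, I am adding a version of the CommittedStackSize function from opc0de's answer that determines the amount of used stack on both 32-bit (x86) and 64-bit versions of Windows (opc0de's function works on Win32 only).

opc0de's function reads the address of the base of the stack and the address of the lowest committed stack page from Windows' Thread Information Block (TIB). x86 and x64 differ in two ways:

On Win32 the FS segment register points to the TIB, whereas on Win64 it is GS (see here)
The absolute offsets of the items in the structure differ (mostly because some items are pointers, which are 4 bytes on Win32 and 8 bytes on Win64)

Also note a small difference in the BASM code: on x64, abs is required to make the assembler use an absolute offset from the segment register.

A version that works on both Win32 and Win64 therefore looks like this: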

{$IFDEF MSWINDOWS}
function CommittedStackSize: NativeUInt;
//NB: Win32 uses FS, Win64 uses GS as base for Thread Information Block.
asm
 {$IFDEF WIN32}
  mov eax, [fs:04h] // TIB: base of the stack
  mov edx, [fs:08h] // TIB: lowest committed stack page
  sub eax, edx      // compute difference in EAX (=Result)
 {$ENDIF}
 {$IFDEF WIN64}
  mov rax, abs [gs:08h] // TIB: base of the stack
  mov rdx, abs [gs:10h] // TIB: lowest committed stack page
  sub rax, rdx          // compute difference in RAX (=Result)
 {$ENDIF}
end;
{$ENDIF}

Answer 3

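I remember that years ago I used FillChar to fill all available stack space with zeroes at init, and at deinit counted the contiguous zeroes starting from the end. This yields a good "high-water mark", provided you put your application through its paces during the probe runs.

I'll dig out the code when I'm back on a non-mobile device.

Update: OK, the principle is demonstrated in this (ancient) code: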

{***********************************************************
  StackUse - A unit to report stack usage information

  by Richard S. Sadowsky
  version 1.0 7/18/88
  released to the public domain

  Inspired by an idea by Kim Kokkonen.

  This unit, when used in a Turbo Pascal 4.0 program, will
  automatically report information about stack usage.  This is very
  useful during program development.  The following information is
  reported about the stack:

  total stack space
  Unused stack space
  Stack space used by your program

  The unit's initialization code handles three things, it figures out
  the total stack space, it initializes the unused stack space to a
  known value, and it sets up an ExitProc to automatically report the
  stack usage at termination.  The total stack space is calculated by
  adding 4 to the current stack pointer on entry into the unit.  This
  works because on entry into a unit the only thing on the stack is the
  2 word (4 bytes) far return value.  This is obviously version and
  compiler specific.

  The ExitProc StackReport handles the math of calculating the used and
  unused amount of stack space, and displays this information.  Note
  that the original ExitProc (Sav_ExitProc) is restored immediately on
  entry to StackReport.  This is a good idea in ExitProc in case a
  runtime (or I/O) error occurs in your ExitProc!

  I hope you find this unit as useful as I have!

************************************************************}

{$R-,S-} { we don't need no stinkin range or stack checking! }
unit StackUse;

interface

var
  Sav_ExitProc     : Pointer; { to save the previous ExitProc }
  StartSPtr        : Word;    { holds the total stack size    }

implementation

{$F+} { this is an ExitProc so it must be compiled as far }
procedure StackReport;

{ This procedure may take a second or two to execute, especially }
{ if you have a large stack. The time is spent examining the     }
{ stack looking for our init value ($AA). }

var
  I                : Word;

begin
  ExitProc := Sav_ExitProc; { restore original exitProc first }

  I := 0;
  { scan upward from the bottom of the stack segment and stop at the }
  { first byte that no longer holds the fill value $AA }
  while I < SPtr do
    if Mem[SSeg:I] <> $AA then begin
      { first overwritten byte found, so report the stack usage info }
      WriteLn('total stack space : ',StartSPtr);
      WriteLn('unused stack space: ', I);
      WriteLn('stack space used  : ',StartSPtr - I);
      I := SPtr; { end the loop }
    end
    else
      inc(I); { look in next byte }
end;
{$F-}


begin
  StartSPtr := SPtr + 4; { on entry into a unit, only the FAR return }
                         { address has been pushed on the stack.     }
                         { therefore adding 4 to SP gives us the     }
                         { total stack size. }
  FillChar(Mem[SSeg:0], SPtr - 20, $AA); { init the stack   }
  Sav_ExitProc := ExitProc;              { save exitproc    }
  ExitProc     := @StackReport;          { set our exitproc }
end.

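(from http://webtweakers.com/swag/MEMORY/0018.PAS.html)

I dimly remember having worked with Kim Kokkonen at that time, and I think the original code is his.

The good thing about this approach is that there is zero performance penalty and no profiling overhead while the program runs. Only at shutdown does the loop that scans for the first changed value consume CPU cycles. (We later rewrote that loop in assembly.)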

Answer 4

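Even if all 8 threads came close to using their 1 MB of stack, that is only 8 MB of virtual memory. IIRC, the default initial stack size for a thread is 64 KB, and it grows on page faults until the process's thread stack limit is reached, at which point I assume your process will be stopped with a "stack overflow" message :((

I fear that reducing the process stack limit $MAXSTACKSIZE will not alleviate your fragmentation/paging problem much, if at all. You need more RAM, so that the resident page set of your mega photo app is larger and thrashing is reduced.

How many threads does your process have in total, on average? Task Manager can show this.

Rgds, Martin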

Answer 5

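While I am sure that you can reduce the thread stack size in your app, I don't think it will address the root cause of the problem. You are using an 8-core machine now, but what happens on a 16-core or 32-core machine, and so on?

With 32-bit Delphi you have a maximum address space of 4 GB, so this limits you to some degree. You may well need to use smaller stacks for some or all of your threads, but you will still run into problems on a big enough machine.

If you want to help your app scale to larger machines, you may need to take one or both of the following steps:

Avoid creating significantly more threads than cores. Use a thread pool architecture that is also available to your plugins. Without the benefit of the .NET environment to make this easy, you will be best off coding against the Windows thread pool API. That said, there must be a good Delphi wrapper for it.
Deal with the memory allocation patterns. If your threads allocate contiguous blocks in the region of 200 MB, this puts undue stress on your allocator. I have found that it is often best to allocate such large amounts of memory in smaller, fixed-size blocks. This approach works around the fragmentation issues you are encountering.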
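The second suggestion, storing one large buffer as smaller fixed-size blocks, could be sketched roughly as follows. This is only an illustration under stated assumptions: the unit name, `TChunkedBuffer`, `BytePtr` and the 1 MB `BlockSize` are all hypothetical and not part of any library, and real image code would need row-aware access on top of this.

```delphi
unit ChunkedBuffer;

interface

const
  BlockSize = 1024 * 1024; // 1 MB per block; an arbitrary choice for the sketch

type
  // Hypothetical container: one logical buffer stored as fixed-size blocks,
  // so no single contiguous 200 MB range of address space is required.
  TChunkedBuffer = class
  private
    FBlocks: array of Pointer;
    FSize: NativeUInt;
  public
    constructor Create(ASize: NativeUInt);
    destructor Destroy; override;
    // Returns the address of the byte at the given logical offset.
    function BytePtr(Offset: NativeUInt): PByte;
    property Size: NativeUInt read FSize;
  end;

implementation

constructor TChunkedBuffer.Create(ASize: NativeUInt);
var
  I: Integer;
begin
  inherited Create;
  FSize := ASize;
  // Round up so that a partial last block is still backed by memory.
  SetLength(FBlocks, Integer((ASize + BlockSize - 1) div BlockSize));
  for I := 0 to High(FBlocks) do
    GetMem(FBlocks[I], BlockSize);
end;

destructor TChunkedBuffer.Destroy;
var
  I: Integer;
begin
  // Entries are nil if the constructor failed part-way through.
  for I := 0 to High(FBlocks) do
    if FBlocks[I] <> nil then
      FreeMem(FBlocks[I]);
  inherited;
end;

function TChunkedBuffer.BytePtr(Offset: NativeUInt): PByte;
begin
  Assert(Offset < FSize, 'Offset out of range');
  Result := PByte(FBlocks[Offset div BlockSize]);
  Inc(Result, Offset mod BlockSize);
end;

end.
```

The trade-off is that data can no longer be handed to code expecting one flat pointer, so a run that spans a block boundary has to be copied or processed in two parts.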

Answer 6

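Reducing $MAXSTACKSIZE will not work, because Windows always aligns the thread stack to 1 MB (?).

One (possible?) way to prevent fragmentation is to reserve (not allocate!) virtual memory with VirtualAlloc before creating the threads, and to release it once the threads are running. Windows then cannot place the thread stacks in the reserved range, so you keep some contiguous address space.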
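A minimal sketch of the reserve-then-release idea, assuming a Delphi XE2 or later console project (older versions would use `Windows` instead of `Winapi.Windows`). The program name and the `PhotoRegionSize` constant are hypothetical, and the thread and plugin startup is deliberately left out. Like the idea itself, this is untested theory.

```delphi
program ReserveBeforeThreads;
{$APPTYPE CONSOLE}

uses
  Winapi.Windows;

const
  // Hypothetical size: 200 MB, the photo size mentioned in the question.
  PhotoRegionSize = 200 * 1024 * 1024;

var
  Region: Pointer;
begin
  // MEM_RESERVE claims address space only; no physical memory is committed.
  Region := VirtualAlloc(nil, PhotoRegionSize, MEM_RESERVE, PAGE_NOACCESS);
  if Region = nil then
  begin
    Writeln('Reserve failed, error ', GetLastError);
    Halt(1);
  end;

  // ... create the worker threads and load the plugins here (not shown) ...
  // Their stacks cannot be placed inside the reserved range.

  // Give the range back; the size must be 0 when using MEM_RELEASE.
  if not VirtualFree(Region, 0, MEM_RELEASE) then
    Writeln('Release failed, error ', GetLastError);
end.
```

Once the range is released, nothing stops another allocation from landing in it, so keeping the reservation and committing pages inside it when a photo is loaded may be the more robust variant.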

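Or you could write your own memory manager for the large photos: reserve a large range of virtual memory and allocate from this pool by hand. (You need to maintain the lists of used and free memory yourself.)

At least, that's the theory; I don't know whether it really works...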

What is a safe maximum stack size, or how can stack usage be measured?
