Fast stack trace capture on Windows / 64-bit / mixed mode

Date: 2015-12-28 22:18:43

Tags: windows memory-leaks stack-trace mixed-mode

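As most of you probably know, there are many different mechanisms for building a stack trace, starting with the Windows API and continuing into the depths of the magical world of assembly. Let me list some of the links I have already studied.

To sum up: I want a memory-leak analysis mechanism for a mixed-mode (managed and unmanaged), 64-bit + AnyCPU application. Of all the Windows APIs, CaptureStackBackTrace fits my needs best, but as far as I have analyzed it does not support walking managed-code stacks. Still, that API is the closest to what I need, because it also computes a back-trace hash, a unique identifier for a particular call stack.

I have already ruled out other ways of finding memory leaks: most of the software I tried either crashed, did not work properly, or produced poor results.

I also do not want to recompile existing software and override malloc / new or other mechanisms, because that is a heavy task (we have a huge code base and a large number of DLLs). I also suspect this is not a one-time job: the problem recurs in a 1-2 cycle, depending on who coded what, so I would rather build memory-leak detection into the application itself (memory API hooking) than solve this problem over and over again.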

http://www.codeproject.com/Articles/11132/Walking-the-callstack

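This uses the StackWalk64 Windows API function, but it does not work with managed code. Its 64-bit support is also not entirely clear to me: I have seen some workarounds for 64-bit problems, and I suspect this code does not work fully when the stack walk is done within the same thread.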

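Then there is Process Hacker: http://processhacker.sourceforge.net/

It also uses StackWalk64, but extends its callback functions (the 7th and 8th parameters) to support mixed-mode stack walking. After a lot of complications with the 7th/8th callback functions I also managed to get StackWalk64 working with mixed-mode support (capturing the stack trace as a vector in which each pointer refers to the assembly / DLL location the call passed through). But as you might guess, the performance of StackWalk64 is not sufficient for my needs: even with a simple message box on the C# side, the application simply "hangs" for a while until it starts up properly.

I did not see such large delays with the CaptureStackBackTrace function call, so I conclude that StackWalk64 is too slow for my needs.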

There is also a COM-based approach to determining the stack trace, like this: http://www.codeproject.com/Articles/371137/A-Mixed-Mode-Stackwalk-with-the-IDebugClient-Inter

http://blog.steveniemitz.com/building-a-mixed-mode-stack-walker-part-1/

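But I am wary of it: it requires COM, the thread needs to be COM-initialized, and because of the memory API hooking I should not touch the COM state of any thread, since that could lead to heavier problems (e.g. incorrect apartment initialization, other failures).

I have now reached the point where the Windows API is not sufficient for my needs and I have to walk the call stack manually. An example of this can be found here:

http://www.codeproject.com/Articles/11221/Easy-Detection-of-Memory-Leaks (see function FillStackInfo; 32-bit only, no managed code support).

There are a few mentions of reversing the stack trace, for example at the following links: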

  1. http://blog.airesoft.co.uk/2009/02/grabbing-kernel-thread-contexts-the-process-explorer-way/
  2. http://cbloomrants.blogspot.fi/2009/01/01-30-09-stack-tracing-on-windows.html
  3. http://www.gamedev.net/topic/364861-stack-dump-on-win32-how-to-get-api-addresses/
  4. http://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/
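
Links 1, 3 and 4 in particular make for interesting night reading. :-)

Even so, interesting as these mechanisms are, none of them comes with a fully working demo example.

I guess one example is Wine's dbghelp implementation (Wine being a Windows "emulator" for Linux), which also shows how StackWalk64 ultimately works, but I suspect it is heavily tied to executables in the DWARF2 file format, so it does not map directly onto the current Windows PE executable format.

Can someone point me to a good stack-walking implementation that works on 64-bit architecture, supports mixed mode (can trace both native and managed memory allocations), and is based purely on register / call stack / code analysis? (A combined implementation of 1, 3 and 4.)

Does anyone have a good contact in the Microsoft development team who might answer this question?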

6 answers:

Answer 0 (score: 2)

9.1.2015 - I found the original function that Process Hacker calls. It is

C:\Windows\Microsoft.NET\Framework64\v4.0.30319\mscordacwks.dll / OutOfProcessFunctionTableCallback

Its source code is here: https://github.com/dotnet/coreclr/blob/master/src/debug/daccess/fntableaccess.cpp

From there I identified the owner of most of the changes in that source code, Jan Kotas (jkotas@microsoft.com), and contacted him about this problem.

From: Jan Kotas <jkotas@microsoft.com>
To: Tarmo Pikaro <tapika@yahoo.com> 
Sent: Friday, January 8, 2016 3:27 PM
Subject: RE: Fast capture stack trace on windows 64 bit / mixed mode...

...

The mscordacwks.dll is called mscordaccore.dll in CoreCLR / github repro. The VS project 
files are auto-generated for it during the build 
(\coreclr\bin\obj\Windows_NT.x64.Debug\src\dlls\mscordac\mscordaccore.vcxproj).
You should be able to build and debug CoreCLR to understand how it works.
...

From: Jan Kotas <jkotas@microsoft.com>
To: Tarmo Pikaro <tapika@yahoo.com> 
Sent: Saturday, January 9, 2016 2:02 AM
Subject: RE: Fast capture stack trace on windows 64 bit / mixed mode...

> I've tried to replace 
> C:\Windows\Microsoft.NET\Framework64\v4.0.30319\mscordacwks.dll dll loading 
> with C:\Prototyping\dotNet\coreclr-master\bin\obj\Windows_NT.x64.Debug\src\dlls\mscordac\Debug\mscordaccore.dll
> loading (just compiled), but if previously I could get mixed mode stack trace correctly:
> ...

mscordacwks.dll is tightly coupled with the runtime. You cannot mix and match them between runtimes.
What I meant is that you can use CoreCLR to understand how this works.

He then recommended this solution, which worked for me:

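// x64 only. Requires <windows.h>.
int CaptureStackBackTrace3(int FramesToSkip, int nFrames, PVOID* BackTrace, PDWORD pBackTraceHash)
{
    CONTEXT ContextRecord;
    RtlCaptureContext(&ContextRecord);

    int iFrame = 0;
    while (iFrame < nFrames)
    {
        // Look up the unwind data that describes the function containing Rip.
        DWORD64 ImageBase;
        PRUNTIME_FUNCTION pFunctionEntry = RtlLookupFunctionEntry(ContextRecord.Rip, &ImageBase, NULL);

        if (pFunctionEntry == NULL)
            break;

        // Unwind one frame: afterwards ContextRecord describes the caller.
        PVOID HandlerData;
        DWORD64 EstablisherFrame;
        RtlVirtualUnwind(UNW_FLAG_NHANDLER,
            ImageBase,
            ContextRecord.Rip,
            pFunctionEntry,
            &ContextRecord,
            &HandlerData,
            &EstablisherFrame,
            NULL);

        if (ContextRecord.Rip == 0)     // Walked past the outermost frame
            break;

        if (FramesToSkip > 0)           // Honor the requested number of skipped frames
        {
            FramesToSkip--;
            continue;
        }

        BackTrace[iFrame++] = (PVOID)ContextRecord.Rip;
    }

    // pBackTraceHash is not filled in here.
    return iFrame;
}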

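This code snippet still lacks the back-trace hash calculation, but that can be added afterwards.

It is very important to note that when debugging this code snippet you should use native debugging, not mixed mode (C# projects use mixed mode by default), because mixed mode somehow interferes with the stack trace in the debugger. (How and why this distortion happens is still to be figured out.)

One piece of the puzzle is still missing: how to make symbol resolution fully resistant to FreeLibrary / JIT code disposal, but that is something I still need to figure out.

Note that RtlVirtualUnwind most likely works only on 64-bit architecture, not on ARM or 32-bit.

Another interesting thing is that there is a function RtlCaptureStackBackTrace, which somewhat resembles the Windows API function CaptureStackBackTrace, yet they differ, at least in name. Also, if you look into RtlCaptureStackBackTrace, it ultimately calls RtlVirtualUnwind. You can check this in the Windows Research Kernel source code:

RtlCaptureStackBackTrace > RtlWalkFrameChain > RtlpWalkFrameChain > RtlVirtualUnwind

But RtlCaptureStackBackTrace did not work correctly when I tested it, unlike the RtlVirtualUnwind-based function above.

It is some kind of magic. :-)

I will continue with the phase 2 question here: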

Resolve managed and native stack trace - which API to use?

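Answer 1 (score: 1)

x64 stack walking is complex, as you have already found out. A simple alternative is not to do it yourself at all and to leave the hard work to the OS ETW stack walker. It works, and it is much faster than anything you will come up with.

You can take advantage of it by emitting your own ETW events. Before that, you need to start an ETW session for your event provider and enable stack walking for your provider. There is a catch on Windows 7: it does not work unless all managed stack frames are NGen'd, because the x64 ETW stack walker stops as soon as it finds a stack frame that is not inside any loaded module, which is the case for JIT-compiled code.

Starting with Windows 8, the ETW stack walker always walks the first MB of the stack to work around the JIT problem. If ETW tracing is enabled, the JIT compiler emits unwind info for the generated code and registers it via RtlAddGrowableFunctionTable, which makes fast stack walking inside the kernel possible. When ETW tracing is not enabled, things are different for compatibility reasons.

If you are after malloc/free or new/delete memory leaks, you can also use the OS built-in heap allocation tracing, which has existed since Windows 7. See xperf -help start and https://randomascii.wordpress.com/2015/04/27/etw-heap-tracingevery-allocation-recorded/ for more information about heap allocation tracing. You can enable it for an already running process without any problems. The downside is that the generated data is huge for any real-world application. But if you are only after large allocations, it can help to trace only VirtualAlloc calls, which can also be enabled with minimal overhead.

Managed code has also had its own ETW allocation tracing provider since .NET 4.5, which yields full stacks even on x64 Windows 7 because it performs a full managed stack walk itself. See ETW::SamplingLog::SendStackTrace in the CoreCLR sources for more details.

This is only a rough overview. To really cover all the necessary details I am afraid it would take a whole book. I learn new things every day.

Here is a heapalloc.cmd script that can be used to trace heap allocations. By default it records into a 500MB ring buffer in case your leak accumulates over a longer period of time; recording all allocation stacks without compacting them at runtime would not work with WPA. You could, however, write out a huge ETL file and write your own viewer for it.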

@echo off 
setlocal enabledelayedexpansion
REM consider using a different drive for ETL output to prevent slowing down 
REM your application and to prevent lost buffers
set OUTDIR=C:\TEMP
set OUTFILENAME=HeapTracing.etl
REM Final output file
set OUTFILE=!OUTDIR!\!OUTFILENAME!
set CLRUNDOWNFILE=!OUTDIR!\clr_HeapDCend.etl
set KERNELFILE=!OUTDIR!\kernel.etl
set CLRSESSIONFILE=!OUTDIR!\clrHeapSession.etl
set HEAPUSERFILE=!OUTDIR!\HeapUserSession.etl
REM Default is allocation and realloc to track memory leaks
REM HeapFree is the other option to track double free calls
set HEAPTRACINGFLAGS=HeapAlloc+HeapRealloc 

if "%3" NEQ "" (
echo Overriding Heap Tracing Flags with: %3
set HEAPTRACINGFLAGS=%3
)


if "%1" EQU "-start" ( 
    call :StartTracing -PidNewProcess %2
    goto :Exit 
) 

if "%1" EQU "-attachPid" ( 
    call :StartTracing -Pids %2
    goto :Exit 
) 

if "%1" EQU "-startNext" (
    reg add "HKLM\Software\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\%~nx2" /v TracingFlags /t REG_DWORD /d 1 /f
    if not !errorlevel! == 0 goto failure
    call :StartTracing -Pids 0
    goto :Exit
)

if "%1" EQU "-stop" ( 
    set XPERF_CreateNGenPdbs=1
    xperf -start ClrRundownSession -on e13c0d23-ccbc-4e12-931b-d9cc2eee27e4:0x118:5+a669021c-c450-4609-a035-5af59af4df18:0x118:5 -f "!CLRUNDOWNFILE!" -buffersize 256 -minbuffers 256 -maxbuffers 512 
    call :WaitUntilRundownCompleted "!CLRUNDOWNFILE!"
    xperf -stop -stop ClrSession ClrRundownSession HeapSession | findstr /V identifiable 2> NUL

    echo Merging profiles
    REM Reset symbol path to create the pdbs files in the output directory with in the directory with the same name like our etl file
    set TMPSYMBOLPATH=!_NT_SYMBOL_PATH!
    REM Each tool is using a different pdb cache folder. If you are using them side by side 
    REM you have to wait a long time to refresh the pdb cache. To spare the waiting time we use 
    REM the pdb cache folder from WPR

    mkdir C:\ProgramData\WindowsPerformanceRecorder\NGenPdbs_Cache 2> NUL
    set _NT_SYMBOL_PATH=srv*C:\ProgramData\WindowsPerformanceRecorder\NGenPdbs_Cache 
    mklink /D "!OUTFILE!.NGENPDB" C:\ProgramData\WindowsPerformanceRecorder\NGenPdbs_Cache  2> NUL

    echo Managed PDBs are stored at: !OUTFILE!.NGENPDB. If you want to transfer the etl do not forget to copy this directory with the pdbs as well. 
    echo Merging ETL files and generating native pdbs

    xperf -merge  "!KERNELFILE!" "!CLRSESSIONFILE!" "!CLRUNDOWNFILE!" "!HEAPUSERFILE!" "!OUTFILE!"
    set _NT_SYMBOL_PATH=!TMPSYMBOLPATH!
    echo !OUTFILE! was created

    if "%2" NEQ "" reg delete "HKLM\Software\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\%~nx2" /v TracingFlags /f 2> NUL
    goto :Exit 
) 

goto :Usage

:StartTracing
xperf -start ClrSession -on  Microsoft-Windows-DotNETRuntime:5 -f "!CLRSESSIONFILE!" -buffersize 128 -minbuffers 256 -maxbuffers 512 
xperf -on PROC_THREAD+LOADER+latency+virt_alloc -stackwalk VirtualAlloc  -f "%KERNELFILE%"
xperf -start HeapSession -heap %1 %2 -BufferSize 1024 -MinBuffers 128 -MaxBuffers 1024 -stackwalk %HEAPTRACINGFLAGS% -f "!HEAPUSERFILE!" -FileMode Circular -MaxFile 1024
exit /B

REM Wait until writing to ETL file has stopped by checking its file size
:WaitUntilRundownCompleted
:StillWriting
    for %%F in (%1) do set "size=%%~zF"
    timeout /T 1  > nul
    for %%F in (%1) do set "size2=%%~zF"
    if "!size!" EQU "" goto :EndWriting
    if "!size!" NEQ "!size2!" goto StillWriting
:EndWriting
timeout /T 1  > nul
exit /B


:Usage
    echo Usage: 
    echo HeapAlloc.cmd -start [executable] or -stop
    echo               -start [executable] Start a trace session 
    echo               -startNext [executable] Start heap tracing for all subsequent calls to executable.
    echo               -attachPid ddd Start a trace session for specified process
    echo               -stop  [executable] Stop a trace session 
    echo Examples
    echo     HeapAlloc.cmd -startNext devenv.exe
    echo     HeapAlloc.cmd -stop      devenv.exe
    echo To attach to a running process
    echo     HeapAlloc.cmd -attachPid dddd
    echo     HeapAlloc.cmd -stop 
    echo You must call -stop for your executable if you have used -start or -startNext because heap allocation tracing will stay enabled until you stop it!
goto :Exit 

:failure
    echo Error occurred
goto :Exit

:Exit

Answer 2 (score: 1)

25.1.2016 Writing this separately, as supplementary information.

For the unique stack ID, CaptureStackBackTrace uses a simple sum of all instruction pointers. The idea comes from "Windows_Research_Kernel(sources)\WRK-v1.2\base\ntos\rtl\amd64\stkwalk.c":

    size_t hashValue = 0;

    for (int i = 0; i < nFrames; i++)
        hashValue += PtrToUlong(BackTrace[i]);

    *pBackTraceHash = (DWORD)hashValue;

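I am not sure about the last cast: some declare the last parameter as DWORD, some as ulong64, but that is not relevant. The main problem with this calculation is that it is not unique enough. Take the case of recursive function calls: if you have the call order: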

func1
func2
func3

then the stack trace:

func1
func3
func2

will produce exactly the same ID.
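The collision is easy to reproduce. The following is a minimal, portable C++ sketch, not the kernel code: the names sumHash and checkSumHashCollision and the three addresses are made up for illustration, and plain integers stand in for real return addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Additive stack ID in the spirit of the snippet above: add up the low
// 32 bits of every frame address (what PtrToUlong yields) and truncate.
// Addition is commutative, so the order of the frames is lost.
inline std::uint32_t sumHash(const std::vector<std::uintptr_t>& frames)
{
    std::size_t hashValue = 0;
    for (std::uintptr_t frame : frames)
        hashValue += static_cast<std::uint32_t>(frame);
    return static_cast<std::uint32_t>(hashValue);
}

// Two different call orders of the same three hypothetical functions.
inline void checkSumHashCollision()
{
    const std::vector<std::uintptr_t> a = {0x1000, 0x2000, 0x3000}; // func1, func2, func3
    const std::vector<std::uintptr_t> b = {0x1000, 0x3000, 0x2000}; // func1, func3, func2
    assert(a != b);                     // the stacks differ...
    assert(sumHash(a) == sumHash(b));   // ...but their IDs do not
}
```

Calling checkSumHashCollision() from a test or from main is enough to see that both stacks map to the same ID.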

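From what I debugged: in memory-leak detection I got 62876 false hits, so this unique stack ID calculation is not reliable enough.

I changed the formula to: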

static DWORD crc32_tab[] =
{
    0x00000000, 0x77073096, 0xee0e612c, 0x990951ba, 0x076dc419, 0x706af48f,
    0xe963a535, 0x9e6495a3, 0x0edb8832, 0x79dcb8a4, 0xe0d5e91e, 0x97d2d988,
    0x09b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91, 0x1db71064, 0x6ab020f2,
    0xf3b97148, 0x84be41de, 0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7,
    0x136c9856, 0x646ba8c0, 0xfd62f97a, 0x8a65c9ec, 0x14015c4f, 0x63066cd9,
    0xfa0f3d63, 0x8d080df5, 0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172,
    0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b, 0x35b5a8fa, 0x42b2986c,
    0xdbbbc9d6, 0xacbcf940, 0x32d86ce3, 0x45df5c75, 0xdcd60dcf, 0xabd13d59,
    0x26d930ac, 0x51de003a, 0xc8d75180, 0xbfd06116, 0x21b4f4b5, 0x56b3c423,
    0xcfba9599, 0xb8bda50f, 0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924,
    0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d, 0x76dc4190, 0x01db7106,
    0x98d220bc, 0xefd5102a, 0x71b18589, 0x06b6b51f, 0x9fbfe4a5, 0xe8b8d433,
    0x7807c9a2, 0x0f00f934, 0x9609a88e, 0xe10e9818, 0x7f6a0dbb, 0x086d3d2d,
    0x91646c97, 0xe6635c01, 0x6b6b51f4, 0x1c6c6162, 0x856530d8, 0xf262004e,
    0x6c0695ed, 0x1b01a57b, 0x8208f4c1, 0xf50fc457, 0x65b0d9c6, 0x12b7e950,
    0x8bbeb8ea, 0xfcb9887c, 0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65,
    0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2, 0x4adfa541, 0x3dd895d7,
    0xa4d1c46d, 0xd3d6f4fb, 0x4369e96a, 0x346ed9fc, 0xad678846, 0xda60b8d0,
    0x44042d73, 0x33031de5, 0xaa0a4c5f, 0xdd0d7cc9, 0x5005713c, 0x270241aa,
    0xbe0b1010, 0xc90c2086, 0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f,
    0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4, 0x59b33d17, 0x2eb40d81,
    0xb7bd5c3b, 0xc0ba6cad, 0xedb88320, 0x9abfb3b6, 0x03b6e20c, 0x74b1d29a,
    0xead54739, 0x9dd277af, 0x04db2615, 0x73dc1683, 0xe3630b12, 0x94643b84,
    0x0d6d6a3e, 0x7a6a5aa8, 0xe40ecf0b, 0x9309ff9d, 0x0a00ae27, 0x7d079eb1,
    0xf00f9344, 0x8708a3d2, 0x1e01f268, 0x6906c2fe, 0xf762575d, 0x806567cb,
    0x196c3671, 0x6e6b06e7, 0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc,
    0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5, 0xd6d6a3e8, 0xa1d1937e,
    0x38d8c2c4, 0x4fdff252, 0xd1bb67f1, 0xa6bc5767, 0x3fb506dd, 0x48b2364b,
    0xd80d2bda, 0xaf0a1b4c, 0x36034af6, 0x41047a60, 0xdf60efc3, 0xa867df55,
    0x316e8eef, 0x4669be79, 0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236,
    0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f, 0xc5ba3bbe, 0xb2bd0b28,
    0x2bb45a92, 0x5cb36a04, 0xc2d7ffa7, 0xb5d0cf31, 0x2cd99e8b, 0x5bdeae1d,
    0x9b64c2b0, 0xec63f226, 0x756aa39c, 0x026d930a, 0x9c0906a9, 0xeb0e363f,
    0x72076785, 0x05005713, 0x95bf4a82, 0xe2b87a14, 0x7bb12bae, 0x0cb61b38,
    0x92d28e9b, 0xe5d5be0d, 0x7cdcefb7, 0x0bdbdf21, 0x86d3d2d4, 0xf1d4e242,
    0x68ddb3f8, 0x1fda836e, 0x81be16cd, 0xf6b9265b, 0x6fb077e1, 0x18b74777,
    0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c, 0x8f659eff, 0xf862ae69,
    0x616bffd3, 0x166ccf45, 0xa00ae278, 0xd70dd2ee, 0x4e048354, 0x3903b3c2,
    0xa7672661, 0xd06016f7, 0x4969474d, 0x3e6e77db, 0xaed16a4a, 0xd9d65adc,
    0x40df0b66, 0x37d83bf0, 0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9,
    0xbdbdf21c, 0xcabac28a, 0x53b39330, 0x24b4a3a6, 0xbad03605, 0xcdd70693,
    0x54de5729, 0x23d967bf, 0xb3667a2e, 0xc4614ab8, 0x5d681b02, 0x2a6f2b94,
    0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d
};

if (pBackTraceHash)
{

    size_t hashValue = 0;
    for( int idxFrame = 0; idxFrame < (int)iFrame; idxFrame++ )
    {
        unsigned char* p = (unsigned char*)&BackTrace[idxFrame];
        for( int i = 0; i < sizeof(void*); i++ )
            hashValue = crc32_tab[ ((hashValue ^ *p++) & 0xFF) ] ^ (hashValue >> 8);
    }
    *pBackTraceHash = (DWORD)hashValue;
}
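The literal table above appears to be the standard reflected CRC-32 table (polynomial 0xEDB88320), so it can also be generated at startup instead of being pasted in. The sketch below is an illustration under that assumption; Crc32Table, crc32Table, crcStackHash and checkCrcStackHash are hypothetical names, and integers again stand in for return addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// 256-entry lookup table for the reflected CRC-32 polynomial 0xEDB88320.
struct Crc32Table {
    std::uint32_t v[256];
    Crc32Table()
    {
        for (std::uint32_t n = 0; n < 256; ++n) {
            std::uint32_t c = n;
            for (int bit = 0; bit < 8; ++bit)
                c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
            v[n] = c;
        }
    }
};

// Built once on first use (function-local statics are initialized thread-safely in C++11).
inline const Crc32Table& crc32Table()
{
    static const Crc32Table table;
    return table;
}

// Same byte-wise update as the snippet above, fed with the raw bytes of each frame.
inline std::uint32_t crcStackHash(const std::vector<std::uintptr_t>& frames)
{
    const Crc32Table& t = crc32Table();
    std::uint32_t hash = 0;
    for (std::uintptr_t frame : frames) {
        unsigned char bytes[sizeof frame];
        std::memcpy(bytes, &frame, sizeof bytes);
        for (unsigned char b : bytes)
            hash = t.v[(hash ^ b) & 0xFFu] ^ (hash >> 8);
    }
    return hash;
}

inline void checkCrcStackHash()
{
    // Spot-check generated entries against the literal table.
    assert(crc32Table().v[1] == 0x77073096u);
    assert(crc32Table().v[255] == 0x2d02ef8du);

    // Reordered frames now yield different IDs.
    const std::vector<std::uintptr_t> a = {0x1000, 0x2000, 0x3000};
    const std::vector<std::uintptr_t> b = {0x1000, 0x3000, 0x2000};
    assert(crcStackHash(a) != crcStackHash(b));
}
```

Generating the table trades about 1 KB of source text for a one-time initialization cost; which one is preferable inside an allocation hook is a judgment call.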

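This algorithm gives no false hits, but slows execution down slightly.

The memory-leak statistics also differ:
Unreliable algorithm: total leaked memory: 48'874'764 / 371 allocation pools
CRC32-based algorithm: total leaked memory: 48'874'764 / 614 allocation pools

As you can see, with the unreliable algorithm the statistics merge (pool) similar call stacks: there is less fragmentation, but the original call stacks are lost. (The statistics are incorrect.)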
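Why the total stays the same while the pool count drops can be shown with a small bookkeeping sketch. This is an illustration only: Stack, Pool, additiveId, record, totalBytes and checkPooling are hypothetical names, the sizes are invented, and keying by the full stack serves as ground truth.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using Stack = std::vector<std::uintptr_t>;   // return addresses of one call stack

struct Pool {
    std::size_t allocations = 0;
    std::size_t bytes = 0;
};

// Order-insensitive ID: plain sum of the addresses, truncated to 32 bits.
inline std::uint32_t additiveId(const Stack& stack)
{
    std::uint32_t id = 0;
    for (std::uintptr_t address : stack)
        id += static_cast<std::uint32_t>(address);
    return id;
}

// Account one leaked allocation to the pool identified by key.
template <typename Key>
void record(std::map<Key, Pool>& pools, const Key& key, std::size_t bytes)
{
    Pool& pool = pools[key];
    ++pool.allocations;
    pool.bytes += bytes;
}

template <typename Key>
std::size_t totalBytes(const std::map<Key, Pool>& pools)
{
    std::size_t total = 0;
    for (const auto& entry : pools)
        total += entry.second.bytes;
    return total;
}

inline void checkPooling()
{
    const Stack first  = {0x1000, 0x2000, 0x3000};   // func1, func2, func3
    const Stack second = {0x1000, 0x3000, 0x2000};   // func1, func3, func2

    std::map<std::uint32_t, Pool> byId;   // pools keyed by the lossy ID
    std::map<Stack, Pool> byStack;        // pools keyed by the full stack

    record(byId, additiveId(first), 100);
    record(byId, additiveId(second), 200);
    record(byStack, first, 100);
    record(byStack, second, 200);

    assert(totalBytes(byId) == totalBytes(byStack));  // same leaked total
    assert(byId.size() == 1);                         // two stacks merged into one pool
    assert(byStack.size() == 2);                      // ground truth: two pools
}
```

A colliding ID never loses bytes; it only attributes them to the wrong call stack, which matches the identical totals and different pool counts reported above.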

Can anyone suggest a faster algorithm?
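One candidate that might be worth measuring is 32-bit FNV-1a, which is order-sensitive and needs no lookup table. This is only a sketch of an alternative, not something taken from this thread: fnv1a32 and fnvStackHash are hypothetical names, and whether it is actually faster or collides less in this workload has not been measured here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 32-bit FNV-1a over an arbitrary byte range.
inline std::uint32_t fnv1a32(const void* data, std::size_t size)
{
    const unsigned char* p = static_cast<const unsigned char*>(data);
    std::uint32_t hash = 2166136261u;      // FNV offset basis
    for (std::size_t i = 0; i < size; ++i) {
        hash ^= p[i];
        hash *= 16777619u;                 // FNV prime
    }
    return hash;
}

// Hash the raw bytes of a captured back trace (an array of frame pointers).
inline std::uint32_t fnvStackHash(void* const* backTrace, std::size_t frameCount)
{
    assert(backTrace != nullptr || frameCount == 0);
    return fnv1a32(backTrace, frameCount * sizeof(void*));
}
```

Inside a capture function it would be called as fnvStackHash(BackTrace, iFrame), with the result stored through pBackTraceHash.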

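Answer 3 (score: 0)

Note to self:

Apparently CaptureStackBackTrace calls RtlCaptureStackBackTrace, directly or indirectly, and the source code of that function is apparently publicly available now; it can be found by searching for "windows research kernel".

I found this by accident while digging through this code: https://github.com/dotnet/coreclr/blob/master/src/unwinder/amd64/unwinder_amd64.cpp

The code contains a reference to what was borrowed from the Windows kernel:

Everything below is borrowed from the minkernel\ntos\rtl\amd64\exdsptch.c file from Windows

By googling further, I found the Windows kernel source itself.

Possibly I can upgrade that function to support managed stacks (using the information from Process Hacker).

[4.1.2015] On deeper analysis, it looks like the main performance bottleneck is not CaptureStackBackTrace itself, since it is just simple iteration and structure lookups, but the managed-mode stack walk, where I call C:\Windows\Microsoft.NET\Framework64\v4.0.30319\mscordacwks.dll / OutOfProcessFunctionTableCallback. You can find its source code in the .NET distribution; apparently it allocates memory in order to analyze the JIT-compiled structures. The problem is that JIT-compiled code can change at any time, and the only way to get a reliable stack trace is to re-query the same information over and over, which can lead to memory allocation overhead. I guess the code needs to be changed so that mscordacwks-like code does not allocate memory itself, but uses the runtime structures to determine the call stack and the function table / function entries.

P.S. If you downvote this answer, I would like to know why and what the alternative is. Even better if you have tried the alternative yourself.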

Answer 4 (score: 0)

By the way, if anyone is missing the original implementation of StackWalk for Windows, it is located here:

https://github.com/dotnet/coreclr/blob/master/src/utilcode/stacktrace.cpp

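Answer 5 (score: 0)

27.1.2016 A somewhat indirectly related question is 32-bit call stack determination. I have already asked which API to use: at least CaptureStackBackTrace produces an incomplete walk (native code only), and the RtlVirtualUnwind API function does not exist on 32-bit Windows.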

From: Noah Falk <noahfalk@microsoft.com>
To: Tarmo Pikaro <tapika@yahoo.com>; Mike McLaughlin <mikem@microsoft.com> 
Cc: Jan Kotas <jkotas@microsoft.com>
Sent: Tuesday, January 26, 2016 1:34 AM
Subject: RE: Resolving managed call stack from void*

Hi Tarmo, hope the exploration of stackwalking has been interesting. 
If I followed you correctly you’ve been successful on x64 but hoping you can extend your technique to 32 bit. 
Indeed the RtlCaptureVirtualUnwind techniques don’t work here, and the fundamental reason behind it is that 
while x64 defines a specific calling convention that all code on Windows is forced to use, x86 does not. 
This means that there is no algorithm the OS could implement which guarantees correct unwinding when PDBs are 
unavailable. However you do have some options:

1)      You can use simple heuristics that work for certain kinds of code. 
Unoptimzed code on x86 often uses EBP chaining, in which ESP in the current frame points to EBP, and EBP points 
to the parent frame’s EBP, and so on down the stack. The return address is stored on the stack adjacent to EBP. 
As I recall all jitted code produced by recent versions of .Net follows these conventions, including optimized 
jitted code. However when a compiler performs inlining these conventions will be unable to detect it, and optimized 
code that does not follow this convention could easily cause the stack to become unwalkable.

2)      If you are willing to load PDBs you can use the DIA APIs to walk the stack: 
https://msdn.microsoft.com/en-us/library/dt06fh94.aspx. The PDB contains additional data about optimized code 
which allows frames that do not follow the EBP chaining convention to be correctly unwound. 
This is the stack walk API that Visual Studio is using when it debugs 32 bit native code on Windows.

3)      The ICorDebug APIs (https://msdn.microsoft.com/en-us/library/dd646502(v=vs.110).aspx) are a set of 
APIs that are designed to support managed code debuggers. Starting in .Net 4.0 the ICorDebug API supports 
dump debugging, however the API is designed in such a way that you don’t have to serialize a dump file. 
This is likely to be more complicated than you would want, but its supported to the use the Windows process 
snapshot APIs to take a snapshot of the memory space and then direct the ICorDebug API to read from this 
snapshot as if it was a dump. One advantage of the ICorDebug API is that not only will it give you managed 
stack frames, it also allows exporing all the other kinds of data debuggers would expose such as parameters, 
local values, fields of objects, types of the values, etc.

The MDbg tool (https://www.microsoft.com/en-us/download/details.aspx?id=2282) is a complete sample debugger 
with source included. It supports dump debugging and displaying callstacks, though it won’t have any specific 
example about using the process snapshot APIs in place of using a dump. The main change would be replacing 
the implementation of ICorDebugDataTarget. MDbg has an implementation that reads from a dump file and you 
would need to create a new implementation that reads from a process snapshot using the windows APIs 
(https://msdn.microsoft.com/en-us/library/dn457825(v=vs.85).aspx). I’ve never written the code myself and 
I’ve heard from other tool authors that they found using the windows snapshot APIs more difficult than expected,
 but eventually they were successful.

I was somewhat inspired by approach 1, having already seen a similar approach in another project, so I wrote my own 32-bit stack walk implementation:

int CaptureStackBackTracePro( int FramesToSkip, int nFrames, PVOID* BackTrace, PDWORD pBackTraceHash )
{
    //
    //  This approach was taken from StackInfoManager.cpp / FillStackInfo
    //  http://www.codeproject.com/Articles/11221/Easy-Detection-of-Memory-Leaks
    //  - slightly simplified the function itself.
    //
    int regEBP;
    __asm mov regEBP, ebp;

    long *pFrame = (long*) regEBP;              // pointer to current function frame
    void* pNextInstruction;
    int iFrame = 0;

    //
    // Using __try/__except is faster than using ReadProcessMemory or VirtualProtect.
    // We return whatever frames we have collected so far after exception was encountered.
    //
    __try {
        while( iFrame < nFrames )
        {
            pNextInstruction = (void*)(*(pFrame + 1));  // return address sits just above the saved EBP

            if( !pNextInstruction )     // Last frame
                break;

            pFrame = (long*)(*pFrame);  // follow the saved EBP to the caller's frame

            if( FramesToSkip > 0 )      // honor the requested number of skipped frames
            {
                FramesToSkip--;
                continue;
            }

            BackTrace[iFrame++] = pNextInstruction;
        }
    }
    __except(EXCEPTION_EXECUTE_HANDLER) 
    {
    }

    // pBackTraceHash is not filled in here; see the code snippet in the other answer.

    return iFrame;

} //CaptureStackBackTracePro

A brief test showed that this function is able to capture both native and managed stack frames.
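The EBP chain that the function follows can also be illustrated without inline assembly. The sketch below is a portable simulation under stated assumptions, not real stack inspection: Frame, walkFrameChain and checkWalk are hypothetical names, the frames are hand-built structs, and addresses of dummy bytes stand in for return addresses.

```cpp
#include <cassert>
#include <cstddef>

// Shape of a frame that follows the EBP-chaining convention on x86:
// the caller's saved EBP is at [EBP], the return address at [EBP + 4].
struct Frame {
    const Frame* callerFrame;    // saved EBP of the caller; nullptr ends this simulation
    const void*  returnAddress;  // where execution resumes in the caller
};

// Follow the chain, skip the first framesToSkip entries and store up to
// maxFrames return addresses in out. Returns the number of stored frames.
inline std::size_t walkFrameChain(const Frame* frame, std::size_t framesToSkip,
                                  const void** out, std::size_t maxFrames)
{
    assert(out != nullptr || maxFrames == 0);
    std::size_t count = 0;
    while (frame != nullptr && frame->returnAddress != nullptr && count < maxFrames) {
        if (framesToSkip > 0)
            --framesToSkip;
        else
            out[count++] = frame->returnAddress;
        frame = frame->callerFrame;
    }
    return count;
}

inline void checkWalk()
{
    static const char code[3] = {0, 0, 0};         // stand-ins for code locations
    const Frame mainFrame  = {nullptr,     &code[0]};   // main returns into startup code
    const Frame func1Frame = {&mainFrame,  &code[1]};   // func1 returns into main
    const Frame func2Frame = {&func1Frame, &code[2]};   // func2 returns into func1

    const void* trace[8] = {};
    assert(walkFrameChain(&func2Frame, 0, trace, 8) == 3);
    assert(trace[0] == &code[2] && trace[1] == &code[1] && trace[2] == &code[0]);

    assert(walkFrameChain(&func2Frame, 1, trace, 8) == 2);   // innermost frame skipped
    assert(trace[0] == &code[1] && trace[1] == &code[0]);
}
```

The real function differs in one important way: it reads live stack memory, so a broken chain shows up as an access violation caught by __except rather than as a clean nullptr.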

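For optimized code I would need to do a deeper analysis. Perhaps it is best to leave out optimization, or to optimize only the relevant parts of the code, to get better diagnostics?!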