Question

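I have been looking into method inlining by the JIT and landed on this article by Scott Hanselman. I took his code a step further, and it seems that although the stack trace shows only a few frames when the code runs in Release mode, the code actually runs as if those extra frames still exist in the compiled code (even though they are not reported as such).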

First, if you want to jump right in and run it, I have put the code here: https://github.com/Mike-EEE/StackOverflow.Performance

I have tried this on .NET 4.7.1, .NET Core 2.0, and even the recently announced .NET Core 2.1 Preview. All give the same results.

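What I did was create a simple command that emits a message, and then a multi-decorated command that wraps this simple command several times. In the posted code the decoration is applied 10 times, which produces a nested command 10 levels deep (11 if you count the original simple command).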
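A minimal sketch of the structure described above may help. The class names EmitMessage and DecoratedCommand and the methods MethodA, MethodB, MethodC, and Emit come from the stack traces in this post; the ICommand interface and the CommandFactory helper are hypothetical names used only for illustration, and the code in the repository may differ.

```csharp
using System;

// Hypothetical interface name; the post only shows the class names
// EmitMessage and DecoratedCommand in its stack traces.
public interface ICommand
{
    void Execute(string message);
}

// The simple command: forwards through a few trivial methods to a delegate.
public sealed class EmitMessage : ICommand
{
    private readonly Action<string> _emit;

    public EmitMessage(Action<string> emit) => _emit = emit;

    public void Execute(string message) => MethodA(message);
    private void MethodA(string message) => MethodB(message);
    private void MethodB(string message) => MethodC(message);
    private void MethodC(string message) => Emit(message);
    private void Emit(string message) => _emit(message);
}

// The decorator: holds another command and forwards the call to it.
public sealed class DecoratedCommand : ICommand
{
    private readonly ICommand _inner;

    public DecoratedCommand(ICommand inner) => _inner = inner;

    public void Execute(string message) => _inner.Execute(message);
}

// Hypothetical helper that wraps a command the requested number of times.
public static class CommandFactory
{
    public static ICommand Decorate(ICommand command, int levels)
    {
        for (var i = 0; i < levels; i++)
        {
            command = new DecoratedCommand(command);
        }
        return command;
    }
}
```

Under these assumptions, CommandFactory.Decorate(new EmitMessage(_ => { }), 10) builds the 10-level nested command, and each level is an interface call whose target is only known at run time.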

Both commands used in the test emit the message through an empty delegate, because using Console.WriteLine during a performance test gets pretty ugly.

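Before running the test, I create a decorated command that uses the same code as the tested one but, instead of an empty delegate, uses Console.WriteLine to verify the stack trace in the current execution environment.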
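The two kinds of delegate mentioned here could look roughly like the following. This is only a sketch: the Emitters class and its member names are hypothetical, and the repository may capture the stack trace differently (for example through Environment.StackTrace).

```csharp
using System;
using System.Diagnostics;

public static class Emitters
{
    // Benchmark emitter: does nothing, so only the call overhead is measured.
    public static readonly Action<string> Empty = _ => { };

    // Verification emitter: prints the managed stack as seen from the innermost call.
    public static readonly Action<string> PrintStack =
        _ => Console.WriteLine(CaptureStack());

    // Captures the current managed stack trace as text. Methods that the JIT
    // inlined do not appear as separate frames in this trace.
    public static string CaptureStack() => new StackTrace().ToString();
}
```

Passing PrintStack instead of Empty to the innermost command is what lets the stack be inspected once, outside the timed run.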

In Debug, this stack trace looks like the following:

   at StackOverflow.Performance.EmitMessage.Emit(String message)
   at StackOverflow.Performance.EmitMessage.MethodC(String message)
   at StackOverflow.Performance.EmitMessage.MethodB(String message)
   at StackOverflow.Performance.EmitMessage.MethodA(String message)
   at StackOverflow.Performance.EmitMessage.Execute(String message)
   at StackOverflow.Performance.DecoratedCommand.Execute(String message)
   at StackOverflow.Performance.DecoratedCommand.Execute(String message)
   at StackOverflow.Performance.DecoratedCommand.Execute(String message)
   at StackOverflow.Performance.DecoratedCommand.Execute(String message)
   at StackOverflow.Performance.DecoratedCommand.Execute(String message)
   at StackOverflow.Performance.DecoratedCommand.Execute(String message)
   at StackOverflow.Performance.DecoratedCommand.Execute(String message)
   at StackOverflow.Performance.DecoratedCommand.Execute(String message)
   at StackOverflow.Performance.DecoratedCommand.Execute(String message)
   at StackOverflow.Performance.DecoratedCommand.Execute(String message)
   at StackOverflow.Performance.Program.Main()

In Release, it looks like this:

   at StackOverflow.Performance.EmitMessage.Emit(String message)
   at StackOverflow.Performance.Program.Main()

So far, everything looks great and is exactly what I expected. However, I then run both commands through BenchmarkDotNet to see the results in a performance setting. These results seem to show that the decorated command's call chain is executed in full, even though the emitted stack trace shows that no such call chain exists:

// * Summary *

BenchmarkDotNet=v0.10.14, OS=Windows 10.0.16299.371 (1709/FallCreatorsUpdate/Redstone3)
Intel Core i7-4820K CPU 3.70GHz (Haswell), 1 CPU, 8 logical and 8 physical cores
.NET Core SDK=2.1.300-preview2-008533
  [Host]     : .NET Core 2.0.7 (CoreCLR 4.6.26328.01, CoreFX 4.6.26403.03), 64bit RyuJIT
  DefaultJob : .NET Core 2.0.7 (CoreCLR 4.6.26328.01, CoreFX 4.6.26403.03), 64bit RyuJIT


    Method |      Mean |     Error |    StdDev |
---------- |----------:|----------:|----------:|
    Direct |  3.581 ns | 0.0759 ns | 0.0710 ns |
 Decorated | 44.646 ns | 0.7701 ns | 0.7203 ns |

So it would appear that more than 2 frames are being executed here, which leads me to post this question on StackOverflow. I have several questions:

Is there something fundamentally inaccurate in my code? That would be incredibly embarrassing, but I want to rule out the obvious first. :)
If my code and results are indeed accurate: is this a known issue, and/or is this working as designed?
My assumption is that tail call optimization is in use here. Is method inlining also happening? I guess my basic question is: exactly which optimization is producing these unexpectedly unoptimized results?
Most importantly: is there any way to ensure and achieve the optimized results I am after? Any magic that can be applied to the root delegate would be valuable here. It seems that the root delegate is correctly resolved, but not correctly invoked.

For completeness, all the code that runs this sample is in the repository linked above.

Thank you in advance for any insight or help you can provide!

Answer 1

Shamelessly copying Stephen Toub's response from here:

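I just took a look at the disassembly for DecoratedCommand.Execute, using a checked build of coreclr and running with set COMPlus_JitDisasm=Execute (see the documentation). It is in fact using a tail call: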


; Assembly listing for method DecoratedCommand:Execute(ref):this
; Emitting BLENDED_CODE for X64 CPU with AVX
; optimized code
; rsp based frame
; fully interruptible
; Final local variable assignments
;
;  V00 this         [V00,T00] (  3,  3   )     ref  ->  rcx         this class-hnd
;  V01 arg1         [V01,T01] (  3,  3   )     ref  ->  rdx         class-hnd
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]
;
; Lcl frame size = 0

G_M223_IG01:

G_M223_IG02:
       488B4908             mov      rcx, gword ptr [rcx+8]
       49BB48007733FD7F0000 mov      r11, 0x7FFD33770048
       488B05934FE5FF       mov      rax, qword ptr [(reloc)]
       3909                 cmp      dword ptr [rcx], ecx

G_M223_IG03:
       48FFE0               rex.jmp  rax

JIT optimization: why does it get slower, and how can I improve it?
