Question

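I recently built a benchmark application to explore a few ways of writing addition operators for math structs in C#: https://github.com/nickgravelyn/math-struct-benchmark. In the results, I found that Vector2 is consistently slower than Vector3, even though it holds less data and executes fewer instructions. More interestingly, this appears to hold on every runtime/JIT I tested.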
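For reference, here is a minimal C# sketch of the two kinds of struct being compared. It is reconstructed from the IL in Answer 2, which shows two or three `float` fields, a constructor, and a static `+` operator that returns a new value. The names `Vector2_A`/`Vector3_A` come from that IL; everything else (public fields, the `Program` class) is an assumption and not necessarily the code in the linked repository.

```csharp
using System;

// Hypothetical reconstruction based on the IL in Answer 2.
public struct Vector2_A
{
    public float X;
    public float Y;

    public Vector2_A(float x, float y)
    {
        X = x;
        Y = y;
    }

    // Component-wise addition; returns a new value and leaves both operands untouched.
    public static Vector2_A operator +(Vector2_A value1, Vector2_A value2)
    {
        return new Vector2_A(value1.X + value2.X, value1.Y + value2.Y);
    }
}

public struct Vector3_A
{
    public float X;
    public float Y;
    public float Z;

    public Vector3_A(float x, float y, float z)
    {
        X = x;
        Y = y;
        Z = z;
    }

    // Same pattern as Vector2_A, with one extra component.
    public static Vector3_A operator +(Vector3_A value1, Vector3_A value2)
    {
        return new Vector3_A(value1.X + value2.X, value1.Y + value2.Y, value1.Z + value2.Z);
    }
}

public static class Program
{
    public static void Main()
    {
        var a = new Vector2_A(1f, 2f) + new Vector2_A(3f, 4f);
        var b = new Vector3_A(1f, 2f, 3f) + new Vector3_A(4f, 5f, 6f);
        Console.WriteLine($"{a.X} {a.Y}");       // 4 6
        Console.WriteLine($"{b.X} {b.Y} {b.Z}"); // 5 7 9
    }
}
```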

For example, on .NET Core 2.2 the benchmark for the + operator of one of the tested Vector2 implementations took 921.82 ms, while the comparable Vector3 implementation took 422.76 ms.
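As a quick sanity check outside BenchmarkDotNet, the comparison can be sketched with a crude `Stopwatch` loop around the same `value += value2` pattern that appears in Answer 2's IL. The struct definitions, the `Sum2`/`Sum3` helpers and the iteration count are assumptions for illustration. This sketch does no statistical analysis, and tiered compilation on newer runtimes can change which machine code actually gets timed, so its numbers should be treated as rough at best.

```csharp
using System;
using System.Diagnostics;

// Minimal hypothetical struct definitions mirroring the IL in Answer 2.
public struct Vector2_A
{
    public float X, Y;
    public Vector2_A(float x, float y) { X = x; Y = y; }
    public static Vector2_A operator +(Vector2_A a, Vector2_A b) => new Vector2_A(a.X + b.X, a.Y + b.Y);
}

public struct Vector3_A
{
    public float X, Y, Z;
    public Vector3_A(float x, float y, float z) { X = x; Y = y; Z = z; }
    public static Vector3_A operator +(Vector3_A a, Vector3_A b) => new Vector3_A(a.X + b.X, a.Y + b.Y, a.Z + b.Z);
}

public static class Program
{
    // Times `iterations` additions of a Vector2_A; the sum is returned so the
    // result is used and the loop cannot be dropped as dead code.
    public static Vector2_A Sum2(int iterations, out TimeSpan elapsed)
    {
        var value = new Vector2_A(0f, 0f);
        var value2 = new Vector2_A(1f, 1f);
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            value += value2;
        }
        sw.Stop();
        elapsed = sw.Elapsed;
        return value;
    }

    // Same measurement for Vector3_A.
    public static Vector3_A Sum3(int iterations, out TimeSpan elapsed)
    {
        var value3 = new Vector3_A(0f, 0f, 0f);
        var value4 = new Vector3_A(1f, 1f, 1f);
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            value3 += value4;
        }
        sw.Stop();
        elapsed = sw.Elapsed;
        return value3;
    }

    public static void Main()
    {
        const int N = 100_000_000;

        // Warm-up calls so the operators are JIT-compiled before the timed loops.
        Sum2(1000, out _);
        Sum3(1000, out _);

        var v2 = Sum2(N, out var t2);
        var v3 = Sum3(N, out var t3);
        Console.WriteLine($"Vector2_A: {t2.TotalMilliseconds:F2} ms (X = {v2.X})");
        Console.WriteLine($"Vector3_A: {t3.TotalMilliseconds:F2} ms (X = {v3.X})");
    }
}
```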

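Is there something in the C#, the IL, or the native assembly that explains these results? Or have I messed up something in the benchmark that I just can't spot?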

Answer 1

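After more digging, it turned out to be a problem with 64-bit RyuJIT code generation. I filed an issue on CoreCLR, and it appears to be related to, or the same as, some other performance issues.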
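Since the slowdown is attributed specifically to 64-bit RyuJIT code generation, it helps to record which kind of process each benchmark run actually used, so that 32-bit and 64-bit numbers are not mixed up. The following minimal sketch prints that information using standard .NET APIs (`Environment.Is64BitProcess`, `IntPtr.Size`, `RuntimeInformation`); the `DescribeProcess` helper is hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

public static class Program
{
    // Summarizes the process the code is running in (bitness, architecture, runtime).
    public static string DescribeProcess()
    {
        return $"64-bit process: {Environment.Is64BitProcess}, " +
               $"pointer size: {IntPtr.Size} bytes, " +
               $"architecture: {RuntimeInformation.ProcessArchitecture}, " +
               $"runtime: {RuntimeInformation.FrameworkDescription}";
    }

    public static void Main()
    {
        Console.WriteLine(DescribeProcess());
    }
}
```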

Answer 2

I'll try to add some findings, although I don't have an answer yet. BenchmarkDotNet shows me the same results as you.

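First, I profiled with the tools in Visual Studio, which leaves no doubt that the addition itself is where the time goes and where the large difference comes from.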

Results for the executed 64-bit code: [profiler screenshot]

vs. 32-bit: [profiler screenshot]

The IL for these two lines is:

    // value += value2;
    IL_0059: ldloc.0
    IL_005a: ldloc.1
    IL_005b: call valuetype UserQuery/Vector2_A UserQuery/Vector2_A::op_Addition(valuetype UserQuery/Vector2_A, valuetype UserQuery/Vector2_A)
    IL_0060: stloc.0
    // value3 += value4;
    IL_0061: ldloc.2
    IL_0062: ldloc.3
    IL_0063: call valuetype UserQuery/Vector3_A UserQuery/Vector3_A::op_Addition(valuetype UserQuery/Vector3_A, valuetype UserQuery/Vector3_A)
    IL_0068: stloc.2

Next come the two addition operator methods, one for each dimension. The 2D version:

    .method public hidebysig specialname static
        valuetype UserQuery/Vector2_A op_Addition (
            valuetype UserQuery/Vector2_A value1,
            valuetype UserQuery/Vector2_A value2
        ) cil managed
    {
        // Method begins at RVA 0x2100
        // Code size 37 (0x25)
        .maxstack 3
        .locals init (
            [0] valuetype UserQuery/Vector2_A
        )

        // (no C# code)
        IL_0000: nop
        // return new Vector2_A(value1.X + value2.X, value1.Y + value2.Y);
        IL_0001: ldarg.0
        IL_0002: ldfld float32 UserQuery/Vector2_A::X
        IL_0007: ldarg.1
        IL_0008: ldfld float32 UserQuery/Vector2_A::X
        IL_000d: add
        IL_000e: ldarg.0
        IL_000f: ldfld float32 UserQuery/Vector2_A::Y
        IL_0014: ldarg.1
        IL_0015: ldfld float32 UserQuery/Vector2_A::Y
        IL_001a: add
        IL_001b: newobj instance void UserQuery/Vector2_A::.ctor(float32, float32)
        IL_0020: stloc.0
        // (no C# code)
        IL_0021: br.s IL_0023

        IL_0023: ldloc.0
        IL_0024: ret
    } // end of method Vector2_A::op_Addition

And the 3D version:

    .method public hidebysig specialname static
        valuetype UserQuery/Vector3_A op_Addition (
            valuetype UserQuery/Vector3_A value1,
            valuetype UserQuery/Vector3_A value2
        ) cil managed
    {
        // Method begins at RVA 0x214c
        // Code size 50 (0x32)
        .maxstack 4
        .locals init (
            [0] valuetype UserQuery/Vector3_A
        )

        // (no C# code)
        IL_0000: nop
        // return new Vector3_A(value1.X + value2.X, value1.Y + value2.Y, value1.Z + value2.Z);
        IL_0001: ldarg.0
        IL_0002: ldfld float32 UserQuery/Vector3_A::X
        IL_0007: ldarg.1
        IL_0008: ldfld float32 UserQuery/Vector3_A::X
        IL_000d: add
        IL_000e: ldarg.0
        IL_000f: ldfld float32 UserQuery/Vector3_A::Y
        IL_0014: ldarg.1
        IL_0015: ldfld float32 UserQuery/Vector3_A::Y
        IL_001a: add
        IL_001b: ldarg.0
        IL_001c: ldfld float32 UserQuery/Vector3_A::Z
        IL_0021: ldarg.1
        IL_0022: ldfld float32 UserQuery/Vector3_A::Z
        IL_0027: add
        IL_0028: newobj instance void UserQuery/Vector3_A::.ctor(float32, float32, float32)
        IL_002d: stloc.0
        // (no C# code)
        IL_002e: br.s IL_0030

        IL_0030: ldloc.0
        IL_0031: ret
    } // end of method Vector3_A::op_Addition
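Both listings contain a `nop` and a `br.s` that jumps to the very next instruction, which is typical of IL emitted without C# compiler optimizations (for example a Debug build, or LINQPad with optimizations turned off). That does not by itself explain the BenchmarkDotNet numbers, but when timing with a hand-written harness it is worth confirming that the assembly was not built with JIT optimizations disabled. A minimal sketch using the standard `DebuggableAttribute`; the `IsJitOptimizerDisabled` helper name is hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

public static class Program
{
    // True if the assembly carries a DebuggableAttribute that disables JIT
    // optimizations (what a Debug build normally emits).
    public static bool IsJitOptimizerDisabled(Assembly assembly)
    {
        var attr = assembly.GetCustomAttribute<DebuggableAttribute>();
        return attr != null && attr.IsJITOptimizerDisabled;
    }

    public static void Main()
    {
        bool disabled = IsJitOptimizerDisabled(typeof(Program).Assembly);
        Console.WriteLine(disabled
            ? "JIT optimizer disabled: looks like a Debug build, benchmark a Release build instead."
            : "JIT optimizer enabled.");
    }
}
```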

Honestly, the rest is only a guess: the 3D add method may have some advantage in memory/stack alignment, since its code size is 0x32 vs. 0x25 and its maxstack is 4 vs. 3.
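Code size and `.maxstack` describe the IL, not the machine code the JIT emits, so a more concrete layout difference to check is the struct size itself: 8 bytes for two floats vs. 12 for three. Calling conventions can treat these sizes differently; for example, the native Windows x64 convention passes an 8-byte struct in a single general-purpose register but a 12-byte struct by reference. Whether that is what RyuJIT does here is an assumption to verify in the disassembly, not a confirmed explanation. A minimal sketch that prints the sizes, with the structs reduced to their fields (hypothetical definitions):

```csharp
using System;
using System.Runtime.InteropServices;

// Field-only versions of the structs from the IL above (hypothetical).
public struct Vector2_A { public float X, Y; }
public struct Vector3_A { public float X, Y, Z; }

public static class Program
{
    public static void Main()
    {
        // Marshal.SizeOf reports the unmanaged size; for structs made only of
        // float fields it should match the managed layout (4 bytes per float).
        Console.WriteLine($"Vector2_A: {Marshal.SizeOf<Vector2_A>()} bytes"); // 8
        Console.WriteLine($"Vector3_A: {Marshal.SizeOf<Vector3_A>()} bytes"); // 12
    }
}
```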

Examining RyuJIT's x64 assembly output is beyond my skills at this point. Perhaps it's worth asking a JIT expert at Microsoft about this?

Why is addition on a 2D vector struct slower than on a 3D vector struct in C#?
