Why isn't the indexer property setter inlined?

Asked: 2019-04-01 22:07:04

Tags: c# performance generics unmanaged

I have a class that wraps an unmanaged allocation and exposes it as an array. You can see the source here on GitHub, but this is the gist of it:

public unsafe class ArrayReference<T> : Reference, IArrayReference<T>
  where T : unmanaged
{

  private T* typedPointer_;

  public T this[ int index ]
  {
    [MethodImpl( MethodImplOptions.AggressiveInlining )]
    get => typedPointer_[ index ];
    [MethodImpl( MethodImplOptions.AggressiveInlining )]
    set => typedPointer_[ index ] = value;
  }

}
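For readers who want to try this without the full repository, here is a minimal, self-contained sketch of the same idea. It is not the actual ArrayReference<T>: the names UnmanagedArraySketch, SketchDemo and FillAndSum are made up for illustration, and it assumes the memory comes from Marshal.AllocHGlobal instead of the separate Allocation object used above. It needs unsafe code enabled (AllowUnsafeBlocks) to compile.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical stand-in for ArrayReference<T>; not the class from the repository.
public sealed unsafe class UnmanagedArraySketch<T> : IDisposable
  where T : unmanaged
{
  private T* typedPointer_;

  public int Length { get; }

  public UnmanagedArraySketch( int length )
  {
    Length = length;
    // Assumption: allocate with AllocHGlobal; the original wraps an existing allocation.
    typedPointer_ = ( T* ) Marshal.AllocHGlobal( length * sizeof( T ) ).ToPointer();
  }

  public T this[ int index ]
  {
    // Same shape as the indexer in the question: raw pointer access, no bounds check.
    [MethodImpl( MethodImplOptions.AggressiveInlining )]
    get => typedPointer_[ index ];
    [MethodImpl( MethodImplOptions.AggressiveInlining )]
    set => typedPointer_[ index ] = value;
  }

  public void Dispose()
  {
    if ( typedPointer_ != null )
    {
      Marshal.FreeHGlobal( new IntPtr( typedPointer_ ) );
      typedPointer_ = null;
    }
  }
}

public static class SketchDemo
{
  // Writes 0..length-1 through the setter, then sums the values through the getter.
  public static int FillAndSum( int length )
  {
    using ( var array = new UnmanagedArraySketch<byte>( length ) )
    {
      for ( var i = 0; i < length; i++ )
        array[ i ] = ( byte ) i;

      var sum = 0;
      for ( var i = 0; i < length; i++ )
        sum += array[ i ];

      return sum;
    }
  }
}
```

Calling SketchDemo.FillAndSum( 128 ) exercises both accessors and returns 8128 (the sum of 0 through 127).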

This is very simple, and for read operations it gives excellent performance (measured with BenchmarkDotNet):

Array size: 128
Managed byte[] ranged-for get: 69.8046 ns
ArrayReference ranged-for get: 66.7340 ns
Managed byte[] ranged-for set: 66.1855 ns
ArrayReference ranged-for set: 68.4863 ns

And the benchmark code:

[GlobalSetup]
public void Setup()
{
  median = AllocationSize / 2;
  alloc_ = new Allocation( AllocationSize );
  array_ = new ArrayReference<byte>( alloc_.Address, AllocationSize );
  managedArray_ = new byte[ AllocationSize ];
}

[Benchmark]
public void ManagedArray_ranged_for_get()
{
  var counter = 0;
  for ( var i = 0; i < AllocationSize; i++ )
    counter += managedArray_[ i ];
}

[Benchmark]
public void ArrayReference_ranged_for_get()
{
  var counter = 0;
  for ( var i = 0; i < AllocationSize; i++ )
    counter += array_[ i ];
}

[Benchmark]
public void ManagedArray_ranged_for_set()
{
  for ( var i = 0; i < AllocationSize; i++ )
    managedArray_[ i ] = ( byte ) i;
}

[Benchmark]
public void ArrayReference_ranged_for_set()
{
  for ( var i = 0; i < AllocationSize; i++ )
    array_[ i ] = ( byte ) i;
}

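As you can see, reading from ArrayReference is slightly faster, because it performs no range checks and accesses the array through its pointer directly. Writing to ArrayReference, however, is slower than writing to a managed byte[] array, and the problem appears to be that the setter is not inlined.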

JIT-generated x64 assembly for the managed byte[] set:

managedArray_[ median ] = 0;
00007FFA237A216C  mov         rax,qword ptr [rbp+10h]  
00007FFA237A2170  mov         rax,qword ptr [rax+18h]  
00007FFA237A2174  mov         rdx,qword ptr [rbp+10h]  
00007FFA237A2178  mov         edx,dword ptr [rdx+24h]  
00007FFA237A217B  cmp         rdx,qword ptr [rax+8]  
00007FFA237A217F  jb          00007FFA237A2186  
00007FFA237A2181  call        00007FFA833DF110  
00007FFA237A2186  lea         rax,[rax+rdx+10h]  
00007FFA237A218B  mov         byte ptr [rax],0  

JIT-generated x64 assembly for ArrayReference::set:

typedPointer_[ index ] = value;
00007FFA237A20BC  mov         rcx,qword ptr [rbp+10h]  
00007FFA237A20C0  mov         rcx,qword ptr [rcx+10h]  
00007FFA237A20C4  mov         rdx,qword ptr [rbp+10h]  
00007FFA237A20C8  mov         edx,dword ptr [rdx+24h]  
00007FFA237A20CB  xor         r8d,r8d  
00007FFA237A20CE  cmp         dword ptr [rcx],ecx  
00007FFA237A20D0  call        00007FFA237A1738  
 -> 
    00007FFA237A20F0  push        rbp  
    00007FFA237A20F1  sub         rsp,20h  
    00007FFA237A20F5  lea         rbp,[rsp+20h]  
    00007FFA237A20FA  mov         qword ptr [rbp+10h],rcx  
    00007FFA237A20FE  mov         dword ptr [rbp+18h],edx  
    00007FFA237A2101  mov         dword ptr [rbp+20h],r8d  
    00007FFA237A2105  cmp         dword ptr [7FFA23688310h],0  
    00007FFA237A210C  je          00007FFA237A2113  
    00007FFA237A210E  call        00007FFA833DD3E0  
    00007FFA237A2113  mov         rax,qword ptr [rbp+10h]  
    00007FFA237A2117  mov         rax,qword ptr [rax+20h]  
    00007FFA237A211B  mov         edx,dword ptr [rbp+18h]  
    00007FFA237A211E  movsxd      rdx,edx  
    00007FFA237A2121  mov         ecx,1  
    00007FFA237A2126  movsxd      rcx,ecx  
    00007FFA237A2129  imul        rdx,rcx  
    00007FFA237A212D  mov         ecx,dword ptr [rbp+20h]  
    00007FFA237A2130  mov         byte ptr [rax+rdx],cl  

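I don't understand why it isn't being inlined. It does exactly the same thing as the managed array, only through a pointer to unmanaged memory. Does this break one of the CLR's inlining rules, or is it not inlined because T is generic, even though it is constrained?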

Windows 10 Pro (64-bit), .NET Core 2.2, Release mode, 64-bit RyuJIT
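One thing that can be ruled out without a disassembler is whether the AggressiveInlining request actually ends up in the metadata of the setter on a closed generic type. The sketch below checks that through reflection. IndexerHolder<T> and InliningFlagCheck are hypothetical names, and the holder uses a managed array only to keep the example free of unsafe code. It confirms nothing more than that the flag is recorded; the attribute is a request to the JIT, so this does not show whether a given call site was inlined.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

// Hypothetical generic type with the same attribute placement as the indexer in the question.
public class IndexerHolder<T> where T : struct
{
  private readonly T[] items_ = new T[ 4 ];

  public T this[ int index ]
  {
    [MethodImpl( MethodImplOptions.AggressiveInlining )]
    get => items_[ index ];
    [MethodImpl( MethodImplOptions.AggressiveInlining )]
    set => items_[ index ] = value;
  }
}

public static class InliningFlagCheck
{
  // Returns true when the indexer setter of IndexerHolder<byte> carries the
  // AggressiveInlining implementation flag in its metadata.
  public static bool SetterRequestsInlining()
  {
    // C# indexers are emitted as a property named "Item" by default.
    MethodInfo setter = typeof( IndexerHolder<byte> ).GetProperty( "Item" ).SetMethod;
    return ( setter.MethodImplementationFlags & MethodImplAttributes.AggressiveInlining ) != 0;
  }
}
```

If InliningFlagCheck.SetterRequestsInlining() returns true, the attribute is in place and any missing inlining has to come from the JIT's own decision or from how the code was compiled and run.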

0 answers:

No answers