I have a class that wraps an unmanaged allocation behind an array-like interface. You can see the source here on GitHub, but this is the gist of it:
public unsafe class ArrayReference<T> : Reference, IArrayReference<T>
    where T : unmanaged
{
    private T* typedPointer_;

    public T this[ int index ]
    {
        [MethodImpl( MethodImplOptions.AggressiveInlining )]
        get => typedPointer_[ index ];
        [MethodImpl( MethodImplOptions.AggressiveInlining )]
        set => typedPointer_[ index ] = value;
    }
}
It is very simple, and for read operations it gives excellent performance (measured with BenchmarkDotNet):
Array size: 128
Managed byte[] ranged-for get: 69.8046 ns
ArrayReference ranged-for get: 66.7340 ns
Managed byte[] ranged-for set: 66.1855 ns
ArrayReference ranged-for set: 68.4863 ns
And the benchmark code:
[GlobalSetup]
public void Setup()
{
    median = AllocationSize / 2;
    alloc_ = new Allocation( AllocationSize );
    array_ = new ArrayReference<byte>( alloc_.Address, AllocationSize );
    managedArray_ = new byte[ AllocationSize ];
}

[Benchmark]
public void ManagedArray_ranged_for_get()
{
    var counter = 0;
    for ( var i = 0; i < AllocationSize; i++ )
        counter += managedArray_[ i ];
}

[Benchmark]
public void ArrayReference_ranged_for_get()
{
    var counter = 0;
    for ( var i = 0; i < AllocationSize; i++ )
        counter += array_[ i ];
}

[Benchmark]
public void ManagedArray_ranged_for_set()
{
    for ( var i = 0; i < AllocationSize; i++ )
        managedArray_[ i ] = ( byte ) i;
}

[Benchmark]
public void ArrayReference_ranged_for_set()
{
    for ( var i = 0; i < AllocationSize; i++ )
        array_[ i ] = ( byte ) i;
}
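As you can see, reading from the ArrayReference is slightly faster (about 66.7 ns versus 69.8 ns), because it performs no range check and accesses the memory directly through the pointer. Writing to the ArrayReference, however, is slower than writing to the managed byte[] (about 68.5 ns versus 66.2 ns), and the problem appears to be that the setter is not being inlined.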
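To make the difference between the two access paths concrete, here is a minimal self-contained sketch. It is not the real ArrayReference<T>: the class name UnmanagedArraySketch, its constructor and its use of Marshal.AllocHGlobal are assumptions made for illustration, since the Reference base class, IArrayReference<T> and Allocation are not shown in this post. It needs unsafe code enabled (AllowUnsafeBlocks) and C# 7.3 or later.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical stand-in for ArrayReference<T>; the real class derives from
// Reference and implements IArrayReference<T>, neither of which is shown here.
public sealed unsafe class UnmanagedArraySketch<T> : IDisposable
    where T : unmanaged
{
    private T* typedPointer_;

    public int Length { get; }

    public UnmanagedArraySketch( int length )
    {
        Length = length;
        // AllocHGlobal returns uninitialized native memory outside the GC heap.
        typedPointer_ = ( T* ) Marshal.AllocHGlobal( length * sizeof( T ) ).ToPointer();
    }

    public T this[ int index ]
    {
        // No bounds check: an out-of-range index touches arbitrary memory.
        [MethodImpl( MethodImplOptions.AggressiveInlining )]
        get => typedPointer_[ index ];
        [MethodImpl( MethodImplOptions.AggressiveInlining )]
        set => typedPointer_[ index ] = value;
    }

    public void Dispose()
    {
        if ( typedPointer_ != null )
        {
            Marshal.FreeHGlobal( new IntPtr( typedPointer_ ) );
            typedPointer_ = null;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var managed = new byte[ 4 ];
        using ( var native = new UnmanagedArraySketch<byte>( 4 ) )
        {
            for ( var i = 0; i < 4; i++ )
            {
                managed[ i ] = ( byte ) i; // store into a managed array (bounds-checked by the runtime)
                native[ i ] = ( byte ) i;  // store through the raw pointer (no check)
            }
            Console.WriteLine( native[ 3 ] == managed[ 3 ] );
        }
    }
}
```

The only intended difference between the two stores in the loop is the range check on the managed array, which is why I would expect the pointer-based setter to be at least as fast once it is inlined.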
JIT x64 for the set on the managed byte[]:
managedArray_[ median ] = 0;
00007FFA237A216C mov rax,qword ptr [rbp+10h]
00007FFA237A2170 mov rax,qword ptr [rax+18h]
00007FFA237A2174 mov rdx,qword ptr [rbp+10h]
00007FFA237A2178 mov edx,dword ptr [rdx+24h]
00007FFA237A217B cmp rdx,qword ptr [rax+8]
00007FFA237A217F jb 00007FFA237A2186
00007FFA237A2181 call 00007FFA833DF110
00007FFA237A2186 lea rax,[rax+rdx+10h]
00007FFA237A218B mov byte ptr [rax],0
JIT x64 for ArrayReference::set:
typedPointer_[ index ] = value;
00007FFA237A20BC mov rcx,qword ptr [rbp+10h]
00007FFA237A20C0 mov rcx,qword ptr [rcx+10h]
00007FFA237A20C4 mov rdx,qword ptr [rbp+10h]
00007FFA237A20C8 mov edx,dword ptr [rdx+24h]
00007FFA237A20CB xor r8d,r8d
00007FFA237A20CE cmp dword ptr [rcx],ecx
00007FFA237A20D0 call 00007FFA237A1738
->
00007FFA237A20F0 push rbp
00007FFA237A20F1 sub rsp,20h
00007FFA237A20F5 lea rbp,[rsp+20h]
00007FFA237A20FA mov qword ptr [rbp+10h],rcx
00007FFA237A20FE mov dword ptr [rbp+18h],edx
00007FFA237A2101 mov dword ptr [rbp+20h],r8d
00007FFA237A2105 cmp dword ptr [7FFA23688310h],0
00007FFA237A210C je 00007FFA237A2113
00007FFA237A210E call 00007FFA833DD3E0
00007FFA237A2113 mov rax,qword ptr [rbp+10h]
00007FFA237A2117 mov rax,qword ptr [rax+20h]
00007FFA237A211B mov edx,dword ptr [rbp+18h]
00007FFA237A211E movsxd rdx,edx
00007FFA237A2121 mov ecx,1
00007FFA237A2126 movsxd rcx,ecx
00007FFA237A2129 imul rdx,rcx
00007FFA237A212D mov ecx,dword ptr [rbp+20h]
00007FFA237A2130 mov byte ptr [rax+rdx],cl
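I don't understand why it is not being inlined. It does exactly the same thing as the managed array, only through a pointer to unmanaged memory. Does this violate one of the CLR's inlining rules, or is it because T is generic and, even though it is constrained, the accessor may not get inlined?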
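One thing that may be worth ruling out first: inlining is a JIT optimization, so AggressiveInlining can only take effect when the optimizer is actually running for the code being inspected. The following is a minimal sketch of such a sanity check, not a diagnosis. The class name JitOptimizationCheck is hypothetical; DebuggableAttribute.IsJITOptimizerDisabled and Debugger.IsAttached are standard .NET APIs. It only reports how the assembly was compiled and whether a debugger is attached, which can differ from the conditions under which BenchmarkDotNet ran the code.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

public static class JitOptimizationCheck
{
    // True when the assembly carries a DebuggableAttribute that disables the
    // JIT optimizer, which is typical for Debug builds.
    public static bool IsOptimizerDisabled( Assembly assembly )
    {
        var attribute = assembly.GetCustomAttribute<DebuggableAttribute>();
        return attribute != null && attribute.IsJITOptimizerDisabled;
    }

    public static void Main()
    {
        var assembly = typeof( JitOptimizationCheck ).Assembly;
        Console.WriteLine( $"JIT optimizer disabled: {IsOptimizerDisabled( assembly )}" );
        Console.WriteLine( $"Debugger attached: {Debugger.IsAttached}" );
    }
}
```

If either line prints True for the process the disassembly was taken from, the listings above may not reflect the code that the benchmark actually measured.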
Windows 10 Pro (64-bit), .NET Core 2.2, Release mode, 64-bit RyuJIT