I have a function similar to the following:
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public void SetVariable<T>(T newValue) where T : struct {
    // I know by this point that T is blittable (i.e. only unmanaged value types)
    // varPtr is a void*, and is where I want to copy newValue to
    *varPtr = newValue; // This won't work, but is basically what I want to do
}
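I've looked at Marshal.StructureToPtr(), but it seems slow, and this is performance-sensitive code. If I knew the type T, I could declare varPtr as T*, but... well, I don't.
Either way, I'm looking for the fastest possible way to do this. 'Safety' is not a concern: at this point in the code, I know that the size of struct T will fit exactly into the memory that varPtr points to.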
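On C# 7.3 and later, the `unmanaged` constraint tells the compiler that T contains no managed references, which makes `T*` legal and lets the assignment from the question compile directly. The following is a minimal sketch, assuming C# 7.3+ and a project compiled with `/unsafe` (`<AllowUnsafeBlocks>true</AllowUnsafeBlocks>`); the class name `VariableSlot` and the use of `Marshal.AllocHGlobal` to obtain `varPtr` are hypothetical scaffolding, not part of the question.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical holder for the question's varPtr. Requires /unsafe and C# 7.3+.
public sealed unsafe class VariableSlot : IDisposable
{
    private void* varPtr;           // destination memory, as in the question
    private readonly int capacity;  // size of the allocated block in bytes

    public VariableSlot(int capacityInBytes)
    {
        capacity = capacityInBytes;
        varPtr = (void*)Marshal.AllocHGlobal(capacityInBytes);
    }

    // 'unmanaged' guarantees T has no reference fields, so T* is allowed
    // and the store is a plain copy of sizeof(T) bytes.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public void SetVariable<T>(T newValue) where T : unmanaged
    {
        Debug.Assert(sizeof(T) <= capacity); // checked in Debug builds only
        *(T*)varPtr = newValue;
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public T GetVariable<T>() where T : unmanaged
    {
        return *(T*)varPtr;
    }

    public void Dispose()
    {
        if (varPtr != null)
        {
            Marshal.FreeHGlobal((IntPtr)varPtr);
            varPtr = null;
        }
    }
}
```

Callers do not need an unsafe context themselves, because no pointer types appear in the public signatures.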
Answer 0 (score: 3)
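One answer is to reimplement native memcpy in C#, using the same optimization tricks that native memcpy implementations use. You can see Microsoft doing exactly this in their own source; see the Buffer.cs file in the Microsoft Reference Source: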
// This is tricky to get right AND fast, so lets make it useful for the whole Fx.
// E.g. System.Runtime.WindowsRuntime!WindowsRuntimeBufferExtensions.MemCopy uses it.
internal unsafe static void Memcpy(byte* dest, byte* src, int len) {
    // This is portable version of memcpy. It mirrors what the hand optimized assembly versions of memcpy typically do.
    // Ideally, we would just use the cpblk IL instruction here. Unfortunately, cpblk IL instruction is not as efficient as
    // possible yet and so we have this implementation here for now.
    switch (len)
    {
        case 0:
            return;
        case 1:
            *dest = *src;
            return;
        case 2:
            *(short *)dest = *(short *)src;
            return;
        case 3:
            *(short *)dest = *(short *)src;
            *(dest + 2) = *(src + 2);
            return;
        case 4:
            *(int *)dest = *(int *)src;
            return;
        ...
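Interestingly, they implement memcpy themselves, in managed code, for all sizes up to 512 bytes; most sizes use pointer-aliasing tricks to get the VM to emit instructions that operate on different widths. Only at 512 bytes and above do they finally call the native memcpy: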
// P/Invoke into the native version for large lengths
if (len >= 512)
{
    _Memcpy(dest, src, len);
    return;
}
Presumably the native memcpy is faster because it can be hand-optimized to use SSE/MMX instructions to perform the copy.
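To illustrate the pointer-aliasing trick, here is a minimal sketch (not the framework's code) of a small-copy routine that moves 8, 4, 2 and 1 bytes at a time by reinterpreting `byte*` as `long*`, `int*` and `short*`. The class name `SmallCopy` is hypothetical; the sketch assumes non-overlapping buffers, a platform that tolerates unaligned loads and stores (such as x86/x64), and compilation with `/unsafe`.

```csharp
using System;

// Hypothetical helper illustrating width-aliasing copies. Requires /unsafe.
public static unsafe class SmallCopy
{
    // Copies len bytes from src to dest (buffers must not overlap),
    // using the widest moves first. Assumes unaligned access is allowed.
    public static void Copy(byte* dest, byte* src, int len)
    {
        // Bulk: 8 bytes per iteration through long* aliases.
        while (len >= 8)
        {
            *(long*)dest = *(long*)src;
            dest += 8; src += 8; len -= 8;
        }
        // Tail: at most one 4-byte, one 2-byte and one 1-byte move.
        if (len >= 4) { *(int*)dest = *(int*)src; dest += 4; src += 4; len -= 4; }
        if (len >= 2) { *(short*)dest = *(short*)src; dest += 2; src += 2; len -= 2; }
        if (len >= 1) { *dest = *src; }
    }

    // Safe convenience overload for arrays: pins both arrays, then copies.
    public static void Copy(byte[] dest, byte[] src, int len)
    {
        if (len < 0 || len > dest.Length || len > src.Length)
            throw new ArgumentOutOfRangeException(nameof(len));
        fixed (byte* d = dest)
        fixed (byte* s = src)
        {
            Copy(d, s, len);
        }
    }
}
```

A 13-byte copy, for example, becomes one 8-byte, one 4-byte and one 1-byte move instead of 13 single-byte moves.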
Answer 1 (score: 2)
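Following BenVoigt's suggestion, I tried a few options. For all of these tests I compiled a standard VS2013 Release build targeting Any CPU, and ran the tests outside the IDE. Before each test was measured, the methods DoTestA() and DoTestB() were run several times to let the JIT warm up.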
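The warm-up-then-measure procedure can be sketched with `System.Diagnostics.Stopwatch`. This is a reconstruction under assumptions, not the harness used here (which was not posted and reported per-call min/max/jitter rather than a single average); `MicroBench` and its parameters are hypothetical, and the per-call figure it returns includes delegate-invocation overhead.

```csharp
using System;
using System.Diagnostics;

// Hypothetical warm-up-then-measure helper.
public static class MicroBench
{
    // Runs 'test' warmupRuns times untimed (so the JIT has compiled it),
    // then times 'repetitions' calls and returns the average in nanoseconds.
    public static double AverageNanoseconds(Action test, int warmupRuns, int repetitions)
    {
        if (repetitions <= 0) throw new ArgumentOutOfRangeException(nameof(repetitions));

        for (int i = 0; i < warmupRuns; i++) test();

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < repetitions; i++) test();
        sw.Stop();

        // Convert Stopwatch ticks to nanoseconds, then divide by the call count.
        double nsPerTick = 1_000_000_000.0 / Stopwatch.Frequency;
        return sw.ElapsedTicks * nsPerTick / repetitions;
    }
}
```

Usage would look like `MicroBench.AverageNanoseconds(DoTestA, 1000, 500000)`.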
First, I compared Marshal.StructureToPtr with a byte-by-byte loop for various struct sizes. Here is the version using SixtyFourByteStruct:
private unsafe static void DoTestA() {
    fixed (SixtyFourByteStruct* fixedStruct = &structToCopy) {
        byte* structStart = (byte*) fixedStruct;
        byte* targetStart = (byte*) unmanagedTarget;
        for (byte* structPtr = structStart, targetPtr = targetStart; structPtr < structStart + sizeof(SixtyFourByteStruct); ++structPtr, ++targetPtr) {
            *targetPtr = *structPtr;
        }
    }
}

private static void DoTestB() {
    Marshal.StructureToPtr(structToCopy, unmanagedTarget, false);
}
Results:
>>> 500000 repetitions >>> IN NANOSECONDS (1000ns = 0.001ms)
Method   Avg.    Min.   Max.       Jitter        Total
A        82ns    0ns    22,000ns   21,917ns  !   41.017ms
B        137ns   0ns    38,700ns   38,562ns  !   68.834ms
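As you can see, the manual loop is faster (as I suspected). The results were similar for 16-byte and 4-byte structs, and the smaller the struct, the more pronounced the difference.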
Next, comparing the manual copy against P/Invoke to memcpy:
private unsafe static void DoTestA() {
    fixed (FourByteStruct* fixedStruct = &structToCopy) {
        byte* structStart = (byte*) fixedStruct;
        byte* targetStart = (byte*) unmanagedTarget;
        for (byte* structPtr = structStart, targetPtr = targetStart; structPtr < structStart + sizeof(FourByteStruct); ++structPtr, ++targetPtr) {
            *targetPtr = *structPtr;
        }
    }
}
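
// Assumed P/Invoke declaration for memcpy (the original post did not show it; Windows/msvcrt only):
[DllImport("msvcrt.dll", EntryPoint = "memcpy", CallingConvention = CallingConvention.Cdecl)]
private static extern IntPtr memcpy(IntPtr dest, IntPtr src, UIntPtr count);

private unsafe static void DoTestB() {
    fixed (FourByteStruct* fixedStruct = &structToCopy) {
        memcpy(unmanagedTarget, (IntPtr) fixedStruct, new UIntPtr((uint) sizeof(FourByteStruct)));
    }
}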
Results:
>>> 500000 repetitions >>> IN NANOSECONDS (1000ns = 0.001ms)
Method   Avg.    Min.   Max.       Jitter        Total
A        61ns    0ns    28,000ns   27,938ns  !   30.736ms
B        84ns    0ns    45,900ns   45,815ns  !   42.216ms
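So, in my case, the manual copy still seems to win. As before, the results for 4-, 16- and 64-byte structs were very similar (although for the 64-byte struct the gap was under 10ns).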
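It then occurred to me that I had only tested structs that fit within a single cache line (I have a standard x86_64 CPU). So I tried a 128-byte struct, and that result favoured memcpy: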
>>> 500000 repetitions >>> IN NANOSECONDS (1000ns = 0.001ms)
Method   Avg.    Min.   Max.       Jitter        Total
A        104ns   0ns    48,300ns   48,195ns  !   52.150ms
B        84ns    0ns    38,400ns   38,315ns  !   42.284ms
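In any case, the conclusion is that, on the x86_64 CPU in my machine, byte-by-byte copying appears to be fastest for any struct of <= 64 bytes. Make of that what you will (perhaps someone will find that my code is inefficient).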
Answer 2 (score: 0)
FYI: I'm posting how I used the accepted answer, for others' benefit, because there is a catch when accessing the method through reflection: it is overloaded.
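using System;
using System.Linq;
using System.Reflection;

// 'unsafe' on the class is needed because typeof(byte*) requires an unsafe context.
public static unsafe class Buffer
{
    public unsafe delegate void MemcpyDelegate(byte* dest, byte* src, int len);
    public static readonly MemcpyDelegate Memcpy;

    static Buffer()
    {
        // System.Buffer has several internal overloads named Memcpy (an implementation detail
        // that may differ between runtime versions); pick the (byte*, byte*, int) one.
        var methods = typeof(System.Buffer).GetMethods(BindingFlags.Static | BindingFlags.NonPublic).Where(m => m.Name == "Memcpy");
        var memcpy = methods.First(mi => mi.GetParameters().Select(p => p.ParameterType).SequenceEqual(new[] { typeof(byte*), typeof(byte*), typeof(int) }));
        Memcpy = (MemcpyDelegate) memcpy.CreateDelegate(typeof(MemcpyDelegate));
    }
}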
Usage:
public static unsafe void MemcpyExample()
{
    int src = 12345;
    int dst = 0;
    Buffer.Memcpy((byte*) &dst, (byte*) &src, sizeof (int));
    System.Diagnostics.Debug.Assert(dst == 12345);
}
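On .NET Framework 4.6 and later (and on .NET Core), `System.Buffer.MemoryCopy` is public, so the same kind of raw copy can be done without reflecting into internal members. A minimal sketch, assuming compilation with `/unsafe`; `MemoryCopyExample` and `CopyInt` are hypothetical names.

```csharp
using System;

// Hypothetical example class. Requires /unsafe and .NET Framework 4.6+ or .NET Core.
public static unsafe class MemoryCopyExample
{
    // Copies the 4 bytes of src into a fresh local and returns it.
    public static int CopyInt(int src)
    {
        int dst = 0;
        long size = sizeof(int);
        // Buffer.MemoryCopy(source, destination, destinationSizeInBytes, sourceBytesToCopy)
        Buffer.MemoryCopy(&src, &dst, size, size);
        return dst;
    }
}
```

The third argument is the destination capacity; MemoryCopy throws if `sourceBytesToCopy` exceeds it, which gives a cheap bounds check that the reflected internal Memcpy does not.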
Answer 3 (score: -1)
public void SetVariable<T>(T newValue) where T : struct
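You can't do this quickly with generics. The compiler doesn't take your pretty blue eyes as a guarantee that T is actually blittable; the struct constraint isn't good enough. You should use overloads instead: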
public unsafe void SetVariable(int newValue) {
    *(int*)varPtr = newValue;
}
public unsafe void SetVariable(double newValue) {
    *(double*)varPtr = newValue;
}
public unsafe void SetVariable(Point newValue) {
    *(Point*)varPtr = newValue;
}
// etc...
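This may be inconvenient, but it's fast. It compiles to a single MOV instruction, with no method-call overhead in Release mode. It's probably as fast as it gets.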
For everything else, use a generic fallback; the profiler will tell you when you need to add another overload:
public unsafe void SetVariable<T>(T newValue) {
    Marshal.StructureToPtr(newValue, (IntPtr)varPtr, false);
}
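A self-contained version of the overload-plus-fallback approach, as a sketch: the class name `Setter`, the 64-byte buffer and the `Pair` struct are hypothetical, and it requires `/unsafe`. Overload resolution prefers the non-generic overloads for exact `int` and `double` arguments, and any other struct falls through to the `Marshal.StructureToPtr` fallback.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper around a 64-byte unmanaged block. Requires /unsafe.
public sealed unsafe class Setter : IDisposable
{
    private void* varPtr;

    public Setter() { varPtr = (void*)Marshal.AllocHGlobal(64); }

    public IntPtr Address => (IntPtr)varPtr;

    // Exact-type overloads: each is a single typed store.
    public void SetVariable(int newValue) { *(int*)varPtr = newValue; }
    public void SetVariable(double newValue) { *(double*)varPtr = newValue; }

    // Generic fallback for types without an overload (slower: goes through marshalling).
    public void SetVariable<T>(T newValue) where T : struct
    {
        Marshal.StructureToPtr(newValue, (IntPtr)varPtr, false);
    }

    public void Dispose()
    {
        if (varPtr != null)
        {
            Marshal.FreeHGlobal((IntPtr)varPtr);
            varPtr = null;
        }
    }
}

// Hypothetical struct with no dedicated overload, so it uses the fallback.
[StructLayout(LayoutKind.Sequential)]
public struct Pair
{
    public int A;
    public int B;
}
```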