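I have a core C# function that I am trying to speed up. Suggestions involving either safe or unsafe code are equally welcome. Here is the method: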
public byte[] Interleave(uint[] vector)
{
    var byteVector = new byte[BytesNeeded + 1]; // Extra byte needed when creating a BigInteger, for sign bit.
    foreach (var idx in PrecomputedIndices)
    {
        var bit = (byte)(((vector[idx.iFromUintVector] >> idx.iFromUintBit) & 1U) << idx.iToByteBit);
        byteVector[idx.iToByteVector] |= bit;
    }
    return byteVector;
}
PrecomputedIndices is an array of the following class:
class Indices
{
    public readonly int iFromUintVector;
    public readonly int iFromUintBit;
    public readonly int iToByteVector;
    public readonly int iToByteBit;

    public Indices(int fromUintVector, int fromUintBit, int toByteVector, int toByteBit)
    {
        iFromUintVector = fromUintVector;
        iFromUintBit = fromUintBit;
        iToByteVector = toByteVector;
        iToByteBit = toByteBit;
    }
}
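The purpose of the Interleave method is to copy bits from an array of uints to a byte array. I have precomputed the source and target array indices and the source and target bit numbers, and stored them in Indices objects. No two adjacent bits in the source will be adjacent in the target, which rules out certain optimizations.

To give you an idea of scale, the problem I am working on has about 4,200 dimensions, so "vector" has 4,200 elements. The values in vector range from 0 to 12, so I only need 4 bits to store each value in the byte array. That makes 4,200 x 4 = 16,800 bits of data, or 2,100 bytes of output per vector. This method will be called millions of times. In the larger program that I need to optimize, it consumes about a third of the time.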
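To make the sizing concrete, here is a minimal sketch of how such an index table could be precomputed. The layout (bit b of dimension d lands at output bit position b * dimensions + d) and the names IndexTableSketch and Build are assumptions for illustration only; the question does not say how the real table is built.

```csharp
using System;

static class IndexTableSketch
{
    // Hypothetical layout: bit `bit` of dimension `dim` is written to output
    // bit position bit * dimensions + dim, so adjacent source bits of one
    // uint end up far apart in the target.
    public static (int fromUint, int fromBit, int toByte, int toBit)[] Build(int dimensions, int bitsPerDimension)
    {
        var table = new (int fromUint, int fromBit, int toByte, int toBit)[dimensions * bitsPerDimension];
        var i = 0;
        for (var bit = 0; bit < bitsPerDimension; bit++)
        {
            for (var dim = 0; dim < dimensions; dim++)
            {
                var target = bit * dimensions + dim;           // position in the output bit stream
                table[i++] = (dim, bit, target >> 3, target & 7); // byte index, bit within byte
            }
        }
        return table;
    }

    static void Main()
    {
        var table = Build(4200, 4);
        Console.WriteLine(table.Length);           // 16800 (bits)
        Console.WriteLine((table.Length + 7) / 8); // 2100 (bytes)
    }
}
```

With 4,200 dimensions and 4 bits each, the table has one entry per output bit, which is why the per-entry size and the order of the entries matter so much for speed.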
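Update 1: Changing Indices to a struct and shrinking some of the data types, so that the object is only 8 bytes (an int, a short, and 2 bytes), reduced the share of execution time from 35% to 30%.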
Answer 0 (score: 0)
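These are the key parts of my revised implementation, which incorporates ideas from the commenters:
1) Converted the object to a struct, shrank the data types to smaller integers, and rearranged the fields so that the object fits in a 64-bit value, which works better on a 64-bit machine: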
struct Indices
{
    /// <summary>
    /// Index into source vector of source uint to read.
    /// </summary>
    public readonly int iFromUintVector;

    /// <summary>
    /// Index into target vector of target byte to write.
    /// </summary>
    public readonly short iToByteVector;

    /// <summary>
    /// Index into source uint of source bit to read.
    /// </summary>
    public readonly byte iFromUintBit;

    /// <summary>
    /// Index into target byte of target bit to write.
    /// </summary>
    public readonly byte iToByteBit;

    public Indices(int fromUintVector, byte fromUintBit, short toByteVector, byte toByteBit)
    {
        iFromUintVector = fromUintVector;
        iFromUintBit = fromUintBit;
        iToByteVector = toByteVector;
        iToByteBit = toByteBit;
    }
}
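As a quick sanity check on the 8-byte claim, the following sketch prints the size of a struct with the same field order. PackedIndices is a hypothetical stand-in for the struct above, and Marshal.SizeOf reports the unmanaged size, which I would expect to match the in-memory size for a simple struct of integer fields like this one.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in with the same field types and order as Indices.
struct PackedIndices
{
    public readonly int iFromUintVector;  // 4 bytes
    public readonly short iToByteVector;  // 2 bytes
    public readonly byte iFromUintBit;    // 1 byte
    public readonly byte iToByteBit;      // 1 byte

    public PackedIndices(int fromUintVector, byte fromUintBit, short toByteVector, byte toByteBit)
    {
        iFromUintVector = fromUintVector;
        iFromUintBit = fromUintBit;
        iToByteVector = toByteVector;
        iToByteBit = toByteBit;
    }
}

static class SizeCheck
{
    static void Main()
    {
        // With the default sequential layout the fields pack without padding.
        Console.WriteLine(Marshal.SizeOf<PackedIndices>()); // 8
    }
}
```

Note that a short target index caps the output at 32,767 bytes. That is ample for the 2,100 bytes needed here, but a much larger problem would need a wider field.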
2) Sorted PrecomputedIndices so that the target bytes and bits are written in ascending order, which improves memory cache access:
Comparison<Indices> sortByTargetByteAndBit = (a, b) =>
{
    if (a.iToByteVector < b.iToByteVector) return -1;
    if (a.iToByteVector > b.iToByteVector) return 1;
    if (a.iToByteBit < b.iToByteBit) return -1;
    if (a.iToByteBit > b.iToByteBit) return 1;
    return 0;
};
Array.Sort(PrecomputedIndices, sortByTargetByteAndBit);
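The unrolled loop in step 3 relies on more than sort order: every full target byte must have exactly eight entries, for bits 0 through 7, in that order. A minimal sketch of a check for that property follows. IsByteAligned is a hypothetical helper, and plain (toByte, toBit) tuples stand in for iToByteVector and iToByteBit.

```csharp
using System;

static class SortedTableCheck
{
    // Returns true when entry i targets byte i / 8 and bit i % 8 for every
    // entry that belongs to one of the first `fullBytes` target bytes.
    public static bool IsByteAligned((int toByte, int toBit)[] sorted, int fullBytes)
    {
        if (sorted.Length < fullBytes * 8) return false;
        for (var i = 0; i < fullBytes * 8; i++)
        {
            if (sorted[i].toByte != (i >> 3) || sorted[i].toBit != (i & 7)) return false;
        }
        return true;
    }

    static void Main()
    {
        var table = new (int toByte, int toBit)[16];
        for (var i = 0; i < table.Length; i++) table[i] = (i >> 3, i & 7);
        Console.WriteLine(IsByteAligned(table, 2)); // True

        table[3] = (0, 5); // bit 3 of byte 0 is now missing and bit 5 is duplicated
        Console.WriteLine(IsByteAligned(table, 2)); // False
    }
}
```

Running a check like this once, right after the sort, is cheap compared with millions of Interleave calls and guards against silently wrong output if the table ever has gaps.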
3) Unrolled the loop so that a whole target byte is assembled at once, which reduces the number of times I access the target array:
public byte[] Interleave(uint[] vector)
{
    var byteVector = new byte[BytesNeeded + 1]; // An extra byte is needed to hold the extra bits and a sign bit for the BigInteger.
    // Leftover bits that do not fill a whole byte. The parentheses matter: in C#,
    // subtraction binds tighter than <<. The value is not used below; the second loop handles those bits.
    var extraBits = Bits - (BytesNeeded << 3);
    int iIndex = 0;
    var iByte = 0;
    for (; iByte < BytesNeeded; iByte++)
    {
        // Unroll the loop so we compute the bits for a whole byte at a time.
        // This relies on PrecomputedIndices being sorted so that each full byte
        // has eight consecutive entries, for target bits 0 through 7 in order.
        var idx0 = PrecomputedIndices[iIndex];
        var idx1 = PrecomputedIndices[iIndex + 1];
        var idx2 = PrecomputedIndices[iIndex + 2];
        var idx3 = PrecomputedIndices[iIndex + 3];
        var idx4 = PrecomputedIndices[iIndex + 4];
        var idx5 = PrecomputedIndices[iIndex + 5];
        var idx6 = PrecomputedIndices[iIndex + 6];
        var idx7 = PrecomputedIndices[iIndex + 7];
        uint bits = ((vector[idx0.iFromUintVector] >> idx0.iFromUintBit) & 1U)
            | (((vector[idx1.iFromUintVector] >> idx1.iFromUintBit) & 1U) << 1)
            | (((vector[idx2.iFromUintVector] >> idx2.iFromUintBit) & 1U) << 2)
            | (((vector[idx3.iFromUintVector] >> idx3.iFromUintBit) & 1U) << 3)
            | (((vector[idx4.iFromUintVector] >> idx4.iFromUintBit) & 1U) << 4)
            | (((vector[idx5.iFromUintVector] >> idx5.iFromUintBit) & 1U) << 5)
            | (((vector[idx6.iFromUintVector] >> idx6.iFromUintBit) & 1U) << 6)
            | (((vector[idx7.iFromUintVector] >> idx7.iFromUintBit) & 1U) << 7);
        byteVector[iByte] = (byte)bits;
        iIndex += 8;
    }
    // Handle any remaining bits one at a time, as in the original method.
    for (; iIndex < PrecomputedIndices.Length; iIndex++)
    {
        var idx = PrecomputedIndices[iIndex];
        var bit = (byte)(((vector[idx.iFromUintVector] >> idx.iFromUintBit) & 1U) << idx.iToByteBit);
        byteVector[idx.iToByteVector] |= bit;
    }
    return byteVector;
}
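Because the byte-at-a-time version ignores iToByteVector and iToByteBit in its main loop, it is worth confirming that it produces the same bytes as the bit-at-a-time version. The sketch below does that on a small random input. Everything in it (the Idx struct, the 6-dimension table, the layout, the seed) is a hypothetical test fixture rather than the real data, and the inner loop is left rolled up for brevity.

```csharp
using System;
using System.Linq;

static class InterleaveCheck
{
    struct Idx { public int FromUint; public byte FromBit; public short ToByte; public byte ToBit; }

    // Reference version: one read-modify-write of the target per bit.
    static byte[] Simple(uint[] v, Idx[] table, int bytes)
    {
        var result = new byte[bytes];
        foreach (var idx in table)
            result[idx.ToByte] |= (byte)(((v[idx.FromUint] >> idx.FromBit) & 1U) << idx.ToBit);
        return result;
    }

    // Byte-at-a-time version: assumes the table is sorted by target byte and bit,
    // with exactly eight entries per byte.
    static byte[] ByteAtATime(uint[] v, Idx[] sorted, int bytes)
    {
        var result = new byte[bytes];
        var i = 0;
        for (var b = 0; b < bytes; b++)
        {
            uint bits = 0;
            for (var k = 0; k < 8; k++, i++)
                bits |= ((v[sorted[i].FromUint] >> sorted[i].FromBit) & 1U) << k;
            result[b] = (byte)bits;
        }
        return result;
    }

    static void Main()
    {
        const int dims = 6, bitsPer = 4, bytes = dims * bitsPer / 8; // 24 bits, 3 bytes
        var table = new Idx[dims * bitsPer];
        for (var t = 0; t < table.Length; t++) // entries generated in target order, so already sorted
            table[t] = new Idx { FromUint = t % dims, FromBit = (byte)(t / dims), ToByte = (short)(t >> 3), ToBit = (byte)(t & 7) };

        var rng = new Random(1);
        var vector = Enumerable.Range(0, dims).Select(_ => (uint)rng.Next(0, 13)).ToArray(); // values 0..12

        Console.WriteLine(Simple(vector, table, bytes).SequenceEqual(ByteAtATime(vector, table, bytes))); // True
    }
}
```

The same comparison can be run against the real PrecomputedIndices before trusting any timing numbers.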
Total savings: 44%!