Question

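For non-SSE code, as answered in the following question (No overflow exception for int in C#?), putting a checked section around an addition throws an overflow exception when adding Int64.MaxValue and 1. However, surrounding an SSE addition with a checked section does not seem to throw an overflow exception for long[] arrLong = new long[] { 5, 7, 16, Int64.MaxValue, 3, 1 };. I believe most SSE instructions use saturating math, where they reach Int64.MaxValue, stop there, and never wrap around to negative values. Is there any way in C# to make SSE addition throw an overflow exception, or is that impossible because the CPU may not support raising an overflow flag?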
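To make the difference concrete, here is a minimal sketch (the class and method names are made up for illustration). It assumes that Vector<long>'s + operator is an ordinary operator method that performs wrapping addition, so a surrounding checked block has no effect on it. As far as I know, SSE2 only offers saturating adds for 8-bit and 16-bit lanes, and the 64-bit add wraps.

```csharp
using System;
using System.Numerics;

public static class CheckedDemo
{
    // Scalar long addition in a checked context: the compiler emits an
    // overflow-checking add, so overflow raises OverflowException.
    public static bool ScalarAddThrows(long a, long b)
    {
        try
        {
            _ = checked(a + b);
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }

    // Vector<long> addition inside a checked block: the + here is the
    // operator defined by Vector<T>, not a built-in integer add, so the
    // checked context does not apply and each lane wraps around silently.
    public static long VectorAddLaneZero(long a, long b)
    {
        var va = new Vector<long>(a);   // every lane holds a
        var vb = new Vector<long>(b);   // every lane holds b
        Vector<long> sum;
        checked
        {
            sum = va + vb;
        }
        return sum[0];
    }
}
```

Calling ScalarAddThrows(long.MaxValue, 1) reports an exception, while VectorAddLaneZero(long.MaxValue, 1) returns long.MinValue without throwing.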

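The code below shows my C# implementation of summing a long[] using SSE. For the array above the result is negative, because the positive values wrap around instead of saturating; C# must be using the wrapping version of the SSE instruction (there are two versions: one that wraps and one that saturates). I don't know whether C# lets developers choose which version to use. Only the serial portion of the code below throws an overflow exception; the SSE portion does not.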

    using System.Numerics;

    private static long SumSseInner(this long[] arrayToSum, int l, int r)
    {
        var sumVector = new Vector<long>();
        int sseIndexEnd = l + ((r - l + 1) / Vector<long>.Count) * Vector<long>.Count;
        int i;
        for (i = l; i < sseIndexEnd; i += Vector<long>.Count)
        {
            var inVector = new Vector<long>(arrayToSum, i);
            checked
            {
                sumVector += inVector;
            }
        }
        long overallSum = 0;
        for (; i <= r; i++)
        {
            checked
            {
                overallSum += arrayToSum[i];
            }
        }
        for (i = 0; i < Vector<long>.Count; i++)
        {
            checked
            {
                overallSum += sumVector[i];
            }
        }
        return overallSum;
    }

Answer 1

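Here is an implementation of ulong summation using SSE in C#. I'm posting it because it is shorter and easier to understand than the long summation.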

private static decimal SumToDecimalSseFasterInner(this ulong[] arrayToSum, int l, int r)
{
    decimal overallSum = 0;
    var sumVector    = new Vector<ulong>();
    var newSumVector = new Vector<ulong>();
    var zeroVector   = new Vector<ulong>(0);
    int sseIndexEnd = l + ((r - l + 1) / Vector<ulong>.Count) * Vector<ulong>.Count;
    int i;

    for (i = l; i < sseIndexEnd; i += Vector<ulong>.Count)
    {
        var inVector = new Vector<ulong>(arrayToSum, i);
        newSumVector = sumVector + inVector;
        Vector<ulong> gteMask = Vector.GreaterThanOrEqual(newSumVector, sumVector);         // 0xFFFFFFFFFFFFFFFF in each lane of the Vector<ulong> that did not overflow, 0 in each lane that did
        if (Vector.EqualsAny(gteMask, zeroVector))
        {
            for(int j = 0; j < Vector<ulong>.Count; j++)
            {
                if (gteMask[j] == 0)    // this particular sum overflowed, since sum decreased
                {
                    overallSum += sumVector[j];
                    overallSum += inVector[ j];
                }
            }
        }
        sumVector = Vector.ConditionalSelect(gteMask, newSumVector, zeroVector);
    }
    for (; i <= r; i++)
        overallSum += arrayToSum[i];
    for (i = 0; i < Vector<ulong>.Count; i++)
        overallSum += sumVector[i];
    return overallSum;
}
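The per-lane logic above can be easier to follow in scalar form. The sketch below is not part of HPCsharp; it is a hypothetical single-lane reference (the names are invented here) that applies the same rule: an unsigned add has wrapped exactly when the new sum is smaller than the old one, and in that case both operands are moved into the decimal accumulator and the lane restarts from zero.

```csharp
using System;

public static class UnsignedWrapDemo
{
    // Scalar model of one SIMD lane from the answer's loop.
    public static decimal SumToDecimalScalar(ulong[] values)
    {
        decimal overallSum = 0;
        ulong laneSum = 0;
        foreach (ulong value in values)
        {
            ulong newSum = unchecked(laneSum + value);   // wraps like the vector add
            if (newSum >= laneSum)
            {
                laneSum = newSum;                        // no wraparound: keep accumulating
            }
            else
            {
                // The sum decreased, so the add wrapped: spill both operands
                // into the decimal total and restart the lane at zero.
                overallSum += laneSum;
                overallSum += value;
                laneSum = 0;
            }
        }
        return overallSum + laneSum;
    }
}
```

For example, summing { ulong.MaxValue, 1, 5 } this way yields 18446744073709551621, which does not fit in a ulong but is exact as a decimal.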

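Both ulong[] and long[] summation using SSE, accumulating into a Decimal to produce a perfectly accurate result, have been added to the HPCsharp nuget package, which I maintain (open source). The version for long[] is in SumParallel.cs and is called SumToDecimalSseFasterInner().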

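It's pretty cool to be able to sum a long[] or ulong[] array using SSE while handling arithmetic overflow in the SSE code itself, since the CPU produces no overflow flag for SSE instructions, and to do it at SSE speed and on multiple cores!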
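If the goal is simply to get an exception from a signed vector addition, as the question asks, the overflow has to be detected in software. The following is a sketch only, with a hypothetical helper name; it is not taken from HPCsharp. It relies on the usual sign-bit test: a signed add overflowed in a lane when both operands have the same sign and the sum's sign differs, which shows up as a set sign bit in (a ^ sum) & (b ^ sum).

```csharp
using System;
using System.Numerics;

public static class CheckedVectorAdd
{
    // Adds two Vector<long> values and throws if any lane overflowed.
    public static Vector<long> AddChecked(Vector<long> a, Vector<long> b)
    {
        Vector<long> sum = a + b;   // wraps silently on overflow

        // The sign bit of this value is set in exactly the lanes where the
        // sum's sign differs from the sign of both operands.
        Vector<long> overflowBits = (a ^ sum) & (b ^ sum);

        // A negative lane means that lane overflowed.
        if (Vector.LessThanAny(overflowBits, Vector<long>.Zero))
        {
            throw new OverflowException("Vector<long> addition overflowed in at least one lane.");
        }
        return sum;
    }
}
```

Replacing sumVector += inVector with sumVector = CheckedVectorAdd.AddChecked(sumVector, inVector) in the question's loop would make the vector part throw as well, at the cost of a few extra vector operations per iteration.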

Does SSE addition of long/ulong in C# throw an overflow exception?
