Question

Is there a faster way than BitConverter.ToInt32 to convert a byte array to an int value?

Answer 1

I actually tried several different ways of converting four bytes to an int:

BitConverter.ToInt32(new byte[] { w, x, y, z }, 0);
BitConverter.ToUInt32(new byte[] { w, x, y, z }, 0);
b = new byte[] { w, x, y, z }; BitConverter.ToInt32(b, 0);
b = new byte[] { 1, 2, 3, 4, 5, 6, 7, w, x, y, z }; BitConverter.ToInt32(b, 7);
w | (x << 8) | (y << 16) | (z << 24);
b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24);
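The shift expressions above can be wrapped in a small helper. What follows is a minimal sketch, and the class and method names are hypothetical. Shifting always interprets the bytes as little-endian, whereas BitConverter.ToInt32 uses the machine's native byte order, so the two agree only on little-endian hardware such as x86. On .NET Core 2.1 and later, BinaryPrimitives.ReadInt32LittleEndian performs the same endian-explicit read from a span.

```csharp
using System;
using System.Buffers.Binary;

static class ByteConversion
{
    // Combines four bytes, least significant first, into an int.
    // The result does not depend on the machine's byte order.
    public static int FromLittleEndian(byte[] buffer, int offset)
    {
        return buffer[offset]
             | (buffer[offset + 1] << 8)
             | (buffer[offset + 2] << 16)
             | (buffer[offset + 3] << 24);
    }

    // Same interpretation via the framework helper (.NET Core 2.1+ or System.Memory).
    public static int FromLittleEndianSpan(byte[] buffer, int offset)
    {
        return BinaryPrimitives.ReadInt32LittleEndian(buffer.AsSpan(offset, 4));
    }
}
```

With the test bytes { 1, 2, 3, 4 } used in the benchmarks, both methods return 67305985, matching the results below.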

I ran 10^9 iterations of each in a Release (x86) build, not under the debugger, on a 2.5 GHz Core i7 laptop. Here are my results (note that the methods that don't use BitConverter are much faster):

test1: 00:00:15.5287282 67305985
test2: 00:00:15.1334457 67305985
test3: 00:00:08.0648586 67305985
test4: 00:00:11.2307059 67305985
test5: 00:00:02.0219417 67305985
test6: 00:00:01.6275684 67305985

Some conclusions you can draw:

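test1 shows that on my laptop even the slowest approach takes only about 15 ns per conversion, which I'd venture should be fast enough for anyone. (Do you really need to call it more than 60M times per second?)
test2 shows that using uint instead of int saves a little time. I'm not sure why, but I think the difference is small enough to be experimental error.
test3 shows that the overhead of creating a new byte array (about 7 ns, the gap between test1 and test3) is nearly as large as the cost of calling the function itself, but it is still faster than building a new array out of an old one.
test4 shows that unaligned array access in ToInt32 adds overhead (about 3 ns).
test5 shows that pulling the 4 bytes out of local variables and combining them yourself is several times faster than calling ToInt32.
test6 shows that pulling the 4 bytes out of an array is actually slightly faster than pulling them out of function arguments! I suspect this is due to CPU pipelining or cache effects.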

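The fastest, test6, took only twice as long as an empty loop (not shown). In other words, each conversion takes less than 1 ns. Good luck getting any useful computation done faster than that!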

Here is my test program:

using System;

namespace BitConverterTest
{
    class Program
    {
        const int iters = 1000000000;
        static void Main(string[] args)
        {
            test1(1, 2, 3, 4);
            test2(1, 2, 3, 4);
            test3(1, 2, 3, 4);
            test4(1, 2, 3, 4);
            test5(1, 2, 3, 4);
            test6(1, 2, 3, 4);
        }

        static void test1(byte w, byte x, byte y, byte z)
        {
            int res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            for (int i = 0; i < iters; i++)
                res = BitConverter.ToInt32(new byte[] { w, x, y, z }, 0);
            Console.WriteLine("test1: " + timer.Elapsed + " " + res);
        }

        static void test2(byte w, byte x, byte y, byte z)
        {
            uint res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            for (int i = 0; i < iters; i++)
                res = BitConverter.ToUInt32(new byte[] { w, x, y, z }, 0);
            Console.WriteLine("test2: " + timer.Elapsed + " " + res);
        }

        static void test3(byte w, byte x, byte y, byte z)
        {
            int res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            var b = new byte[] { w, x, y, z };
            for (int i = 0; i < iters; i++)
                res = BitConverter.ToInt32(b, 0);
            Console.WriteLine("test3: " + timer.Elapsed + " " + res);
        }

        static void test4(byte w, byte x, byte y, byte z)
        {
            int res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            var b = new byte[] { 1, 2, 3, 4, 5, 6, 7, w, x, y, z };
            for (int i = 0; i < iters; i++)
                res = BitConverter.ToInt32(b, 7);
            Console.WriteLine("test4: " + timer.Elapsed + " " + res);
        }

        static void test5(byte w, byte x, byte y, byte z)
        {
            int res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            var b = new byte[] { w, x, y, z };
            for (int i = 0; i < iters; i++)
                res = w | (x << 8) | (y << 16) | (z << 24);
            Console.WriteLine("test5: " + timer.Elapsed + " " + res);
        }

        static void test6(byte w, byte x, byte y, byte z)
        {
            int res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            var b = new byte[] { w, x, y, z };
            for (int i = 0; i < iters; i++)
                res = b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24);
            Console.WriteLine("test6: " + timer.Elapsed + " " + res);
        }
    }
}

Answer 2

If I remember correctly, that implementation uses unsafe code (treating a byte* as an int*), so it will be hard to beat; the alternative is shifting.
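The two options can be compared side by side in a minimal sketch. The method names are hypothetical, and the code assumes unsafe blocks are enabled (AllowUnsafeBlocks). Reading through an int* uses the machine's native byte order, while the shift version always reads little-endian, so the two agree only when BitConverter.IsLittleEndian is true.

```csharp
using System;

static class PointerVersusShift
{
    // Reinterprets four bytes in place as an int (native byte order).
    public static unsafe int ReadViaPointer(byte[] buffer, int offset)
    {
        if (offset < 0 || buffer.Length - offset < 4)
            throw new ArgumentOutOfRangeException(nameof(offset));
        fixed (byte* p = &buffer[offset])
        {
            // An int* at an arbitrary byte offset is an unaligned read;
            // x86/x64 tolerate it, but some other CPUs may not.
            return *(int*)p;
        }
    }

    // Assembles the int from individual bytes (always little-endian).
    public static int ReadViaShift(byte[] buffer, int offset)
    {
        return buffer[offset] | (buffer[offset + 1] << 8)
             | (buffer[offset + 2] << 16) | (buffer[offset + 3] << 24);
    }
}
```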

However, from a lot of work in this area, this is so unlikely to be a genuine bottleneck as to be irrelevant. I/O is typically the main issue.

GetBytes(int), however, is more expensive (in high volume) because of the array/heap allocation.
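To avoid the per-call array that BitConverter.GetBytes(int) allocates, the value can be written into a buffer the caller already owns. A minimal sketch with a hypothetical method name, writing little-endian with shifts; on .NET Core 2.1 and later, BinaryPrimitives.WriteInt32LittleEndian is the framework equivalent.

```csharp
using System;

static class IntWriter
{
    // Writes value into buffer[offset..offset+3], least significant byte first.
    // No allocation: the caller supplies and reuses the buffer.
    public static void WriteLittleEndian(byte[] buffer, int offset, int value)
    {
        buffer[offset] = (byte)value;            // cast keeps the low 8 bits
        buffer[offset + 1] = (byte)(value >> 8);
        buffer[offset + 2] = (byte)(value >> 16);
        buffer[offset + 3] = (byte)(value >> 24);
    }
}
```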

Answer 3

Following up on Gabe's performance tests:

Changes:

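Removed tests 1 and 2, because the inline array creation turned them into tests of the GC (as the Gen 0 GC performance counter shows).
Removed test 4 (unaligned array) to keep things simpler.
Added tests 7 and 8, which convert from a large array (256 MB) via BitConverter and via bit fiddling, respectively.
Added a running total to the tests to avoid common subexpression elimination, which clearly produced the low times in Gabe's tests 5 and 6.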
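One way to check the GC effect in-process, as a rough stand-in for the Gen 0 performance counter, is to compare GC.CollectionCount(0) before and after a loop. A minimal sketch; the helper name is hypothetical.

```csharp
using System;

static class Gen0Probe
{
    // Returns how many Gen 0 collections occurred while running `action` `iterations` times.
    public static int CountGen0(Action action, int iterations)
    {
        int before = GC.CollectionCount(0);
        for (int i = 0; i < iterations; i++)
            action();
        return GC.CollectionCount(0) - before;
    }
}
```

For example, Gen0Probe.CountGen0(() => BitConverter.ToInt32(new byte[] { 1, 2, 3, 4 }, 0), 10_000_000) would be expected to report collections on runtimes that heap-allocate the temporary array, while the same call on a reused array should report few or none.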

Results:

32-bit option:

test3: 00:00:06.9230577
test5: 00:00:03.8349386
test6: 00:00:03.8238272
test7: 00:00:07.3898489
test8: 00:00:04.6807391

64-bit option:

test3: 00:00:05.8794322
test5: 00:00:00.4384600
test6: 00:00:00.4069573
test7: 00:00:06.2279365
test8: 00:00:03.5472486

Analysis

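On 64-bit, common subexpression elimination still happens in tests 5 and 6.
For this, 64-bit is a win. But a micro-benchmark like this should not be used to decide where to optimize an application.
Converting 256 MB of random data into ints shows roughly a 50% improvement. Since the test does this 16 times, that is less than 0.2 s per pass, which is unlikely to make a real difference outside a very narrow subset of applications, and then you have the extra maintenance cost of making sure nobody breaks the code over the application's lifetime.
I wonder how much of the BitConverter overhead is its parameter checking?
Test 6 is only slightly faster than test 5. Evidently the array bounds checks are being eliminated.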
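The "less than 0.2 s" figure follows from the test7 and test8 timings, since each test walks the 256 MB array 16 times:

$$\text{32-bit: } \frac{7.39\,\mathrm{s} - 4.68\,\mathrm{s}}{16} \approx 0.17\,\mathrm{s}, \qquad \text{64-bit: } \frac{6.23\,\mathrm{s} - 3.55\,\mathrm{s}}{16} \approx 0.17\,\mathrm{s}$$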

Code

using System;

namespace BitConverterTest {
    class Program {
        const int iters = 1024*1024*1024;
        const int arrayLen = iters/4;
        static byte[] array = new byte[arrayLen];

        static void Main(string[] args) {
            //test1(1, 2, 3, 4);
            //test2(1, 2, 3, 4);
            test3(1, 2, 3, 4);
            //test4(1, 2, 3, 4);
            test5(1, 2, 3, 4);
            test6(1, 2, 3, 4);

            // Fill array with good PRNG data
            var rng = new System.Security.Cryptography.RNGCryptoServiceProvider();
            rng.GetBytes(array);

            test7();
            test8();
        }

        // BitConverter with aligned input
        static void test3(byte w, byte x, byte y, byte z) {
            int res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            var b = new byte[] { w, x, y, z };
            for (int i = 0; i < iters; i++)
                res = BitConverter.ToInt32(b, 0);
            Console.WriteLine("test3: " + timer.Elapsed + " " + res);
        }

        // Inline bitfiddling with separate variables.
        static void test5(byte w, byte x, byte y, byte z) {
            long res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            var b = new byte[] { w, x, y, z };
            for (int i = 0; i < iters; i++) {
                int a = w | (x << 8) | (y << 16) | (z << 24);
                res += a;
            }
            Console.WriteLine("test5: " + timer.Elapsed + " " + res);
        }

        // Inline bitfiddling with array elements.
        static void test6(byte w, byte x, byte y, byte z) {
            long res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            var b = new byte[] { w, x, y, z };
            for (int i = 0; i < iters; i++) {
                int a = b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24);
                res += a;
            }
            Console.WriteLine("test6: " + timer.Elapsed + " " + res);
        }

        // BitConvert from large array...
        static void test7() {
            var its = iters/arrayLen * 4; // *4 to remove arrayLen/4 factor.
            var timer = System.Diagnostics.Stopwatch.StartNew();
            long res = 0;
            for (var outer = 0; outer < its; outer++) {
                for (var pos = 0; pos < arrayLen; pos += 4) {
                    var x = BitConverter.ToInt32(array, pos);
                    res += x;
                }
            }
            Console.WriteLine("test7: " + timer.Elapsed + " " + res);
        }

        // Bitfiddle from large array...
        static void test8() {
            var its = iters/arrayLen * 4;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            long res = 0;
            for (var outer = 0; outer < its; outer++) {
                for (var pos = 0; pos < arrayLen; pos += 4) {
                    int x = array[pos] | (array[pos+1] << 8) | (array[pos+2] << 16) | (array[pos+3] << 24);
                    res += x;
                }
            }
            Console.WriteLine("test8: " + timer.Elapsed + " " + res);
        }
    }
}

Answer 4

Based on a quick look at the BitConverter.ToInt32 implementation in .NET Reflector, I would say "No".

It optimizes the case where the array is aligned by casting the bytes directly; otherwise it does a bitwise merge.
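The strategy described above can be sketched like this. It is modeled on the description in this answer, not copied from the decompiled source. The method name is hypothetical, unsafe blocks must be enabled, and the sketch assumes that a start index that is a multiple of 4 counts as "aligned". Both branches return the native-endian result, so they agree with BitConverter.ToInt32.

```csharp
using System;

static class AlignedRead
{
    // Sketch of "cast when aligned, otherwise merge bytes".
    public static unsafe int ToInt32(byte[] value, int startIndex)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        if (startIndex < 0 || value.Length - startIndex < 4)
            throw new ArgumentOutOfRangeException(nameof(startIndex));

        fixed (byte* p = &value[startIndex])
        {
            // Assumption: an index that is a multiple of 4 is treated as aligned.
            if (startIndex % 4 == 0)
                return *(int*)p;  // single 4-byte load, native byte order

            // Otherwise combine the bytes one at a time, honoring native order.
            return BitConverter.IsLittleEndian
                ? p[0] | (p[1] << 8) | (p[2] << 16) | (p[3] << 24)
                : (p[0] << 24) | (p[1] << 16) | (p[2] << 8) | p[3];
        }
    }
}
```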

Answer 5

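I have also fiddled with similar issues.

In my case it was how to convert to single-precision floats when the data is stored as double-precision byte[]s, or simply between the double representation and the byte[] representation. If you want the best performance on large data sets, it's best not to go through too many API layers, and to build as much knowledge into the algorithm as you can without making it too brittle or incomprehensible.

So, to follow up further on Richard's tests, I've added another test below (test9), which is the approach I've taken in my own work, and which answers point 4 in his Analysis section: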

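Use unsafe memory pointer access to get the most performant result. That comes naturally if you use C++, but not necessarily in C#. It is similar to what BitConverter does under the hood, but without the parameter and safety checks (because, of course, we know what we are doing... ;)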
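A minimal sketch of the unsafe-pointer idea (not the author's test9 code) might look like the following. The method name is hypothetical and unsafe blocks must be enabled. The array is pinned once with fixed, each int is read through an int* with no per-element parameter or bounds checks, and a running total keeps the work from being optimized away, as in the other tests.

```csharp
using System;

static class UnsafeSum
{
    // Sums consecutive 4-byte ints from `array` by walking an int* over the pinned data.
    // Only the length is checked up front; the loop itself does no bounds checks.
    public static unsafe long SumInt32(byte[] array)
    {
        if (array == null) throw new ArgumentNullException(nameof(array));
        int count = array.Length / 4;   // trailing bytes (Length % 4) are ignored
        if (count == 0) return 0;

        long total = 0;
        fixed (byte* start = array)
        {
            int* p = (int*)start;
            for (int i = 0; i < count; i++)
                total += p[i];          // native byte order
        }
        return total;
    }
}
```

Calling it 16 times over the 256 MB array would match the amount of work done in test7 and test8.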

Results:

32-bit option:

test3: 00:00:06.2373138
test5: 00:00:03.1193338
test6: 00:00:03.1609287
test7: 00:00:07.7328020
test8: 00:00:06.4192130
test9: 00:00:03.9590307

64-bit option:

test3: 00:00:06.2209098
test5: 00:00:00.5563930
test6: 00:00:01.5486780
test7: 00:00:08.4858474
test8: 00:00:05.4991740
test9: 00:00:02.2928944

The code is the same as Richard's, with the new test9 added.
