为什么.NET中的多维数组比普通数组慢?

时间:2009-01-22 11:50:26

标签: .net performance arrays

编辑:我为大家道歉。我实际上想说“多维数组”时使用了“锯齿状数组”这个术语(如下面的例子所示)。我为使用错误的名字道歉。我实际上发现锯齿状阵列比多维阵列更快!我已经为锯齿状阵列添加了测量值。

我今天尝试使用锯齿状的多维数组,当时我注意到它的性能并不像我预期的那样。使用单维数组和手动计算索引要比使用2D数组快得多(几乎两倍)。我使用1024*1024数组(初始化为随机值)编写了一个测试,进行了1000次迭代,我在我的机器上得到了以下结果:

sum(double[], int): 2738 ms (100%)
sum(double[,]):     5019 ms (183%)
sum(double[][]):    2540 ms ( 93%)

这是我的测试代码:

public static double sum(double[] d, int l1) {
    // assuming the array is rectangular
    double sum = 0;
    int l2 = d.Length / l1;
    for (int i = 0; i < l1; ++i)
        for (int j = 0; j < l2; ++j)
            sum += d[i * l2 + j];
    return sum;
}

public static double sum(double[,] d) {
    double sum = 0;
    int l1 = d.GetLength(0);
    int l2 = d.GetLength(1);
    for (int i = 0; i < l1; ++i)
        for (int j = 0; j < l2; ++j)
            sum += d[i, j];
    return sum;
}

public static double sum(double[][] d) {
    double sum = 0;
    for (int i = 0; i < d.Length; ++i)
        for (int j = 0; j < d[i].Length; ++j)
            sum += d[i][j];
    return sum;
}

public static void Main() {
    Random random = new Random();
    const int l1  = 1024, l2 = 1024;
    double[ ] d1  = new double[l1 * l2];
    double[,] d2  = new double[l1 , l2];
    double[][] d3 = new double[l1][];

    for (int i = 0; i < l1; ++i) {
        d3[i] = new double[l2];
        for (int j = 0; j < l2; ++j)
            d3[i][j] = d2[i, j] = d1[i * l2 + j] = random.NextDouble();
    }
    //
    const int iterations = 1000;
    TestTime(sum, d1, l1, iterations);
    TestTime(sum, d2, iterations);
    TestTime(sum, d3, iterations);
}

进一步研究表明,第二种方法的IL比第一种方法大23%。 (代码大小68比52)这主要是由于呼叫System.Array::GetLength(int)。编译器还为锯齿状的多维数组发出Array::Get的调用,而它只是为简单数组调用ldelem

所以我想知道,为什么通过多维数组访问比普通数组更慢?我会假设编译器(或JIT)会做类似于我在第一种方法中所做的事情,但事实并非如此。

你能不能帮助我理解为什么会发生这种情况?


更新:根据Henk Holterman的建议,以下是TestTime的实施:

public static void TestTime<T, TR>(Func<T, TR> action, T obj,
                                   int iterations)
{
    Stopwatch stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < iterations; ++i)
        action(obj);
    Console.WriteLine(action.Method.Name + " took " + stopwatch.Elapsed);
}

public static void TestTime<T1, T2, TR>(Func<T1, T2, TR> action, T1 obj1,
                                        T2 obj2, int iterations)
{
    Stopwatch stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < iterations; ++i)
        action(obj1, obj2);
    Console.WriteLine(action.Method.Name + " took " + stopwatch.Elapsed);
}

10 个答案:

答案 0 :(得分:42)

下限为0的单维数组与IL内的多维或非0下限数组的类型不同(vector vs array IIRC)。使用vector更简单 - 要获取元素x,您只需执行pointer + size * x。对于array,您必须为单维数组执行pointer + size * (x-lower bound),并为您添加的每个维度执行更多算术运算。

基本上,CLR针对更常见的情况进行了优化。

答案 1 :(得分:9)

数组边界检查?

单维数组有一个可以直接访问的长度成员 - 编译时这只是一个内存读取。

多维数组需要GetLength(int dimension)方法调用,该方法调用处理参数以获取该维度的相关长度。这不会编译为内存读取,因此您可以进行方法调用等。

此外,GetLength(int dimension)将对参数进行边界检查。

答案 2 :(得分:4)

有趣的是,我从上面运行了以下代码 在Vista盒子上使用VS2008 NET3.5SP1 Win32, 在释放/优化差异几乎是不可测量的, 调试/ noopt多dim数组要慢得多。 (我运行了三次测试以减少第二组的JIT影响。)

  Here are my numbers: 
    sum took 00:00:04.3356535
    sum took 00:00:04.1957663
    sum took 00:00:04.5523050
    sum took 00:00:04.0183060
    sum took 00:00:04.1785843 
    sum took 00:00:04.4933085

查看第二组三个数字。 差别不足以让我在单维数组中编码所有内容。

虽然我没有发布它们,但在Debug / unoptimized中的多维度与 单/锯齿确实产生了巨大的差异。

完整计划:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;

namespace single_dimension_vs_multidimension
{
    class Program
    {


        public static double sum(double[] d, int l1) {    // assuming the array is rectangular 
            double sum = 0; 
            int l2 = d.Length / l1; 
            for (int i = 0; i < l1; ++i)   
                for (int j = 0; j < l2; ++j)   
                    sum += d[i * l2 + j];   
            return sum;
        }

        public static double sum(double[,] d)
        {
            double sum = 0;  
            int l1 = d.GetLength(0);
            int l2 = d.GetLength(1);   
            for (int i = 0; i < l1; ++i)    
                for (int j = 0; j < l2; ++j)   
                    sum += d[i, j]; 
            return sum;
        }
        public static double sum(double[][] d)
        {
            double sum = 0;   
            for (int i = 0; i < d.Length; ++i) 
                for (int j = 0; j < d[i].Length; ++j) 
                    sum += d[i][j];
            return sum;
        }
        public static void TestTime<T, TR>(Func<T, TR> action, T obj, int iterations) 
        { 
            Stopwatch stopwatch = Stopwatch.StartNew();
            for (int i = 0; i < iterations; ++i)      
                action(obj);
            Console.WriteLine(action.Method.Name + " took " + stopwatch.Elapsed);
        }
        public static void TestTime<T1, T2, TR>(Func<T1, T2, TR> action, T1 obj1, T2 obj2, int iterations)
        {
            Stopwatch stopwatch = Stopwatch.StartNew(); 
            for (int i = 0; i < iterations; ++i)    
                action(obj1, obj2); 
            Console.WriteLine(action.Method.Name + " took " + stopwatch.Elapsed);
        }
        public static void Main() {   
            Random random = new Random(); 
            const int l1  = 1024, l2 = 1024; 
            double[ ] d1  = new double[l1 * l2]; 
            double[,] d2  = new double[l1 , l2];  
            double[][] d3 = new double[l1][];   
            for (int i = 0; i < l1; ++i)
            {
                d3[i] = new double[l2];   
                for (int j = 0; j < l2; ++j)  
                    d3[i][j] = d2[i, j] = d1[i * l2 + j] = random.NextDouble();
            }    
            const int iterations = 1000;
            TestTime<double[], int, double>(sum, d1, l1, iterations);
            TestTime<double[,], double>(sum, d2, iterations);

            TestTime<double[][], double>(sum, d3, iterations);
            TestTime<double[], int, double>(sum, d1, l1, iterations);
            TestTime<double[,], double>(sum, d2, iterations);
            TestTime<double[][], double>(sum, d3, iterations); 
        }

    }
}

答案 3 :(得分:3)

因为多维数组只是一个语法糖,因为它实际上只是一个具有一些索引计算魔力的平面数组。另一方面,锯齿状数组就像是一个数组数组。使用二维数组,访问元素只需要读取一次内存,而使用两级锯齿状数组,则需要读取内存两次。

编辑:显然原版海报将“锯齿状阵列”与“多维阵列”混为一谈,所以我的推理并不完全正确。出于真正的原因,请查看Jon Skeet上面的重型炮兵答案。

答案 4 :(得分:2)

Jagged数组是类引用的数组(其他数组)直到叶子数组,可能是基本类型的数组。因此,为每个其他阵列分配的内存可以到处都是。

而mutli-dimensional数组的内存分配在一个连续的块中。

答案 5 :(得分:2)

最快的速度取决于您的阵列大小。

易于阅读的图像:

enter image description here

控制台结果:

// * Summary *

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.997 (1909/November2018Update/19H2)
Intel Core i7-6700HQ CPU 2.60GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.302
  [Host]        : .NET Core 3.1.6 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.31603), X64 RyuJIT
  .NET Core 3.1 : .NET Core 3.1.6 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.31603), X64 RyuJIT

Job=.NET Core 3.1  Runtime=.NET Core 3.1

|           Method |    D |            Mean |         Error |        StdDev |      Gen 0 |     Gen 1 |     Gen 2 |  Allocated |
|----------------- |----- |----------------:|--------------:|--------------:|-----------:|----------:|----------:|-----------:|
| 'double[D1][D2]' |   10 |        376.2 ns |       7.57 ns |      12.00 ns |     0.3643 |         - |         - |     1144 B |
| 'double[D1, D2]' |   10 |        325.5 ns |       3.71 ns |       3.47 ns |     0.2675 |         - |         - |      840 B |
| 'double[D1][D2]' |   50 |      4,821.4 ns |      44.71 ns |      37.34 ns |     6.8893 |         - |         - |    21624 B |
| 'double[D1, D2]' |   50 |      5,834.1 ns |      64.35 ns |      60.20 ns |     6.3629 |         - |         - |    20040 B |
| 'double[D1][D2]' |  100 |     19,124.4 ns |     230.39 ns |     454.77 ns |    26.2756 |    0.7019 |         - |    83224 B |
| 'double[D1, D2]' |  100 |     23,561.4 ns |     299.18 ns |     279.85 ns |    24.9939 |         - |         - |    80040 B |
| 'double[D1][D2]' |  500 |  1,248,458.7 ns |  11,241.19 ns |  10,515.01 ns |   322.2656 |  160.1563 |         - |  2016025 B |
| 'double[D1, D2]' |  500 |    966,940.8 ns |   5,694.46 ns |   5,326.60 ns |   303.7109 |  303.7109 |  303.7109 |  2000034 B |
| 'double[D1][D2]' | 1000 |  8,987,202.8 ns |  97,133.16 ns |  90,858.41 ns |  1421.8750 |  578.1250 |  265.6250 |  8032582 B |
| 'double[D1, D2]' | 1000 |  3,628,421.3 ns |  72,240.02 ns | 177,206.01 ns |   179.6875 |  179.6875 |  179.6875 |  8000036 B |
| 'double[D1][D2]' | 1500 | 26,496,994.4 ns | 380,625.25 ns | 356,037.09 ns |  3406.2500 | 1500.0000 |  531.2500 | 18048064 B |
| 'double[D1, D2]' | 1500 | 12,417,733.7 ns | 243,802.76 ns | 260,866.22 ns |   156.2500 |  156.2500 |  156.2500 | 18000038 B |
| 'double[D1][D2]' | 3000 | 86,943,097.4 ns | 485,339.32 ns | 405,280.31 ns | 12833.3333 | 7000.0000 | 1333.3333 | 72096325 B |
| 'double[D1, D2]' | 3000 | 57,969,405.9 ns | 393,463.61 ns | 368,046.11 ns |   222.2222 |  222.2222 |  222.2222 | 72000100 B |

// * Hints *
Outliers
  MultidimensionalArrayBenchmark.'double[D1][D2]': .NET Core 3.1 -> 1 outlier  was  removed (449.71 ns)
  MultidimensionalArrayBenchmark.'double[D1][D2]': .NET Core 3.1 -> 2 outliers were removed, 3 outliers were detected (4.75 us, 5.10 us, 5.28 us)
  MultidimensionalArrayBenchmark.'double[D1][D2]': .NET Core 3.1 -> 13 outliers were removed (21.27 us..30.62 us)
  MultidimensionalArrayBenchmark.'double[D1, D2]': .NET Core 3.1 -> 1 outlier  was  removed (4.19 ms)
  MultidimensionalArrayBenchmark.'double[D1, D2]': .NET Core 3.1 -> 3 outliers were removed, 4 outliers were detected (11.41 ms, 12.94 ms..13.61 ms)
  MultidimensionalArrayBenchmark.'double[D1][D2]': .NET Core 3.1 -> 2 outliers were removed (88.68 ms, 89.27 ms)

// * Legends *
  D         : Value of the 'D' parameter
  Mean      : Arithmetic mean of all measurements
  Error     : Half of 99.9% confidence interval
  StdDev    : Standard deviation of all measurements
  Gen 0     : GC Generation 0 collects per 1000 operations
  Gen 1     : GC Generation 1 collects per 1000 operations
  Gen 2     : GC Generation 2 collects per 1000 operations
  Allocated : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
  1 ns      : 1 Nanosecond (0.000000001 sec)

基准代码:

[SimpleJob(BenchmarkDotNet.Jobs.RuntimeMoniker.NetCoreApp31)]
[MemoryDiagnoser]
public class MultidimensionalArrayBenchmark {
    [Params(10, 50, 100, 500, 1000, 1500, 3000)]
    public int D { get; set; }

    [Benchmark(Description = "double[D1][D2]")]
    public double[][] JaggedArray() {
        var array = new double[D][];
        for (int i = 0; i < array.Length; i++) {
            var subArray = new double[D];
            array[i] = subArray;

            for (int j = 0; j < subArray.Length; j++) {
                subArray[j] = j + i * 10;
            }
        }

        return array;
    }

    [Benchmark(Description = "double[D1, D2]")]
    public double[,] MultidimensionalArray() {
        var array = new double[D, D];
        for (int i = 0; i < D; i++) {
            for (int j = 0; j < D; j++) {
                array[i, j] = j + i * 10;
            }
        }

        return array;
    }
}

答案 6 :(得分:1)

我认为它有一些事情要做,因为锯齿状数组实际上是数组数组,因此有两个级别的间接来获取实际数据。

答案 7 :(得分:1)

我和其他所有人在一起

我有一个带有三维数组的程序,让我告诉你,当我将数组移动到二维时,我看到一个巨大的提升,然后我转移到一维数组。

最后,我认为我在执行时间内看到了超过500%的性能提升。

唯一的缺点是增加了复杂性,以找出一维数组中的内容,而不是三维数组。

答案 8 :(得分:1)

我认为多维度较慢,运行时必须检查两个或更多(三维和向上)边界检查。

答案 9 :(得分:-1)

检查边界。如果“i”小于l1,则“j”变量可能超过l2。这在第二个例子中是不合法的