Optimally computing the Cartesian distance between points in high-dimensional space

Time: 2014-10-23 14:51:04

Tags: c# optimization

Summary:

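My class SquareDistance computes the square of the Cartesian distance in four ways, using methods with these names:

  1. Signed
  2. UnsignedBranching
  3. UnsignedDistribute
  4. CastToSignedLong

The first is the fastest and uses signed integers, but my data must be unsigned (for reasons given below). The other three methods start with unsigned numbers. My goal is to write a method like those in SquareDistance that takes unsigned data, performs better than the three I have already written, and comes as close as possible to the performance of #1. Code and benchmark results follow. (Unsafe code is permitted if you think it will help.)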

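Details:

I am developing an algorithm that solves the K-Nearest Neighbors problem using an index derived from the Hilbert curve. The execution time of the naive linear-scan algorithm grows quadratically with the number of points and linearly with the number of dimensions, and it spends all of its time computing and comparing Cartesian distances.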
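For concreteness, here is a minimal sketch of that naive linear scan. It is not the author's code (which is not shown): every other point is ranked by squared distance, which yields the same order as the true distance because the square root is monotonic. `NaiveKnn` and `Nearest` are hypothetical names, and points are assumed to be `uint[]` as in the question. The distance evaluations cost O(N·D) per query, so O(N²·D) for all N points.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical brute-force k-nearest-neighbour search (illustration only).
public static class NaiveKnn
{
    // Squared distance, widened to long so large coordinates cannot overflow.
    static long SquareDistance(uint[] x, uint[] y)
    {
        long sum = 0;
        for (var i = 0; i < x.Length; i++)
        {
            long delta = (long)x[i] - y[i];
            sum += delta * delta;
        }
        return sum;
    }

    // Indices of the k points closest to points[query], excluding the query itself.
    // OrderBy evaluates each key once, so this is N distance computations per query.
    public static int[] Nearest(IList<uint[]> points, int query, int k)
    {
        return Enumerable.Range(0, points.Count)
            .Where(j => j != query)
            .OrderBy(j => SquareDistance(points[query], points[j]))
            .Take(k)
            .ToArray();
    }
}
```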

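The motivation behind the special Hilbert index is to reduce the number of times the distance function is called. Even so, it must still be called millions of times, so I need it to be as fast as possible. (It is the most frequently called function in the program. A recent failed attempt to optimize the distance function doubled the program's execution time from 7 minutes to 15 minutes, so no, this is not a premature or superfluous optimization.)

Dimensions: points may have anywhere from ten to five thousand dimensions.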

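Constraints. I have two annoying constraints:

  1. The Hilbert transformation logic requires that points be expressed as uint (unsigned integer) arrays. (The code was written by someone else, is magic, uses shifts, ANDs, ORs and the like, and cannot be changed.) Storing my points as signed integers and constantly casting them to uint arrays produced wretched performance, so I must at least store a uint array copy of each point.

  2. For efficiency, I made a signed integer copy of each point to speed up the distance calculations. This worked very well, but once I reach about 3,000 dimensions, I run out of memory!

  3. To save memory, I removed the memoized signed integer arrays and tried to write an unsigned version of the distance calculation. My best result is 2.25 times slower than the signed integer version.

The benchmark creates 1,000 random points of 1,000 dimensions each and computes the distance between every point and every other point, for 1,000,000 comparisons. Because I only care about relative distances, I save time by skipping the square root.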

In Debug mode:

      SignedBenchmark                  Ratio: 1.000 Seconds: 3.739
      UnsignedBranchingBenchmark       Ratio: 2.731 Seconds: 10.212
      UnsignedDistributeBenchmark      Ratio: 3.294 Seconds: 12.320
      CastToSignedLongBenchmark        Ratio: 3.265 Seconds: 12.211
      

In Release mode:

       SignedBenchmark                  Ratio: 1.000 Seconds: 3.494
       UnsignedBranchingBenchmark       Ratio: 2.672 Seconds: 9.334
       UnsignedDistributeBenchmark      Ratio: 3.336 Seconds: 11.657
       CastToSignedLongBenchmark        Ratio: 3.471 Seconds: 12.127
      

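The benchmarks above were run on a Dell with an Intel Core i7-4800MQ CPU @ 2.70GHz and 16 GB of memory. My larger algorithm already uses the Task Parallel Library for the bigger tasks, so parallelizing this inner loop would be pointless.

Question: Can anyone think of a faster algorithm than UnsignedBranching?

My benchmark code follows.

UPDATE

This version uses loop unrolling (thanks to @dasblinkenlight) and runs 2.7 times faster: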

      public static long UnsignedLoopUnrolledBranching(uint[] x, uint[] y)
      {
          var distance = 0UL;
          var leftovers = x.Length % 4;
          var dimensions = x.Length;
          var roundDimensions = dimensions - leftovers;
      
          for (var i = 0; i < roundDimensions; i += 4)
          {
              var x1 = x[i];
              var y1 = y[i];
              var x2 = x[i+1];
              var y2 = y[i+1];
              var x3 = x[i+2];
              var y3 = y[i+2];
              var x4 = x[i+3];
              var y4 = y[i+3];
              // ulong deltas: a uint delta * delta overflows once delta exceeds 65,535.
              ulong delta1 = x1 > y1 ? x1 - y1 : y1 - x1;
              ulong delta2 = x2 > y2 ? x2 - y2 : y2 - x2;
              ulong delta3 = x3 > y3 ? x3 - y3 : y3 - x3;
              ulong delta4 = x4 > y4 ? x4 - y4 : y4 - x4;
              distance += delta1 * delta1 + delta2 * delta2 + delta3 * delta3 + delta4 * delta4;
          }
          for (var i = roundDimensions; i < dimensions; i++)
          {
              var xi = x[i];
              var yi = y[i];
              ulong delta = xi > yi ? xi - yi : yi - xi;
              distance += delta * delta;
          }
          return (long)distance;
      }
      

      SquareDistance.cs:

      using System;
      using System.Collections.Generic;
      using System.Linq;
      using System.Text;
      using System.Threading.Tasks;
      
      namespace DistanceBenchmark
      {
          /// <summary>
          /// Provide several alternate methods for computing the square of the Cartesian distance
          /// to allow study of their relative performance.
          /// </summary>
          public static class SquareDistance
          {
              /// <summary>
              /// Compute the square of the Cartesian distance between two N-dimensional points
              /// with calculations done on signed numbers using signed arithmetic, 
              /// a single multiplication and no branching.
              /// </summary>
              /// <param name="x">First point.</param>
              /// <param name="y">Second point.</param>
              /// <returns>Square of the distance.</returns>
              public static long Signed(int[] x, int[] y)
              {
                  var distance = 0L;
                  var dimensions = x.Length;
                  for (var i = 0; i < dimensions; i++)
                  {
                      // Widen to long first: an int delta * delta overflows once |delta| exceeds 46,340.
                      var delta = (long)x[i] - y[i];
                      distance += delta * delta;
                  }
                  return distance;
              }
      
              /// <summary>
              /// Compute the square of the Cartesian distance between two N-dimensional points
              /// with calculations done on unsigned numbers using unsigned arithmetic, a single multiplication
              /// and a branching instruction (the ternary operator).
              /// </summary>
              /// <param name="x">First point.</param>
              /// <param name="y">Second point.</param>
              /// <returns>Square of the distance.</returns>
              public static long UnsignedBranching(uint[] x, uint[] y)
              {
                  var distance = 0UL;
                  var dimensions = x.Length;
                  for (var i = 0; i < dimensions; i++)
                  {
                      var xi = x[i];
                      var yi = y[i];
                      // ulong delta: a uint delta * delta overflows once delta exceeds 65,535.
                      ulong delta = xi > yi ? xi - yi : yi - xi;
                      distance += delta * delta;
                  }
                  return (long)distance;
              }
      
              /// <summary>
              /// Compute the square of the Cartesian distance between two N-dimensional points
              /// with calculations done on unsigned numbers using unsigned arithmetic and the distributive law,
              /// which requires four multiplications and no branching.
              /// 
              /// To prevent overflow, the coordinates are cast to ulongs.
              /// </summary>
              /// <param name="x">First point.</param>
              /// <param name="y">Second point.</param>
              /// <returns>Square of the distance.</returns>
              public static long UnsignedDistribute(uint[] x, uint[] y)
              {
                  var distance = 0UL;
                  var dimensions = x.Length;
                  for (var i = 0; i < dimensions; i++)
                  {
                      ulong xi = x[i];
                      ulong yi = y[i];
                      distance += xi * xi + yi * yi - 2 * xi * yi;
                  }
                  return (long)distance;
              }
      
              /// <summary>
              /// Compute the square of the Cartesian distance between two N-dimensional points
              /// with calculations done on unsigned numbers using signed arithmetic, 
              /// by first casting the values into longs.
              /// </summary>
              /// <param name="x">First point.</param>
              /// <param name="y">Second point.</param>
              /// <returns>Square of the distance.</returns>
              public static long CastToSignedLong(uint[] x, uint[] y)
              {
                  var distance = 0L;
                  var dimensions = x.Length;
                  for (var i = 0; i < dimensions; i++)
                  {
                      var delta = (long)x[i] - (long)y[i];
                      distance += delta * delta;
                  }
                  return distance;
              }
      
          }
      }
      

      RandomPointFactory.cs:

      using System;
      using System.Collections.Generic;
      using System.Linq;
      using System.Text;
      using System.Threading.Tasks;
      
      namespace DistanceBenchmark
      {
          public static class RandomPointFactory
          {
              /// <summary>
              /// Get a random list of signed integer points with the given number of dimensions to use as test data.
              /// </summary>
              /// <param name="recordCount">Number of points to get.</param>
              /// <param name="dimensions">Number of dimensions per point.</param>
              /// <returns>Signed integer test data.</returns>
              public static IList<int[]> GetSignedTestPoints(int recordCount, int dimensions)
              {
                  var testData = new List<int[]>();
                  var random = new Random(DateTime.Now.Millisecond);
      
                  for (var iRecord = 0; iRecord < recordCount; iRecord++)
                  {
                      int[] point;
                      testData.Add(point = new int[dimensions]);
                      for (var d = 0; d < dimensions; d++)
                          point[d] = random.Next(100000);
                  }
                  return testData;
              }
      
              /// <summary>
              /// Get a random list of unsigned integer points with the given number of dimensions to use as test data.
              /// </summary>
              /// <param name="recordCount">Number of points to get.</param>
              /// <param name="dimensions">Number of dimensions per point.</param>
              /// <returns>Unsigned integer test data.</returns>
              public static IList<uint[]> GetUnsignedTestPoints(int recordCount, int dimensions)
              {
                  var testData = new List<uint[]>();
                  var random = new Random(DateTime.Now.Millisecond);
      
                  for (var iRecord = 0; iRecord < recordCount; iRecord++)
                  {
                      uint[] point;
                      testData.Add(point = new uint[dimensions]);
                      for (var d = 0; d < dimensions; d++)
                          point[d] = (uint)random.Next(100000);
                  }
                  return testData;
              }
          }
      }
      

      Program.cs:

      using System;
      using System.Collections.Generic;
      using System.Diagnostics;
      using System.Linq;
      using System.Text;
      using System.Threading.Tasks;
      
      namespace DistanceBenchmark
      {
          public class Program
          {
              private static IList<int[]> SignedTestData = RandomPointFactory.GetSignedTestPoints(1000, 1000);
              private static IList<uint[]> UnsignedTestData = RandomPointFactory.GetUnsignedTestPoints(1000, 1000);
      
              static void Main(string[] args)
              {
                  var baseline = TimeIt("SignedBenchmark", SignedBenchmark);
                  TimeIt("UnsignedBranchingBenchmark", UnsignedBranchingBenchmark, baseline);
                  TimeIt("UnsignedDistributeBenchmark", UnsignedDistributeBenchmark, baseline);
                  TimeIt("CastToSignedLongBenchmark", CastToSignedLongBenchmark, baseline);
                  TimeIt("SignedBenchmark", SignedBenchmark, baseline);
                  Console.WriteLine("Done. Type any key to exit.");
                  Console.ReadLine();
              }
      
              public static void SignedBenchmark()
              {
                  foreach(var p1 in SignedTestData)
                      foreach (var p2 in SignedTestData)
                          SquareDistance.Signed(p1, p2);
              }
      
              public static void UnsignedBranchingBenchmark()
              {
                  foreach (var p1 in UnsignedTestData)
                      foreach (var p2 in UnsignedTestData)
                          SquareDistance.UnsignedBranching(p1, p2);
              }
      
              public static void UnsignedDistributeBenchmark()
              {
                  foreach (var p1 in UnsignedTestData)
                      foreach (var p2 in UnsignedTestData)
                          SquareDistance.UnsignedDistribute(p1, p2);
              }
      
              public static void CastToSignedLongBenchmark()
              {
                  foreach (var p1 in UnsignedTestData)
                      foreach (var p2 in UnsignedTestData)
                          SquareDistance.CastToSignedLong(p1, p2);
              }
      
              public static double TimeIt(String testName, Action benchmark, double baseline = 0.0)
              {
                  var stopwatch = new Stopwatch();
                  stopwatch.Start();
                  benchmark();
                  stopwatch.Stop();
                  var seconds = stopwatch.Elapsed.TotalSeconds;
                  var ratio = baseline <= 0 ? 1.0 : seconds/baseline;
                  Console.WriteLine(String.Format("{0,-32} Ratio: {1:0.000} Seconds: {2:0.000}", testName, ratio, seconds));
                  return seconds;
              }
          }
      }
      

4 answers:

Answer 0 (score: 2)

You should be able to save a lot of execution time by unrolling your loops:

public static long Signed(int[] x, int[] y)
{
    var distance = 0L;
    var dimensions = x.Length;
    var stop = dimensions - (dimensions % 4);
    for (var i = 0; i < stop; i+=4)
    {
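        // Widen to long: an int delta * delta overflows once |delta| exceeds 46,340.
        var delta0 = (long)x[i] - y[i];
        var delta1 = (long)x[i+1] - y[i+1];
        var delta2 = (long)x[i+2] - y[i+2];
        var delta3 = (long)x[i+3] - y[i+3];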
        distance += (delta0 * delta0)
                  + (delta1 * delta1)
                  + (delta2 * delta2)
                  + (delta3 * delta3);
    }
    for (var i = stop; i < dimensions; i++)
    {
        var delta = (long)x[i] - y[i];
        distance += delta * delta;
    }
    return distance;
}

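This change alone cut the execution time on my local system from 8.325 seconds to 4.745 seconds, a 43% improvement!

The idea is to handle four dimensions at a time for as long as possible, then finish the remaining dimensions in a separate loop.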

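Answer 1 (score: 1)

If you can't change the Hilbert curve code, you could try a Z-curve, also known as a Morton curve. Convert the coordinates of each dimension to binary and interleave their bits, then sort. You can check the upper bound using the most significant bit. The Hilbert curve in n dimensions uses Gray codes, so perhaps you can search the internet for a faster version; you can find some fast implementations in Hacker's Delight. The Morton curve should be similar to an H-tree. When you need more precision, you can try a variant of the Hilbert curve, the Moore curve. For example, in 2D a Moore curve joins four Hilbert curves.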

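A minimal sketch of the Z-order (Morton) key described in this answer, assuming each coordinate fits in its low `bitsPerDim` bits; `Morton.Encode` is a hypothetical helper, not code from the answer. It takes the most significant bit of every coordinate first, then the next bit, and so on, so sorting points by the key yields Z-order. `System.Numerics.BigInteger` is used because a key for thousands of dimensions cannot fit in 64 bits.

```csharp
using System.Numerics;

// Hypothetical Morton (Z-order) encoder: interleaves the bits of all coordinates.
public static class Morton
{
    public static BigInteger Encode(uint[] point, int bitsPerDim)
    {
        var key = BigInteger.Zero;
        for (var bit = bitsPerDim - 1; bit >= 0; bit--)     // most significant bit first
            foreach (var coord in point)                     // one bit from each dimension
                key = (key << 1) | ((coord >> bit) & 1u);
        return key;
    }
}
```

For two dimensions and 2 bits, the point (1, 2) = (01, 10) interleaves to 0110, i.e. 6.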

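Answer 2 (score: 0)

The best improvement I can see will not be low-hanging fruit. This kind of problem is not well suited to the current version of the .NET Framework (or to CPUs in general).

The class of problem you have is called SIMD. You may have heard of the Intel Pentium MMX; MMX was the marketing name for a SIMD instruction set.

There are three good ways to get SIMD working with your program, listed from slowest to fastest and from easiest to hardest:

  1. Use RyuJIT (the preview of the next .NET JIT compiler) to take advantage of CPU SIMD.
  2. P/Invoke into C++ AMP to run on your GPU.
  3. P/Invoke into an FPGA with a kernel designed specifically for this computation.

I strongly recommend trying to leverage the GPU with C++ AMP, especially since a uint[] should be easy to pass to C++ AMP.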
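With RyuJIT, CPU SIMD is exposed to C# through `System.Numerics.Vector<T>`. The sketch below is not from this answer: it is one branch-free way to vectorize the uint distance, assuming a runtime that provides `Vector.Widen` (for example, .NET Core 2.0 or later). It computes |x - y| per lane as max - min, which cannot underflow for unsigned values, and widens to 64-bit lanes before squaring. `SimdSquareDistance` is a hypothetical name, and whether it beats the scalar loop on a given CPU has to be measured.

```csharp
using System;
using System.Numerics;

// Hypothetical SIMD squared distance for uint[] points (illustration only).
// Assumes the total fits in 64 bits, as the question's long return value does.
public static class SimdSquareDistance
{
    public static ulong Compute(uint[] x, uint[] y)
    {
        if (x.Length != y.Length)
            throw new ArgumentException("Points must have the same number of dimensions.");

        var width = Vector<uint>.Count;
        var acc = Vector<ulong>.Zero;
        var i = 0;
        for (; i <= x.Length - width; i += width)
        {
            var vx = new Vector<uint>(x, i);
            var vy = new Vector<uint>(y, i);
            // |x - y| without branching: max - min never goes negative.
            var delta = Vector.Max(vx, vy) - Vector.Min(vx, vy);
            // Widen to 64-bit lanes so delta * delta cannot overflow.
            Vector.Widen(delta, out Vector<ulong> lo, out Vector<ulong> hi);
            acc += lo * lo + hi * hi;
        }

        // Horizontal sum of the accumulator lanes.
        var distance = Vector.Dot(acc, Vector<ulong>.One);

        // Scalar tail for dimensions that do not fill a whole vector.
        for (; i < x.Length; i++)
        {
            ulong d = x[i] > y[i] ? x[i] - y[i] : y[i] - x[i];
            distance += d * d;
        }
        return distance;
    }
}
```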

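Answer 3 (score: 0)

In the shower this morning I came up with a way to improve on this further using a dot product, cutting another fifty percent when the data is stored as uint[] arrays. I had investigated this idea before but failed to recognize the loop invariants that I could optimize away by precomputing them. The idea is based on the distributive law: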

    (x-y)(x-y) = x*x + y*y - 2xy

If I sum over all coordinates, the result is:

    D^2 = \sum_i (x_i - y_i)^2 = |x|^2 + |y|^2 - 2(x \cdot y)

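Since I will perform a great many distance calculations, I can store the squared length of each vector. Finding the distance between two vectors then reduces to adding their squared lengths (outside the loop) and computing the dot product, which involves no negative values and therefore needs no branching!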
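The precomputation can be sketched as a small wrapper that computes |x|² once per point, so each distance query is a single multiply-add loop. `CachedPoint` is a hypothetical type, not the answer's code; unlike the answer, it always widens products to `ulong` rather than using a no-overflow fast path.

```csharp
// Hypothetical point wrapper caching the squared magnitude (illustration only).
// Distance uses D^2 = |x|^2 + |y|^2 - 2(x . y).
public sealed class CachedPoint
{
    public uint[] Coordinates { get; }
    public ulong SquaredMagnitude { get; }

    public CachedPoint(uint[] coordinates)
    {
        Coordinates = coordinates;
        ulong sum = 0;
        foreach (ulong c in coordinates)
            sum += c * c;                      // computed once per point, not per query
        SquaredMagnitude = sum;
    }

    public static ulong SquareDistance(CachedPoint a, CachedPoint b)
    {
        var x = a.Coordinates;
        var y = b.Coordinates;
        ulong dot = 0;
        for (var i = 0; i < x.Length; i++)
            dot += (ulong)x[i] * y[i];         // multiply and add only: no subtraction, no branch
        // Never negative in exact arithmetic, since it equals the sum of (x_i - y_i)^2.
        return a.SquaredMagnitude + b.SquaredMagnitude - 2 * dot;
    }
}
```

For (1, 2, 3) and (4, 6, 8): |x|² = 14, |y|² = 116 and x·y = 40, so D² = 14 + 116 - 80 = 50, matching 3² + 4² + 5².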

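Why is branching a problem? Because with uint vectors you cannot subtract the values in the Cartesian formula without first branching to test which value is larger. So if I want (x-y)*(x-y), I need to do this in the loop: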

var delta = x[i] > y[i] ? x[i] - y[i] : y[i] - x[i];
distance += delta * delta;

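In addition, to prevent overflow I need to widen the numbers from uint to ulong, and those casts really kill performance. Since most of my coordinates are small, I was able to create a test: I also store the maximum value of each vector, and because my loop is unrolled four iterations at a time, if 4 * xMax * yMax does not overflow a uint, I can dispense with most of the casts. If the test fails, I fall back to the more expensive version that does more casting.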
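Plugging numbers into this test shows which path each data set takes. The helper below mirrors the condition in `SquareDistanceDotProduct`; the name `CanSkipWidening` is hypothetical. Each product x_i·y_i is at most xMax·yMax, so the four products summed per unrolled iteration are at most 4·xMax·yMax.

```csharp
// Hypothetical helper mirroring the overflow guard in SquareDistanceDotProduct:
// `unroll` uint products are summed before widening, so the uint-only path is
// safe only while unroll * xMax * yMax stays below uint.MaxValue.
public static class OverflowGuard
{
    public static bool CanSkipWidening(long xMax, long yMax, int unroll = 4)
        => xMax * yMax * unroll < uint.MaxValue;
}
```

For the benchmark data in this answer (xMax = 1999, yMax = 10000) the bound is 79,960,000, well under uint.MaxValue = 4,294,967,295, so the fast path runs. For coordinates up to 99,999, as in the question's random test data, the bound is about 4.0 × 10^10 and the widening path is taken instead.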

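In the end, I have several implementations: naive with casting; branching; distributed with casting but without loop-invariant removal; and dot product with less casting and with the loop invariants removed.

The naive method performs a subtraction, a multiplication and an addition in every loop iteration. The distributed dot product with the loop invariants removed uses only a multiplication and an addition.

Here are the benchmarks: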

For 100000 iterations and 2000 dimensions. 
    Naive time        = 2.505 sec. 
    Branch time       = 0.628 sec. 
    Distributed time  = 6.371 sec.
    Dot Product time  = 0.288 sec.
    Improve vs Naive  = 88.5%.
    Improve vs Branch = 54.14%.

Here is the code, as an NUnit test:

using System;
using System.Diagnostics;
using NUnit.Framework;
using System.Linq;

namespace HilbertTransformationTests
{
    [TestFixture]
    public class CartesianDistanceTests
    {

        [Test]
        public void SquareDistanceBenchmark()
        {
            var dims = 2000;
            var x = new uint[dims];
            var y = new uint[dims];
            var xMag2 = 0L;
            var yMag2 = 0L;

            for (var i = 0; i < dims; i++)
            {
                x[i] = (uint)i;
                xMag2 += x[i] * (long)x[i];
                y[i] = (uint)(10000 - i);
                yMag2 += y[i] * (long)y[i];
            }
            var xMax = (long)x.Max();
            var yMax = (long)y.Max();
            var repetitions = 100000;
            var naiveTime = Time(() => SquareDistanceNaive(x, y), repetitions);
            var distributeTime = Time(() => SquareDistanceDistributed(x, y), repetitions);
            var branchTime = Time(() => SquareDistanceBranching(x, y), repetitions);
            var dotProductTime = Time(() => SquareDistanceDotProduct(x, y, xMag2, yMag2, xMax, yMax), repetitions);

            Console.Write($@"
For {repetitions} iterations and {dims} dimensions. 
    Naive time        = {naiveTime} sec. 
    Branch time       = {branchTime} sec. 
    Distributed time  = {distributeTime} sec.
    Dot Product time  = {dotProductTime} sec.
    Improve vs Naive  = {((int)(10000 * (naiveTime - dotProductTime) / naiveTime)) / 100.0}%.
    Improve vs Branch = {((int)(10000 * (branchTime - dotProductTime) / branchTime)) / 100.0}%.
");
            Assert.Less(dotProductTime, branchTime, "Dot product time should have been less than branch time");
        }

        private static double Time(Action action, int repeatCount)
        {
            var timer = new Stopwatch();
            timer.Start();
            for (var j = 0; j < repeatCount; j++)
                action();
            timer.Stop();
            return timer.ElapsedMilliseconds / 1000.0;
        }

        private static long SquareDistanceNaive(uint[] x, uint[] y)
        {
            var squareDistance = 0L;
            for (var i = 0; i < x.Length; i++)
            {
                var delta = (long)x[i] - (long)y[i];
                squareDistance += delta * delta;
            }
            return squareDistance;
        }

        /// <summary>
        /// Compute the square distance, using ternary operators for branching to keep subtraction operations from going negative,
        /// which is inappropriate for unsigned numbers.
        /// </summary>
        /// <returns>The square of the distance.</returns>
        /// <param name="x">First point.</param>
        /// <param name="y">Second point.</param>
        private static long SquareDistanceBranching(uint[] x, uint[] y)
        {
            long squareDistanceLoopUnrolled;

            // Unroll the loop partially to improve speed. (2.7x improvement!)
            var distance = 0UL;
            var leftovers = x.Length % 4;
            var dimensions = x.Length;
            var roundDimensions = dimensions - leftovers;

            for (var i = 0; i < roundDimensions; i += 4)
            {
                var x1 = x[i];
                var y1 = y[i];
                var x2 = x[i + 1];
                var y2 = y[i + 1];
                var x3 = x[i + 2];
                var y3 = y[i + 2];
                var x4 = x[i + 3];
                var y4 = y[i + 3];
                // ulong deltas: a uint delta * delta overflows once delta exceeds 65,535.
                ulong delta1 = x1 > y1 ? x1 - y1 : y1 - x1;
                ulong delta2 = x2 > y2 ? x2 - y2 : y2 - x2;
                ulong delta3 = x3 > y3 ? x3 - y3 : y3 - x3;
                ulong delta4 = x4 > y4 ? x4 - y4 : y4 - x4;
                distance += delta1 * delta1 + delta2 * delta2 + delta3 * delta3 + delta4 * delta4;
            }
            for (var i = roundDimensions; i < dimensions; i++)
            {
                var xi = x[i];
                var yi = y[i];
                ulong delta = xi > yi ? xi - yi : yi - xi;
                distance += delta * delta;
            }
            squareDistanceLoopUnrolled = (long)distance;

            return squareDistanceLoopUnrolled;
        }

        private static long SquareDistanceDistributed(uint[] x, uint[] y)
        {
            long squareDistanceLoopUnrolled;

            // Unroll the loop partially to improve speed. (2.7x improvement!)
            var distance = 0UL;
            var dSubtract = 0UL;
            var leftovers = x.Length % 4;
            var dimensions = x.Length;
            var roundDimensions = dimensions - leftovers;

            for (var i = 0; i < roundDimensions; i += 4)
            {
                ulong x1 = x[i];
                ulong y1 = y[i];
                ulong x2 = x[i + 1];
                ulong y2 = y[i + 1];
                ulong x3 = x[i + 2];
                ulong y3 = y[i + 2];
                ulong x4 = x[i + 3];
                ulong y4 = y[i + 3];

                distance += x1 * x1 + y1 * y1 
                          + x2 * x2 + y2 * y2 
                          + x3 * x3 + y3 * y3 
                          + x4 * x4 + y4 * y4;
                dSubtract += x1 * y1 + x2 * y2 + x3 * y3 + x4 * y4;
            }
            distance = distance - 2UL * dSubtract;
            for (var i = roundDimensions; i < dimensions; i++)
            {
                var xi = x[i];
                var yi = y[i];
                ulong delta = xi > yi ? xi - yi : yi - xi;
                distance += delta * delta;
            }
            squareDistanceLoopUnrolled = (long)distance;

            return squareDistanceLoopUnrolled;
        }

        private static long SquareDistanceDotProduct(uint[] x, uint[] y, long xMag2, long yMag2, long xMax, long yMax)
        {
            const int unroll = 4;
            if (xMax * yMax * unroll < (long) uint.MaxValue)
                return SquareDistanceDotProductNoOverflow(x, y, xMag2, yMag2);

            // Unroll the loop partially to improve speed. (2.7x improvement!)
            var dotProduct = 0UL;
            var leftovers = x.Length % unroll;
            var dimensions = x.Length;
            var roundDimensions = dimensions - leftovers;

            for (var i = 0; i < roundDimensions; i += unroll)
            {
                var x1 = x[i];
                ulong y1 = y[i];
                var x2 = x[i + 1];
                ulong y2 = y[i + 1];
                var x3 = x[i + 2];
                ulong y3 = y[i + 2];
                var x4 = x[i + 3];
                ulong y4 = y[i + 3];
                dotProduct += x1 * y1 + x2 * y2 + x3 * y3 + x4 * y4;
            }
            for (var i = roundDimensions; i < dimensions; i++)
                dotProduct += x[i] * (ulong)y[i];
            return xMag2 + yMag2 - 2L * (long)dotProduct;
        }

        /// <summary>
        /// Compute the square of the Cartesian distance using the dot product method,
        /// assuming that the unrolled sums won't overflow uint.
        /// 
        /// This permits us to skip some widening conversions to ulong, making the computation faster.
        /// 
        /// Algorithm:
        /// 
        ///   D^2 = |x|^2 + |y|^2 - 2(x·y)
        /// 
        /// Using the dot product of x and y and precomputed values for the square magnitudes of x and y
        /// permits us to use two operations (multiply and add) instead of three (subtract, multiply and add)
        /// in the main loop, saving one third of the time.
        /// </summary>
        /// <returns>The square distance.</returns>
        /// <param name="x">First point.</param>
        /// <param name="y">Second point.</param>
        /// <param name="xMag2">Distance from x to the origin, squared.</param>
        /// <param name="yMag2">Distance from y to the origin, squared.</param>
        private static long SquareDistanceDotProductNoOverflow(uint[] x, uint[] y, long xMag2, long yMag2)
        {
            // Unroll the loop partially to improve speed. (2.7x improvement!)
            const int unroll = 4;
            var dotProduct = 0UL;
            var leftovers = x.Length % unroll;
            var dimensions = x.Length;
            var roundDimensions = dimensions - leftovers;
            for (var i = 0; i < roundDimensions; i += unroll)
                dotProduct += (x[i] * y[i] + x[i+1] * y[i+1] + x[i+2] * y[i+2] + x[i+3] * y[i+3]);
            for (var i = roundDimensions; i < dimensions; i++)
                dotProduct += x[i] * y[i];
            return xMag2 + yMag2 - 2L * (long)dotProduct;
        }


    }

}