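I have a data structure with a field of type float. A collection of these structures needs to be sorted by the value of that float. Is there a radix sort implementation for this?
If there isn't, is there a fast way to access the exponent, the sign and the mantissa? Because if you sort the floats first on the mantissa, then on the exponent, and last on the sign, you sort floats in O(n).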
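Answer 0 (score: 18)
Update
I was quite interested in this topic, so I sat down and implemented it (using this very fast and memory conservative implementation). I also read this one (thanks celion) and found out that you don't even have to split the floats into mantissa and exponent to sort them. You just have to take the bits one-to-one and perform an int sort. The only thing you have to care about is the negative values, which have to be put in reverse order in front of the positive ones at the end of the algorithm (I did that in one step with the last iteration of the algorithm to save some CPU time).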
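To see why sorting the raw bits works, here is a minimal sketch of the ordering property. The class name `FloatBitsDemo` and the helper `Bits` are made up for illustration; the conversion itself is the same `BitConverter` round trip the sort uses.

```csharp
using System;

public static class FloatBitsDemo
{
    // Hypothetical helper: reinterpret the 32 bits of a float as a signed int.
    public static int Bits(float f) => BitConverter.ToInt32(BitConverter.GetBytes(f), 0);

    public static void Main()
    {
        // Positive floats: a larger value has a larger bit pattern.
        Console.WriteLine(Bits(1.5f) < Bits(2.5f));
        // Negative floats: the bit patterns are ordered the other way round,
        // so -1.5f comes before -2.5f when compared as ints.
        Console.WriteLine(Bits(-1.5f) < Bits(-2.5f));
        // Every negative float has the sign bit set, so its int view is below zero.
        Console.WriteLine(Bits(-0.001f) < 0);
    }
}
```

All three comparisons hold, which is why only the negative block has to be reversed and moved to the front in the last pass.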
So here is my float radix sort:
public static float[] RadixSort(this float[] array)
{
    // temporary array and the array of converted floats to ints
    int[] t = new int[array.Length];
    int[] a = new int[array.Length];
    for (int i = 0; i < array.Length; i++)
        a[i] = BitConverter.ToInt32(BitConverter.GetBytes(array[i]), 0);

    // set the group length to 1, 2, 4, 8 or 16
    // and see which one is quicker
    int groupLength = 4;
    int bitLength = 32;

    // counting and prefix arrays
    // (dimension is 2^r, the number of possible values of an r-bit number)
    int[] count = new int[1 << groupLength];
    int[] pref = new int[1 << groupLength];
    int groups = bitLength / groupLength;
    int mask = (1 << groupLength) - 1;
    int negatives = 0, positives = 0;

    for (int c = 0, shift = 0; c < groups; c++, shift += groupLength)
    {
        // reset count array
        for (int j = 0; j < count.Length; j++)
            count[j] = 0;

        // counting elements of the c-th group
        for (int i = 0; i < a.Length; i++)
        {
            count[(a[i] >> shift) & mask]++;

            // additionally count all negative
            // values in first round
            if (c == 0 && a[i] < 0)
                negatives++;
        }
        if (c == 0) positives = a.Length - negatives;

        // calculating prefixes
        pref[0] = 0;
        for (int i = 1; i < count.Length; i++)
            pref[i] = pref[i - 1] + count[i - 1];

        // from a[] to t[] elements ordered by c-th group
        for (int i = 0; i < a.Length; i++)
        {
            // Get the right index to sort the number in
            int index = pref[(a[i] >> shift) & mask]++;

            if (c == groups - 1)
            {
                // We're in the last (most significant) group, if the
                // number is negative, order them inversely in front
                // of the array, pushing positive ones back.
                if (a[i] < 0)
                    index = positives - (index - negatives) - 1;
                else
                    index += negatives;
            }
            t[index] = a[i];
        }

        // a[]=t[] and start again until the last group
        t.CopyTo(a, 0);
    }

    // Convert back the ints to the float array
    float[] ret = new float[a.Length];
    for (int i = 0; i < a.Length; i++)
        ret[i] = BitConverter.ToSingle(BitConverter.GetBytes(a[i]), 0);

    return ret;
}
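It is slightly slower than an int radix sort, because of the array copying at the beginning and end of the function, where the floats are copied bitwise to ints and back. The whole function is nevertheless still O(n). In any case it is faster than sorting three times in a row as you proposed. I don't see much room for optimization any more, but if anyone does: feel free to tell me.
To sort in descending order, change this line at the very end:
ret[i] = BitConverter.ToSingle(BitConverter.GetBytes(a[i]), 0);
to this:
ret[a.Length - i - 1] = BitConverter.ToSingle(BitConverter.GetBytes(a[i]), 0);
Measurements
I set up a short test containing all the special cases of floats (NaN, +/-Inf, Min/Max value, 0) and random numbers. It sorts them in exactly the same order as Linq or Array.Sort sorts floats:
NaN -> -Inf -> Min -> Negative Nums -> 0 -> Positive Nums -> Max -> +Inf
So I ran a test with a huge array of 10M numbers: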
float[] test = new float[10000000];
Random rnd = new Random();
for (int i = 0; i < test.Length; i++)
{
    // four random bytes reinterpreted as a float
    byte[] buffer = new byte[4];
    rnd.NextBytes(buffer);
    float rndfloat = BitConverter.ToSingle(buffer, 0);
    switch (i)
    {
        // the first six slots hold the special cases
        case 0: { test[i] = float.MaxValue; break; }
        case 1: { test[i] = float.MinValue; break; }
        case 2: { test[i] = float.NaN; break; }
        case 3: { test[i] = float.NegativeInfinity; break; }
        case 4: { test[i] = float.PositiveInfinity; break; }
        case 5: { test[i] = 0f; break; }
        default: { test[i] = rndfloat; break; }
    }
}
And I timed the different sorting algorithms:
Stopwatch sw = new Stopwatch();
sw.Start();
float[] sorted1 = test.RadixSort();
sw.Stop();
Console.WriteLine(string.Format("RadixSort: {0}", sw.Elapsed));
sw.Reset();
sw.Start();
float[] sorted2 = test.OrderBy(x => x).ToArray();
sw.Stop();
Console.WriteLine(string.Format("Linq OrderBy: {0}", sw.Elapsed));
sw.Reset();
sw.Start();
Array.Sort(test);
float[] sorted3 = test;
sw.Stop();
Console.WriteLine(string.Format("Array.Sort: {0}", sw.Elapsed));
The output was (update: now run with a release build, not debug):
RadixSort: 00:00:03.9902332
Linq OrderBy: 00:00:17.4983272
Array.Sort: 00:00:03.1536785
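That is a bit more than four times as fast as Linq, which is not bad. It is still not quite as fast as Array.Sort, but not that much worse either. What really surprised me was this: I expected it to be slightly slower than Linq on very small arrays. But then I ran a test with just 20 elements: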
RadixSort: 00:00:00.0012944
Linq OrderBy: 00:00:00.0072271
Array.Sort: 00:00:00.0002979
Even this time my radix sort is quicker than Linq, but way slower than Array.Sort. :)
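Update 2:
I made some more measurements and found out some interesting things: a longer group length constant means fewer iterations and more memory usage. If you use a group length of 16 bits (only 2 iterations), you have a huge memory overhead when sorting small arrays, but you can beat Array.Sort once the arrays are larger than about 100k elements, even if not by very much. Both chart axes are logarithmic: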
comparison chart http://daubmeier.de/philip/stackoverflow/radixsort_vs_arraysort.png
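The trade-off follows directly from two expressions in the sort: the number of passes is `bitLength / groupLength`, and `count` and `pref` each hold `1 << groupLength` ints. A small sketch that tabulates both (the class and method names are only for illustration):

```csharp
using System;

public static class GroupLengthTable
{
    // Passes over the data for 32-bit keys.
    public static int Passes(int groupLength) => 32 / groupLength;

    // Bytes taken by count[] and pref[] together: two int arrays of 2^r entries.
    public static long TableBytes(int groupLength) => 2L * sizeof(int) * (1L << groupLength);

    public static void Main()
    {
        foreach (int r in new[] { 1, 2, 4, 8, 16 })
            Console.WriteLine($"groupLength={r}: passes={Passes(r)}, tables={TableBytes(r)} bytes");
    }
}
```

With 16-bit groups there are only 2 passes, but the two tables take 512 KiB and the 65536 counters are reset on every pass, which is a fixed cost that dominates when the input has only a handful of elements.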
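Answer 1 (score: 1)
There is a nice explanation of how to perform a radix sort on floats here: http://www.codercorner.com/RadixSortRevisited.htm
If all your values are positive, you can get away with using the binary representation; the link explains how to handle negative values.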
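One common way to deal with negative values, which is not necessarily the approach the linked article takes, is to map each float to an unsigned key whose order matches the numeric order, sort the keys, and map back. A sketch, with `FloatSortKey`, `ToKey` and `FromKey` as made-up names and `Array.Sort` standing in for an unsigned radix sort:

```csharp
using System;

public static class FloatSortKey
{
    // Map a float to a uint whose unsigned order matches the float's numeric order.
    public static uint ToKey(float f)
    {
        uint bits = BitConverter.ToUInt32(BitConverter.GetBytes(f), 0);
        // Negative: flip every bit (reverses their order and puts them below the positives).
        // Non-negative: set the sign bit (puts them above all negatives).
        return (bits & 0x80000000u) != 0 ? ~bits : bits | 0x80000000u;
    }

    // Inverse of ToKey.
    public static float FromKey(uint key)
    {
        uint bits = (key & 0x80000000u) != 0 ? key & 0x7FFFFFFFu : ~key;
        return BitConverter.ToSingle(BitConverter.GetBytes(bits), 0);
    }

    public static void Main()
    {
        float[] values = { 3.5f, -2f, 0f, -7.25f, 1e-3f };
        uint[] keys = Array.ConvertAll(values, ToKey);
        Array.Sort(keys); // stand-in for an unsigned radix sort
        Console.WriteLine(string.Join(", ", Array.ConvertAll(keys, FromKey)));
    }
}
```

This costs one extra transformation per element on the way in and out, but the sort itself then needs no special case in its last pass.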
Answer 2 (score: 1)
You can use an unsafe block to memcpy or alias a float * to a uint * in order to extract the bits.
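A minimal sketch of the aliasing variant, assuming the project is compiled with unsafe code enabled (for example `AllowUnsafeBlocks` in the project file). `FloatParts` and `RawBits` are illustrative names; the field widths are the standard IEEE 754 single-precision layout.

```csharp
using System;

public static class FloatParts
{
    // Alias the float's storage as a uint to read its raw bits.
    // Needs unsafe code to be enabled for the compilation.
    public static unsafe uint RawBits(float f) => *(uint*)&f;

    public static void Main()
    {
        uint bits = RawBits(-6.5f);
        uint sign = bits >> 31;               // 1 bit
        uint exponent = (bits >> 23) & 0xFF;  // 8 bits, biased by 127
        uint mantissa = bits & 0x7FFFFF;      // 23 bits, implicit leading 1 not stored
        Console.WriteLine($"sign={sign} exponent={exponent} mantissa=0x{mantissa:X}");
    }
}
```

For -6.5f (that is, -1.625 × 2^2) this gives sign 1, biased exponent 129 and mantissa 0x500000.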
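Answer 3 (score: 1)
By doing some fancy casting and swapping the arrays instead of copying them, this version is 2x faster for 10M numbers than Philip Daubmeier's original with grouplength set to 8. For that array size it is 3x faster than Array.Sort.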
// Requires: using System; using System.Runtime.InteropServices;
static public void RadixSortFloat(this float[] array, int arrayLen = -1)
{
    // Some use cases have an array that is longer than the filled part which we want to sort
    if (arrayLen < 0) arrayLen = array.Length;

    // View our original array as ints (no copy, same memory)
    Span<float> asFloat = array;
    Span<int> a = MemoryMarshal.Cast<float, int>(asFloat);

    // Create a temp array
    Span<int> t = new Span<int>(new int[arrayLen]);

    // set the group length to 1, 2, 4, 8 or 16 and see which one is quicker
    int groupLength = 8;
    int bitLength = 32;

    // counting and prefix arrays
    // (dimension is 2^r, the number of possible values of an r-bit number)
    var dim = 1 << groupLength;
    int groups = bitLength / groupLength;
    if (groups % 2 != 0) throw new Exception("groups must be even so data is in original array at end");
    var count = new int[dim];
    var pref = new int[dim];
    int mask = (dim) - 1;
    int negatives = 0, positives = 0;

    // counting elements of the 1st group including negative/positive
    for (int i = 0; i < arrayLen; i++)
    {
        if (a[i] < 0) negatives++;
        count[(a[i] >> 0) & mask]++;
    }
    positives = arrayLen - negatives;

    int c;
    int shift;
    for (c = 0, shift = 0; c < groups - 1; c++, shift += groupLength)
    {
        CalcPrefixes();
        var nextShift = shift + groupLength;

        // scatter by the current group and count the next group in the same loop
        for (var i = 0; i < arrayLen; i++)
        {
            var ai = a[i];
            // Get the right index to sort the number in
            int index = pref[(ai >> shift) & mask]++;
            count[(ai >> nextShift) & mask]++;
            t[index] = ai;
        }

        // swap the arrays and start again until the last group
        var temp = a;
        a = t;
        t = temp;
    }

    // Last round
    CalcPrefixes();
    for (var i = 0; i < arrayLen; i++)
    {
        var ai = a[i];
        // Get the right index to sort the number in
        int index = pref[(ai >> shift) & mask]++;
        // We're in the last (most significant) group, if the
        // number is negative, order them inversely in front
        // of the array, pushing positive ones back.
        if (ai < 0) index = positives - (index - negatives) - 1; else index += negatives;
        t[index] = ai;
    }

    // Turns count[] into start offsets in pref[] and clears count[] for the next pass
    void CalcPrefixes()
    {
        pref[0] = 0;
        for (int i = 1; i < dim; i++)
        {
            pref[i] = pref[i - 1] + count[i - 1];
            count[i - 1] = 0;
        }
    }
}
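Answer 4 (score: 0)
I think your best option, if the values are not too close together and there is a reasonable precision requirement, is to use the actual digits of the float before and after the decimal point to do the sorting.
For example, you can just use the first 4 decimals (whether they are 0 or not) to do the sorting.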
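One way to read this suggestion is as a fixed-point key: scale each value by 10^4 and truncate it to an integer, then sort on that integer. The sketch below makes that assumption; `FixedPointKey` and `ToKey` are hypothetical names, and `Array.Sort` with a key array stands in for an integer radix sort.

```csharp
using System;

public static class FixedPointKey
{
    // Hypothetical helper: keep 4 decimals by scaling to an integer key.
    // Assumes finite values small enough that value * 10000 fits in a long.
    public static long ToKey(float f) => (long)Math.Floor((double)f * 10000.0);

    public static void Main()
    {
        float[] values = { 1.23456f, -0.5f, 1.23451f, 0.25f };
        long[] keys = Array.ConvertAll(values, ToKey);
        // Sort values by their integer keys (stand-in for an integer radix sort).
        Array.Sort(keys, values);
        Console.WriteLine(string.Join(", ", keys));
    }
}
```

Values that differ only beyond the fourth decimal, such as 1.23456f and 1.23451f here, get the same key, so their relative order is not determined by this scheme.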