Question

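I need to find the n lowest values (that are not 0) in an array of doubles (let's call the array samples). I need to do this many times in a loop, so execution speed is critical. I first tried sorting the array and then taking the first 10 values (that are not 0), but although Array.Sort is said to be fast, it became the bottleneck: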

const int numLowestSamples = 10;

double[] samples;

double[] lowestSamples = new double[numLowestSamples];

for (int count = 0; count < iterations; count++) // iterations typically around 2600000
{
    samples = whatever;
    Array.Sort(samples);
    lowestSamples = samples.SkipWhile(x => x == 0).Take(numLowestSamples).ToArray();
}

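So I tried a different, but less clean, solution: first read the first n values and sort them, then loop through all the other values in samples, checking whether each value is smaller than the last value in the sorted lowestSamples array. If the value is lower, replace the last value in the array with it and sort the array again. This turned out to be approximately 5 times faster: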

const int numLowestSamples = 10;

double[] samples;

List<double> lowestSamples = new List<double>();

for (int count = 0; count < iterations; count++) // iterations typically around 2600000
{
    samples = whatever;

    lowestSamples.Clear();

    // Read first n values
    int i = 0;
    do
    {
        if (samples[i] > 0)
            lowestSamples.Add(samples[i]);

        i++;
    } while (lowestSamples.Count < numLowestSamples);

    // Sort the array
    lowestSamples.Sort();

    for (int j = i; j < samples.Length; j++) // samples.Length is typically 3600; start at i so values already read above are not revisited
    {
        // if value is larger than 0, but lower than last/highest value in lowestSamples
        // write value to array (replacing the last/highest value), then sort array so
        // last value in array still is the highest
        if (samples[j] > 0 && samples[j] < lowestSamples[numLowestSamples - 1])
        {
            lowestSamples[numLowestSamples - 1] = samples[j];
            lowestSamples.Sort();
        }
    }
}

While this method is relatively fast, I want to challenge anyone to come up with a faster and better solution.

Answer 1

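This is called a selection algorithm.

There are some general solutions on this Wikipedia page:

http://en.wikipedia.org/wiki/Selection_algorithm#Selecting_k_smallest_or_largest_elements

(But you would have to do some work to convert them to C#.)

You could use the QuickSelect algorithm to find the n-th lowest element, and then iterate through the array to collect every element <= that one.

There is an example QuickSelect in C# here: http://dpatrickcaldwell.blogspot.co.uk/2009/03/more-ilist-extension-methods.html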
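
To make the QuickSelect idea concrete, here is a minimal sketch. It is not the code from the linked blog post; the class name Selection, the method name LowestNonZero, and the choice of a Lomuto partition with a middle pivot are all assumptions made for illustration. It copies the non-zero values first, so it allocates on every call, which may matter inside a loop of millions of iterations.

```csharp
using System;

static class Selection
{
    // Returns the n smallest non-zero values of samples, in no particular order.
    // Works on a filtered copy, so samples itself is left untouched.
    public static double[] LowestNonZero(double[] samples, int n)
    {
        // Drop zeros first so they cannot occupy the low end of the array.
        double[] work = Array.FindAll(samples, x => x != 0);
        if (work.Length <= n) return work;
        if (n <= 0) return new double[0];

        int lo = 0, hi = work.Length - 1, target = n - 1;
        while (lo < hi)
        {
            int p = Partition(work, lo, hi);
            if (p == target) break;
            if (p < target) lo = p + 1; else hi = p - 1;
        }

        // work[0..n-1] now holds the n smallest values (unsorted).
        double[] result = new double[n];
        Array.Copy(work, result, n);
        return result;
    }

    // Lomuto partition around the middle element; returns the pivot's final index.
    static int Partition(double[] a, int lo, int hi)
    {
        int mid = lo + (hi - lo) / 2;
        (a[mid], a[hi]) = (a[hi], a[mid]);
        double pivot = a[hi];
        int store = lo;
        for (int i = lo; i < hi; i++)
        {
            if (a[i] < pivot)
            {
                (a[i], a[store]) = (a[store], a[i]);
                store++;
            }
        }
        (a[store], a[hi]) = (a[hi], a[store]);
        return store;
    }
}
```

QuickSelect runs in linear time on average, but for n = 10 out of 3600 values a single pass with a small sorted buffer may well be faster in practice, so it is worth measuring both.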

Answer 2

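I think you might want to try maintaining a min-heap and measure the performance difference. Here is a data structure I have been working on, called a Fibonacci heap. It could probably use some work, but you can at least test my hypothesis.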

using System;
using System.Collections.Generic;

public sealed class FibonacciHeap<TKey, TValue>
{
    readonly List<Node> _root = new List<Node>();
    int _count;
    Node _min;

    public void Push(TKey key, TValue value)
    {
        Insert(new Node {
            Key = key,
            Value = value
        });
    }       

    public KeyValuePair<TKey, TValue> Peek()
    {
        if (_min == null)
            throw new InvalidOperationException();
        return new KeyValuePair<TKey,TValue>(_min.Key, _min.Value);
    }       

    public KeyValuePair<TKey, TValue> Pop()
    {
        if (_min == null)
            throw new InvalidOperationException();
        var min = ExtractMin();
        return new KeyValuePair<TKey,TValue>(min.Key, min.Value);
    }

    void Insert(Node node)
    {
        _count++;
        _root.Add(node);
        if (_min == null)
        {
            _min = node;
        }
        else if (Comparer<TKey>.Default.Compare(node.Key, _min.Key) < 0)
        {
            _min = node;
        }
    }

    Node ExtractMin()
    {
        var result = _min;
        if (result == null)
            return null;
        foreach (var child in result.Children)
        {
            child.Parent = null;
            _root.Add(child);
        }
        _root.Remove(result);
        if (_root.Count == 0)
        {
            _min = null;
        }
        else
        {
            _min = _root[0];
            Consolidate();
        }
        _count--;
        return result;
    }

    void Consolidate()
    {
        var a = new Node[UpperBound()];
        for (int i = 0; i < _root.Count; i++)
        {
            var x = _root[i];
            var d = x.Children.Count;
            while (true)
            {   
                var y = a[d];
                if (y == null)
                    break;                  
                if (Comparer<TKey>.Default.Compare(x.Key, y.Key) > 0)
                {
                    var t = x;
                    x = y;
                    y = t;
                }
                _root.Remove(y);
                i--;
                x.AddChild(y);
                y.Mark = false;
                a[d] = null;
                d++;
            }
            a[d] = x;
        }
        _min = null;
        for (int i = 0; i < a.Length; i++)
        {
            var n = a[i];
            if (n == null)
                continue;
            if (_min == null)
            {
                _root.Clear();
                _min = n;
            }
            else
            {
                if (Comparer<TKey>.Default.Compare(n.Key, _min.Key) < 0)
                {
                    _min = n;
                }
            }
            _root.Add(n);
        }
    }

    int UpperBound()
    {
        return (int)Math.Floor(Math.Log(_count, (1.0 + Math.Sqrt(5)) / 2.0)) + 1;
    }

    class Node
    {
        public TKey Key;
        public TValue Value;
        public Node Parent;
        public List<Node> Children = new List<Node>();
        public bool Mark;

        public void AddChild(Node child)
        {
            child.Parent = this;
            Children.Add(child);
        }

        public override string ToString()
        {
            return string.Format("({0},{1})", Key, Value);
        }
    }
}

Answer 3

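Ideally, you only want to make one pass over the collection, so your solution is pretty slick. However, you are re-sorting the whole sublist on every insertion, when you only need to move the new number up to its place. That said, sorting 10 elements is almost negligible, and improving this won't gain you much. The worst case for your solution (in terms of wasted performance) is if the 9 lowest numbers are there from the start: then for every subsequent number you find that is < lowestSamples[numLowestSamples - 1], you will be sorting an almost sorted list (which is the worst case for a naive QuickSort).

Bottom line: since you are working with so few numbers, there is not much mathematical improvement you can make, given the overhead of doing this in a managed language.

Kudos on the cool algorithm!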

Answer 4

Instead of repeatedly sorting lowestSamples, just insert the sample at the position where it belongs:

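// Assumes lowestSamples is a double[] of length numLowestSamples that already holds
// the first numLowestSamples non-zero values of samples in ascending order.
int samplesCount = samples.Length;
double currentMax = lowestSamples[numLowestSamples - 1];

for (int j = numLowestSamples; j < samplesCount; j++)
{
    double sample = samples[j];

    if (sample > 0 && sample < currentMax)
    {
        // Find the first slot holding a larger value. One always exists,
        // because sample < currentMax, which is the last element.
        for (int k = 0; k < numLowestSamples; k++)
        {
            if (sample < lowestSamples[k])
            {
                // Shift the tail right by one (dropping the old maximum), then insert.
                Array.Copy(lowestSamples, k, lowestSamples, k + 1, numLowestSamples - k - 1);
                lowestSamples[k] = sample;
                break;
            }
        }

        currentMax = lowestSamples[numLowestSamples - 1];
    }
}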

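Now, if numLowestSamples needs to be quite large (approaching the size of samples), then you may want to use a priority queue, which may be faster: inserting a new sample is typically O(log n) instead of O(n/2), where n is numLowestSamples. A priority queue can insert new values efficiently and knock off the largest value in O(log n) time.

With numLowestSamples at 10 there is really no need for it, especially since you are only dealing with doubles and not complex data structures. With a heap and a small numLowestSamples, the overhead of allocating memory for heap nodes (most priority queues use heaps) is probably greater than any search/insert efficiency gain (testing is important).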
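
For the large-n case, the priority queue can be sketched as a bounded binary max-heap stored in a plain array, which sidesteps the per-node allocation mentioned above. This is only an illustration under assumptions: the names BoundedHeap and LowestNonZero are made up here, and the result comes back in heap order rather than sorted.

```csharp
using System;

static class BoundedHeap
{
    // Returns the n smallest non-zero values of samples (heap order, not sorted).
    public static double[] LowestNonZero(double[] samples, int n)
    {
        if (n <= 0) return new double[0];

        double[] heap = new double[n]; // max-heap: heap[0] is the largest value kept
        int count = 0;

        foreach (double s in samples)
        {
            if (s == 0) continue;

            if (count < n)
            {
                // Still filling: append at the end and sift up.
                int i = count++;
                heap[i] = s;
                while (i > 0)
                {
                    int parent = (i - 1) / 2;
                    if (heap[parent] >= heap[i]) break;
                    (heap[parent], heap[i]) = (heap[i], heap[parent]);
                    i = parent;
                }
            }
            else if (s < heap[0])
            {
                // Heap is full: replace the current maximum and sift down.
                heap[0] = s;
                int i = 0;
                while (true)
                {
                    int left = 2 * i + 1, right = left + 1, largest = i;
                    if (left < n && heap[left] > heap[largest]) largest = left;
                    if (right < n && heap[right] > heap[largest]) largest = right;
                    if (largest == i) break;
                    (heap[largest], heap[i]) = (heap[i], heap[largest]);
                    i = largest;
                }
            }
        }

        // Fewer than n non-zero values were available: trim the unused slots.
        if (count < n) Array.Resize(ref heap, count);
        return heap;
    }
}
```

Each accepted sample costs O(log n) instead of the O(n) shift of the insertion approach, but most samples are rejected by the single comparison against heap[0] either way, so the difference only shows up when n is large.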

Answer 5

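Two different ideas:

Instead of sorting the array, just do a single insertion sort pass over it. You already know that the newly added item is the only one out of order, so make use of that knowledge.
Take a look at heap sort. It builds a binary max-heap (if you want to sort from smallest to largest) and then starts removing elements from the heap by swapping the max element at index 0 with the last element that is still part of the heap. Now, if you pretend to sort the array from largest to smallest instead (which means building a min-heap), you can stop sorting after 10 elements have been sorted. The 10 elements at the end of the array will be the smallest, and the rest of the array is still a binary heap in array representation. I am not sure how this compares to the Quicksort-based selection algorithm on Wikipedia. The heap is always built for the entire array, regardless of how many elements you want to select.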
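
The second idea, a heap sort that stops early, might look roughly like the sketch below. The names PartialHeapSort, LowestNonZero and SiftDown are assumptions for illustration, and the zeros are filtered into a copy first, which the answer does not discuss.

```csharp
using System;

static class PartialHeapSort
{
    // Returns the n smallest non-zero values of samples in ascending order.
    public static double[] LowestNonZero(double[] samples, int n)
    {
        double[] heap = Array.FindAll(samples, x => x != 0);
        int size = heap.Length;
        n = Math.Max(0, Math.Min(n, size));

        // Build a min-heap over the whole array (bottom-up heapify, O(size)).
        for (int i = size / 2 - 1; i >= 0; i--) SiftDown(heap, i, size);

        double[] result = new double[n];
        for (int k = 0; k < n; k++)
        {
            // Take the current minimum, move it behind the heap region, shrink the heap.
            result[k] = heap[0];
            size--;
            (heap[0], heap[size]) = (heap[size], heap[0]);
            SiftDown(heap, 0, size);
        }
        return result;
    }

    // Restores the min-heap property for the subtree rooted at i within a[0..size-1].
    static void SiftDown(double[] a, int i, int size)
    {
        while (true)
        {
            int left = 2 * i + 1, right = left + 1, smallest = i;
            if (left < size && a[left] < a[smallest]) smallest = left;
            if (right < size && a[right] < a[smallest]) smallest = right;
            if (smallest == i) return;
            (a[smallest], a[i]) = (a[i], a[smallest]);
            i = smallest;
        }
    }
}
```

As the answer notes, the heap is built over all values no matter how small n is, so for 10 out of 3600 this likely does more work than a single pass with a small buffer.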

Answer 6

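I think your idea is the right one: a single pass that maintains a sorted data structure of minimal size is generally the fastest. Performance improvements beyond that are optimizations.

The optimizations would be: 1) You sort the results on every pass. That may be the fastest for small sizes, but it is not the fastest for larger ones. Consider having two algorithms, one for sizes below a given threshold and one (such as heap sort) for sizes above it. 2) Keep track of any value that has been kicked out of the minimum set (you currently do this by looking at the last element). You can skip inserting and sorting any value that is greater than or equal to a value that has already been kicked out.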

Fastest way to get the n lowest values from an array
