Question

I have a working weighted-choice algorithm, but I would like to improve it in two ways (in order of importance):

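1. Guarantee that a minimum number of items is picked from each possible choice.
2. Reduce the computational complexity from O(nm) to O(n) or O(m), where n is the number of randomly chosen items requested and m is the number of item types available.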

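Edit: For my purposes, the number of items requested is usually small (less than 100). As such, algorithms with complexity O(t) or O(t + n), where t is the total number of items, generally perform worse than O(nm), because O(t) >> O(m).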

Simplified code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Security.Cryptography;

public class Program
{
    static void Main(string[] args)
    {
        // List of items with discrete availability
        // In this example there is a total of 244 discrete items and 3 types, 
        // but there could be millions of items and hundreds of types.
        List<Stock<string>> list = new List<Stock<string>>();
        list.Add(new Stock<string>("Apple", 200));
        list.Add(new Stock<string>("Coconut", 2));
        list.Add(new Stock<string>("Banana", 42));

        // Pick 10 random items
        // Chosen with equal weight across all types of items
        foreach (var item in Picker<string>.PickRandom(10, list))
        {
            // Do stuff with item
            Console.WriteLine(item);
        }
    }
}

// Can be thought of as a weighted choice
// where (Item Available) / (Sum of all Available) is the weight.
public class Stock<T>
{
    public Stock(T item, int available)
    {
        Item = item;
        Available = available;
    }
    public T Item { get; set; }
    public int Available { get; set; }
}

public static class Picker<T>
{
    // Randomly choose requested number of items from across all stock types
    // Currently O(nm), where n is requested number of items and m is types of stock
    // O(n) or O(m) would be nice, which I believe is possible but not sure how
    // O(1) would be awesome, but I don't believe it is possible
    public static IEnumerable<T> PickRandom(int requested, IEnumerable<Stock<T>> list)
    {
        // O(m) : to calculate total items,
        // thus implicitly have per item weight -> (Item Available) / (Total Items)
        int sumAll = list.Sum(x => x.Available);

        // O(1)
        if (sumAll < requested)
            throw new ArgumentException("Requested amount must not exceed total available");

        // O(1)
        Random rng = new Random(Seed());

        // O(n) for the loop alone : O(nm) total
        for (int n = 0; n < requested; n++)
        {
            // O(1) to choose an item : uses implicit ordering
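            // Upper bound of Random.Next is exclusive, so use sumAll + 1
            // to make the last remaining item selectable
            int choice = rng.Next(1, sumAll + 1);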
            int current = 0;

            // O(m) to find the chosen item
            foreach (Stock<T> stock in list)
            {
                current += stock.Available;

                if (current >= choice)
                {
                    yield return stock.Item;

                    // O(1) to re-calculate weight once item is found
                    stock.Available -= 1;
                    sumAll--;

                    break;
                }
            }
        }
    }

    // Sufficiently random seed
    private static int Seed()
    {
        byte[] bytes = new byte[4];
        new RNGCryptoServiceProvider().GetBytes(bytes);
        return bytes[0] << 24 | bytes[1] << 16 | bytes[2] << 8 | bytes[3];
    }
}

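The function PickRandom() uses yield return and IEnumerable, but that is not required. I was just trying to be clever when I first wrote the function, so that it could iterate through anything (even, say, an enumerable from a LINQ to SQL query). Afterwards I discovered that while the flexibility is nice, I never truly needed it.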

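My first thought on resolving point #1 (guaranteeing that a minimum number is picked from each possible choice) was to choose the required minimum from each type in an entirely non-random way, use my existing algorithm to pick the remaining unconstrained part, and then shuffle the results together. This seems the most natural and mimics how I would do something like this in real life, but I suspect it is not the most efficient way.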
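The first idea above could be sketched roughly as follows. This is only an illustration under some assumptions: MinimumFirstPicker, its Pick method, and the minimum array are hypothetical names that are not part of the code in the question, and types are represented by their index rather than by Stock<T> objects. The random part still uses the O(m) linear scan per pick.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original code.
public static class MinimumFirstPicker
{
    // Returns `requested` type indexes in random order,
    // containing at least minimum[i] picks of type i.
    public static List<int> Pick(int requested, int[] available, int[] minimum, Random rng)
    {
        int[] remaining = (int[])available.Clone();
        var result = new List<int>(requested);

        // Step 1: take the required minimum of each type, non-randomly.
        for (int i = 0; i < remaining.Length; i++)
        {
            if (minimum[i] > remaining[i])
                throw new ArgumentException("Minimum exceeds availability for type " + i);
            for (int k = 0; k < minimum[i]; k++)
                result.Add(i);
            remaining[i] -= minimum[i];
        }
        if (result.Count > requested)
            throw new ArgumentException("Minimums exceed the requested amount");

        int total = remaining.Sum();
        if (total < requested - result.Count)
            throw new ArgumentException("Requested amount must not exceed total available");

        // Step 2: weighted picks without replacement for the rest (O(m) scan per pick).
        while (result.Count < requested)
        {
            int choice = rng.Next(total); // 0 .. total - 1
            int i = 0;
            while (choice >= remaining[i])
            {
                choice -= remaining[i];
                i++;
            }
            result.Add(i);
            remaining[i]--;
            total--;
        }

        // Step 3: Fisher-Yates shuffle so the minimum picks are not clustered at the front.
        for (int i = result.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            int tmp = result[i];
            result[i] = result[j];
            result[j] = tmp;
        }
        return result;
    }
}
```

With the stock from the question, a call such as MinimumFirstPicker.Pick(10, new[] { 200, 2, 42 }, new[] { 1, 1, 1 }, new Random()) would return ten type indexes with at least one of each type.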

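My second idea was to create a results array first, randomly choose indexes to fill in the required minimums, and then fill in the rest using my existing algorithm. In all my attempts, though, this ended up either increasing the big-O complexity or turning into a mess of indexes being tracked everywhere. I still think this approach is feasible; I just haven't been able to work it out yet.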

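I then decided to come here, since this problem seems like it could be abstracted into a fairly generic algorithm, but all of the keywords I use to search usually point me to basic weighted random number generation (as opposed to selecting discrete items grouped by type with specific availability). I also have not been able to find any question that constrains the problem with a minimum selection per item type while still remaining randomized. So I'm hoping that either someone can see a simple, efficient solution, or someone who has heard of this problem before knows better keywords than I do and can point me in the right direction.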

Answer 1

Here's a rough idea; I'm sure it can be further improved:

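1. Assume that each available item has a unique ID in the range [0, sumAll), where sumAll is the number of items available. So the first apple has ID 0, the last apple has ID 199, the first coconut has ID 200, and so on. Determining sumAll and the subrange of each type is O(m), where m is the number of types.
2. Pick a random ID (all IDs have the same weight). Repeat this until you have a set of 10 different IDs. This is O(n) in expectation, where n is the number of items to pick, as long as n is small relative to sumAll.
3. Determine the type of each picked ID using binary search. This is O(n log m).
4. Remove the picked items from the available items. This is O(m).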
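The steps above might look roughly like this in C#. It is a minimal sketch, not a tested implementation: IdRangePicker and Pick are hypothetical names, types are identified by index, and the method returns how many items were picked per type so that the caller can do the final subtraction (step 4).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original code.
public static class IdRangePicker
{
    // Returns picked[i] = number of items picked from type i.
    public static int[] Pick(int requested, int[] available, Random rng)
    {
        // Step 1: prefix sums. Type i owns the IDs [starts[i], starts[i + 1]). O(m)
        int m = available.Length;
        int[] starts = new int[m + 1];
        for (int i = 0; i < m; i++)
            starts[i + 1] = starts[i] + available[i];
        int sumAll = starts[m];
        if (requested > sumAll)
            throw new ArgumentException("Requested amount must not exceed total available");

        // Step 2: draw random IDs until we have `requested` distinct ones.
        // Cheap when requested is much smaller than sumAll, slow when they are close.
        var ids = new HashSet<int>();
        while (ids.Count < requested)
            ids.Add(rng.Next(sumAll));

        // Step 3: binary search for the type that owns each ID. O(n log m)
        int[] picked = new int[m];
        foreach (int id in ids)
        {
            int lo = 0, hi = m; // invariant: starts[lo] <= id < starts[hi]
            while (hi - lo > 1)
            {
                int mid = (lo + hi) / 2;
                if (starts[mid] <= id) lo = mid; else hi = mid;
            }
            picked[lo]++;
        }
        return picked;
    }
}
```

Because the IDs are distinct and each type owns exactly as many IDs as it has items, no type can be picked more often than it is available. The caller can then subtract picked[i] from each type's availability.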

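To guarantee a minimum number of picked items for each type, picking those items and removing them from the available items before step 1 sounds like a good idea. This is O(m).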

Answer 2

Good question. I think O(mn) is the base case, because for each of the n items you need to re-evaluate the weighting (which is O(m)).

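The Picker class always seems to return the same type; you are not mixing types of Stock here (in your example, Stock<string>). So the Picker class could perhaps flatten all your stock into a single list, which is less memory efficient but more computationally efficient.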

public static IEnumerable<T> PickRandom(int requested, IEnumerable<Stock<T>> list)
{
    var allStock = list.SelectMany(item => 
        Enumerable.Range(0, item.Available).Select(r => item.Item)).ToList();

    Random rng = new Random(); 

    for (int n = 0; n < requested; n++) 
    { 
        int choice = rng.Next(allStock.Count); // upper bound is exclusive

        var result = allStock[choice];
        allStock.RemoveAt(choice);

        yield return result;
    }  
}

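The drawback here is that you are not modifying the original Stock objects, but that is an implementation decision you can make (your sample shows the Stock objects being created as anonymous parameters to Picker).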

Edit

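Here is another sample that closely resembles your existing code. It creates a dictionary in which you can look up your choice. But again, the dictionary (which controls the weighting) needs to be re-evaluated after each selection, resulting in O(mn).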

public static IEnumerable<T> PickRandom(int requested, IEnumerable<Stock<T>> list)
{
    Random rng = new Random();

    for (int n = 0; n < requested; n++)
    {
        // Rebuild the range lookup after every pick: O(m)
        int cumulative = 0;
        var items = list.ToDictionary(item =>
            new { Start = cumulative, End = cumulative += item.Available });

        // Upper bound of Random.Next is exclusive
        int choice = rng.Next(cumulative);
        var foundItem = items.Single(i =>
            i.Key.Start <= choice && choice < i.Key.End).Value;

        foundItem.Available--;
        yield return foundItem.Item;
    }
}

Logically, is it possible to re-evaluate the weighting without taking all of the categories into account?
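One structure that seems to fit this question is a Fenwick tree (binary indexed tree) over the per-type counts: a pick and the following weight update each touch only O(log m) entries, giving O(m + n log m) overall. The sketch below is an assumption about how that could look, not something taken from the posts above; FenwickPicker, PickOne and Total are hypothetical names, and types are identified by index.

```csharp
using System;

// Hypothetical helper, not part of the original code.
public sealed class FenwickPicker
{
    private readonly int[] tree; // 1-based Fenwick tree over per-type counts
    private readonly int size;
    private int total;

    public FenwickPicker(int[] available)
    {
        size = available.Length;
        tree = new int[size + 1];
        // O(m) build: add each count, then push the node's sum to its parent.
        for (int i = 1; i <= size; i++)
        {
            tree[i] += available[i - 1];
            int parent = i + (i & -i);
            if (parent <= size) tree[parent] += tree[i];
            total += available[i - 1];
        }
    }

    public int Total { get { return total; } }

    // Picks one of the remaining items uniformly, removes it,
    // and returns the 0-based index of its type. O(log m)
    public int PickOne(Random rng)
    {
        if (total == 0) throw new InvalidOperationException("No items left");
        int target = rng.Next(total); // 0 .. total - 1

        // Descend to the largest pos whose prefix sum is <= target.
        int pos = 0;
        int step = 1;
        while (step * 2 <= size) step *= 2;
        for (; step > 0; step /= 2)
        {
            int next = pos + step;
            if (next <= size && tree[next] <= target)
            {
                pos = next;
                target -= tree[next];
            }
        }

        // pos is the 0-based index of the chosen type; remove one item from it.
        for (int i = pos + 1; i <= size; i += i & -i)
            tree[i]--;
        total--;
        return pos;
    }
}
```

Usage would be along the lines of creating new FenwickPicker(new[] { 200, 2, 42 }) once and calling PickOne(rng) for each requested item. Required minimums could still be taken up front, before building the tree.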

Constrained weighted choice
