Question

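I have a vector of doubles that I want to rank (actually it is a vector of objects with a double member called costs). As long as there are only unique values, or if I ignore the non-unique ones, there is no problem. However, I want tied (non-unique) values to get their average rank. I found some questions on SO about ranking, but they ignore non-unique values.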

For example, given (1, 5, 4, 5, 5), the corresponding ranks should be (1, 4, 2, 4, 4). When we ignore non-unique values, the ranks are (1, 3, 2, 4, 5).
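Spelled out for this example: sorting gives (1, 4, 5, 5, 5), so the three 5s occupy sorted positions 3, 4 and 5 and share the mean of those positions. In general, a run of n tied values preceded by i strictly smaller values gets

```latex
\frac{1}{n}\sum_{k=1}^{n} (i + k) \;=\; i + \frac{n + 1}{2},
\qquad \text{e.g. } i = 2,\ n = 3:\quad 2 + \frac{3 + 1}{2} = 4 .
```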

When ignoring non-unique values, I used the following:

void Population::create_ranks_costs(vector<Solution> &pop)
{
  size_t const n = pop.size();

  // Create an index vector
  vector<size_t> index(n);
  iota(begin(index), end(index), 0);

  sort(begin(index), end(index), 
       [&pop] (size_t idx, size_t idy) { 
         return pop[idx].costs() < pop[idy].costs();
       });

  // Store the result in the corresponding solutions
  for (size_t idx = 0; idx < n; ++idx)
    pop[index[idx]].set_rank_costs(idx + 1);
}

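Does anyone know how to take the non-unique values into account? I would prefer a solution using the standard algorithms (<algorithm>), since IMO that leads to clean code.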

Answer 1

Here is a routine for vectors, as the title of the question suggests:

#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

template<typename Vector>
std::vector<double> rank(const Vector& v)
{
    std::vector<std::size_t> w(v.size());
    std::iota(begin(w), end(w), 0);
    std::sort(begin(w), end(w), 
        [&v](std::size_t i, std::size_t j) { return v[i] < v[j]; });

    std::vector<double> r(w.size());
    for (std::size_t n, i = 0; i < w.size(); i += n)
    {
        n = 1;
        while (i + n < w.size() && v[w[i]] == v[w[i+n]]) ++n;
        for (std::size_t k = 0; k < n; ++k)
        {
            r[w[i+k]] = i + (n + 1) / 2.0; // average rank of n tied values
            // r[w[i+k]] = i + 1;          // min 
            // r[w[i+k]] = i + n;          // max
            // r[w[i+k]] = i + k + 1;      // random order
        }
    }
    return r;
}

A working example is available on IDEone.

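There are different conventions for ranking tied (equal) values: min, max, average rank, or random order. Pick one of them in the innermost for loop (the average rank is common in statistics, the min rank in sports).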

Note that the average rank can be non-integral (n + 0.5). I don't know whether rounding it to an integral rank n is a problem for your application.
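A hedged sketch of selecting the tie convention at run time instead of by commenting lines in and out. The enum `Ties` and the function `rank_with` are hypothetical names, not part of the answer. One deliberate change: it uses `std::stable_sort` instead of `std::sort`, so the "random order" convention (called `Ordinal` here) becomes deterministic and keeps ties in their original order; with `std::sort` the relative order of equal elements is unspecified.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical names for the four tie conventions listed above.
enum class Ties { Average, Min, Max, Ordinal };

std::vector<double> rank_with(const std::vector<double>& v, Ties ties)
{
    std::vector<std::size_t> w(v.size());
    std::iota(w.begin(), w.end(), std::size_t{0});
    // stable_sort keeps equal values in input order, which makes Ordinal deterministic
    std::stable_sort(w.begin(), w.end(),
                     [&v](std::size_t i, std::size_t j) { return v[i] < v[j]; });

    std::vector<double> r(v.size());
    for (std::size_t i = 0; i < w.size(); ) {
        std::size_t n = 1;  // length of the run of equal values starting at sorted position i
        while (i + n < w.size() && v[w[i]] == v[w[i + n]]) ++n;
        for (std::size_t k = 0; k < n; ++k) {
            switch (ties) {
            case Ties::Average: r[w[i + k]] = i + (n + 1) / 2.0;           break;
            case Ties::Min:     r[w[i + k]] = i + 1.0;                     break;
            case Ties::Max:     r[w[i + k]] = static_cast<double>(i + n);     break;
            case Ties::Ordinal: r[w[i + k]] = static_cast<double>(i + k + 1); break;
            }
        }
        i += n;
    }
    return r;
}

// Usage: assert((rank_with({1, 5, 4, 5, 5}, Ties::Min) == std::vector<double>{1, 3, 2, 3, 3}));
```

For (1, 5, 4, 5, 5) this gives {1, 4, 2, 4, 4} (Average), {1, 3, 2, 3, 3} (Min), {1, 5, 2, 5, 5} (Max) and {1, 3, 2, 4, 5} (Ordinal); the last one is the "ignore ties" ranking from the question. A pair of equal values such as (2, 2) shows the non-integral case: Average gives {1.5, 1.5}.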

The algorithm generalizes easily to a user-defined ordering, e.g. on pop[i].costs(), with std::less<> as the default.
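One possible shape for that generalization, as a sketch rather than the answerer's code: `rank_by` (hypothetical name) takes a key function and a comparator defaulting to `std::less<>`, and treats elements as tied when neither compares less than the other, so no `operator==` is needed. The `Solution` struct below is a hypothetical stand-in for the question's class, exposing only `costs()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Average ranks of v under less(key(a), key(b)); ties are elements
// that are equivalent under `less` (neither precedes the other).
template <typename Vector, typename Key, typename Less = std::less<>>
std::vector<double> rank_by(const Vector& v, Key key, Less less = Less{})
{
    auto before = [&](std::size_t i, std::size_t j) { return less(key(v[i]), key(v[j])); };

    std::vector<std::size_t> w(v.size());
    std::iota(w.begin(), w.end(), std::size_t{0});
    std::sort(w.begin(), w.end(), before);

    std::vector<double> r(v.size());
    for (std::size_t i = 0; i < w.size(); ) {
        std::size_t n = 1;
        // w is sorted, so !before(w[i], w[i + n]) means the two are equivalent
        while (i + n < w.size() && !before(w[i], w[i + n])) ++n;
        for (std::size_t k = 0; k < n; ++k) r[w[i + k]] = i + (n + 1) / 2.0;
        i += n;
    }
    return r;
}

// Hypothetical stand-in for the question's Solution class.
struct Solution {
    double c;
    double costs() const { return c; }
};

// Usage: auto r = rank_by(pop, [](const Solution& s) { return s.costs(); });
```

In `create_ranks_costs` one would then call `pop[i].set_rank_costs(r[i])` for each i, assuming `set_rank_costs` accepts a double. Passing `std::greater<>{}` as the third argument ranks from the highest cost down.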

Answer 2

One way to do this is to use a multimap.

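1. Put the items in a multimap that maps the objects to size_ts (the initial values don't matter). You can do this in one line (using the constructor that takes iterators).
2. Loop over it (plainly, or using anything from algorithm) and assign 0, 1, ... as the values. These are 0-based positions; start from 1 instead if you want 1-based ranks.
3. Loop over the distinct keys. For each distinct key, call equal_range for that key and set the values in the range to their average (again, you can use something from algorithm).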

The overall complexity should be Θ(n log n), where n is the length of the vector.
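A minimal sketch of those three steps for a plain `std::vector<double>`, under a few assumptions: `rank_multimap` is a hypothetical name, the mapped values are doubles numbered from 1 so the result is 1-based, and the input contains no NaNs (a NaN key could not be found again with `find`). Step 1 uses a loop rather than the iterator constructor, because the multimap's value type is a pair and a range of plain doubles cannot be passed to it directly.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper: average ranks via a multimap (assumes no NaNs in v).
std::vector<double> rank_multimap(const std::vector<double>& v)
{
    // Step 1: key = value, mapped = rank placeholder (initial value unimportant).
    std::multimap<double, double> m;
    for (double x : v) m.emplace(x, 0.0);

    // Step 2: the multimap iterates in key order, so number the entries 1, 2, ..., n.
    double next = 1.0;
    for (auto& entry : m) entry.second = next++;

    // Step 3: for each distinct key, overwrite its equal_range with the range's mean.
    for (auto it = m.begin(); it != m.end(); ) {
        auto range = m.equal_range(it->first);
        double sum = 0.0;
        std::size_t count = 0;
        for (auto e = range.first; e != range.second; ++e) { sum += e->second; ++count; }
        for (auto e = range.first; e != range.second; ++e) e->second = sum / count;
        it = range.second;  // jump to the next distinct key
    }

    // Every entry with a given key now holds the same rank, so find() is enough.
    std::vector<double> ranks;
    ranks.reserve(v.size());
    for (double x : v) ranks.push_back(m.find(x)->second);
    return ranks;
}

// Usage: assert((rank_multimap({1, 5, 4, 5, 5}) == std::vector<double>{1, 4, 2, 4, 4}));
```

The final lookup is an extra Θ(n log n) pass, which keeps the overall bound stated above.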

Answer 3

Something along these lines:

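// Continues create_ranks_costs after `index` has been sorted by costs.
if (n == 0) return;                      // pop[index[0]] below needs at least one element

size_t run_start = 0;                    // first sorted position of the current run
double run_cost = pop[index[0]].costs(); // cost shared by the current run
for (size_t idx = 1; idx <= n; ++idx) {
  // idx == n is a sentinel that closes the last run
  double new_cost = idx < n ? pop[index[idx]].costs() : 0;
  if (idx == n || new_cost != run_cost) {
    // sorted positions run_start..idx-1 have 1-based ranks run_start+1..idx
    double avg_rank = (run_start + 1 + idx) / 2.0;
    for (size_t j = run_start; j < idx; ++j) {
       pop[index[j]].set_rank_costs(avg_rank); // assumes set_rank_costs accepts a double
    }

    run_start = idx;
    run_cost = new_cost;
  }
}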

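Basically, you walk the sorted sequence and identify runs of equal values (possibly of length 1). For each such run, you compute its average rank and assign it to all elements of the run.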
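Since the question asks for <algorithm>, here is a hedged, self-contained sketch of the same run-based idea on a plain `std::vector<double>`: `std::find_if` locates the end of each run and `std::for_each` assigns the run's average. `average_ranks` is a hypothetical name, and the input is assumed to contain no NaNs, since they break the strict weak ordering that `std::sort` requires.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

std::vector<double> average_ranks(const std::vector<double>& v)
{
    std::vector<std::size_t> index(v.size());
    std::iota(index.begin(), index.end(), std::size_t{0});
    std::sort(index.begin(), index.end(),
              [&v](std::size_t a, std::size_t b) { return v[a] < v[b]; });

    std::vector<double> ranks(v.size());
    for (auto run_start = index.begin(); run_start != index.end(); ) {
        const double run_value = v[*run_start];
        // The run ends at the first sorted element whose value differs.
        auto run_end = std::find_if(std::next(run_start), index.end(),
                                    [&](std::size_t k) { return v[k] != run_value; });
        // The run covers 1-based ranks first..last; its average is their midpoint.
        const double first = (run_start - index.begin()) + 1.0;
        const double last = static_cast<double>(run_end - index.begin());
        const double avg = (first + last) / 2.0;
        std::for_each(run_start, run_end, [&](std::size_t k) { ranks[k] = avg; });
        run_start = run_end;
    }
    return ranks;
}

// Usage: assert((average_ranks({1, 5, 4, 5, 5}) == std::vector<double>{1, 4, 2, 4, 4}));
```

Starting the search at `std::next(run_start)` guarantees every run has length at least 1, so the loop always advances.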

Creating ranks for a vector of doubles
