Question

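Given a fixed-size array A of objects, suppose a much smaller subset of these objects satisfies a certain condition B. There are three tasks I want to perform with roughly equal frequency: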

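I want to be able to change an object that does not currently satisfy condition B so that it does, at any time, when accessing the object in A by index.
I want to be able to change an object that currently satisfies condition B so that it no longer does, when accessing the object in A by index.
I also want to be able to choose a random object from only those objects that satisfy condition B.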

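All tasks should run in constant time, or as close to constant time as possible, independent of both the number of objects in A and the number of objects that satisfy condition B. If constant time isn't possible (which I suspect is the case), then I'd like to complete all of them as quickly as possible, taking into account the frequencies I mentioned earlier. What data structure(s) are appropriate for these tasks if they are repeated over and over?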

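Take, for example, my C++ implementation below. Although the timed part (the section of code that is repeated many times) is independent of the overall size of A (alltiles), its time complexity depends linearly on the size of B (bluetiles), regardless of whether alltiles grows or not, which slows the code down severely.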

#include <iostream>
#include <vector>
#include <chrono>
#include <cstdlib>
#include <ctime>
#include <algorithm>

using namespace std;

enum color {RED, GREEN, BLUE};
const int NUM_ATTEMPTS = 10000;
const int INITIAL_NUM_BLUE_TILES = 1000;
const int TOTAL_TILES = 1000000;

struct tile
{
  int color = RED;
};

struct room
{
  vector<tile> alltiles;
  vector<tile*> bluetiles;
  room(vector<tile> v) : alltiles(v) {}
};

int main()
{
  srand (time(NULL));

  // set up the initial room, time complexity here is irrelevant
  room myroom(vector<tile>(1*TOTAL_TILES));
  for(int i = 0; i < INITIAL_NUM_BLUE_TILES; i++)
  {
    myroom.alltiles[i].color = BLUE;
    myroom.bluetiles.push_back(&myroom.alltiles[i]);
  }

  auto begin = std::chrono::high_resolution_clock::now();
  for(int attempt_num = 0; attempt_num < NUM_ATTEMPTS; attempt_num++)
  {
    // access a BLUE tile by index from alltiles to change its color to RED
    myroom.alltiles[5].color = RED; // constant time
    myroom.bluetiles.erase(std::remove(myroom.bluetiles.begin(), myroom.bluetiles.end(), &myroom.alltiles[5]), myroom.bluetiles.end()); // linear time, oh no!

    // access a RED tile by index from alltiles to change its color to BLUE
    myroom.alltiles[5].color = BLUE; // constant time
    myroom.bluetiles.push_back(&myroom.alltiles[5]); // constant time

    // randomly choose from ONLY the blue tiles
    int rand_index = rand() % myroom.bluetiles.size(); // constant time
    myroom.bluetiles[rand_index]->color = GREEN; // constant time
    myroom.bluetiles[rand_index]->color = BLUE; // constant time
    // so now I have constant time access to a random blue tile

  }
  auto end = std::chrono::high_resolution_clock::now();
  double runtime = std::chrono::duration_cast<std::chrono::milliseconds>(end-begin).count();
  cout << runtime << " ms" << endl; 
  return 0;
}

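The portions being timed are the frequently performed operations I care about; in the real program, the logic that chooses which tiles to change is different. Hopefully a better data structure won't require any probabilistic analysis, but I fear it still might.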

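I suspect that double indirection, by keeping a pointer in the tile class that points to the corresponding element in the bluetiles vector, might let me achieve this in constant time, but I'm not certain. I guess it would at least speed things up in the sense that searching bluetiles would no longer be necessary, but removing an element from bluetiles would still be linear time (since I'm using a vector), so I'm really just not sure what to do here.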

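Can you devise the fastest data structure for achieving this goal, and provide a C++ implementation that builds on my example? Or is what I have as good as it will ever get?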

Answer 1

Update: This is similar to the solution I proposed for the SO question Random element from unordered_set in O(1).

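You can implement something like the following SubsetVector<T> class, which lets you insert elements into and remove elements from the subset (i.e. mark and unmark them) in O(1). You can then find the size of the subset in O(1) and access the i-th item of the subset in O(1). I guess this is what you wanted. Note that the subset doesn't guarantee any particular order, but that should be fine for your needs.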

The idea is to maintain two vectors.

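m_entries contains the actual data. m_entries[i] contains the element and an index into m_subset_indices (or -1 if the element is not in the subset).
m_subset_indices contains the indices of all the m_entries elements that are in the subset.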
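
To make the bookkeeping concrete, here is a minimal sketch of just the marking logic on two plain vectors. The names mark, unmark, position and subset are placeholders I made up for illustration and are not part of the class; position plays the role of index_in_subset and subset the role of m_subset_indices. Removal overwrites the vacated slot with the last index and pops the back, which is what avoids the linear-time erase from the question.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical standalone helpers (illustration only).
// Invariant: position[subset[k]] == k for every valid k, and
// position[i] == -1 for every i that is not in the subset.

void mark(std::vector<int> & position, std::vector<std::size_t> & subset, std::size_t i)
{
    if (position[i] >= 0) return;                 // already in the subset
    position[i] = static_cast<int>(subset.size());
    subset.push_back(i);                          // amortized O(1)
}

void unmark(std::vector<int> & position, std::vector<std::size_t> & subset, std::size_t i)
{
    if (position[i] < 0) return;                  // not in the subset
    const std::size_t k = static_cast<std::size_t>(position[i]);
    const std::size_t last = subset.back();
    subset[k] = last;                             // move the last index into the hole
    position[last] = static_cast<int>(k);         // record where it now lives
    subset.pop_back();                            // O(1), nothing is shifted
    position[i] = -1;
}
```

Because the last index is moved into the vacated slot, removal changes the order of the subset. That is acceptable here, since the subset is only used for random selection.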

Here's the code (compiled but not tested):

#include <utility>
#include <vector>

template <class T>
class SubsetVector
{
private:
   struct Entry
   {
       T element;
       int index_in_subset = -1;
   };
public:
   explicit SubsetVector(unsigned size = 0) : m_entries(size) 
   {
       m_subset_indices.reserve(size);
   }

   void push_back(const T & element)
   {
       m_entries.push_back(Entry{element, -1});
   }
   const T & operator[](unsigned index) const { return m_entries[index].element; }
   T & operator[](unsigned index) { return m_entries[index].element; }

   void insert_in_subset(unsigned index)
   {
       if (m_entries[index].index_in_subset < 0) {
           m_entries[index].index_in_subset = m_subset_indices.size();
           m_subset_indices.push_back(index);
       }
   }
   void erase_from_subset(unsigned index)
   {
       if (m_entries[index].index_in_subset >= 0) {
           auto subset_index = m_entries[index].index_in_subset;
           auto & entry_to_fix = m_entries[m_subset_indices.back()];
           std::swap(m_subset_indices[subset_index], m_subset_indices.back());
           entry_to_fix.index_in_subset = subset_index;
           m_subset_indices.pop_back();
           m_entries[index].index_in_subset = -1;
       }
   }
   unsigned subset_size() const 
   {
       return m_subset_indices.size();
   }
   T & subset_at(unsigned subset_index)
   {
       auto index = m_subset_indices.at(subset_index);
       return m_entries.at(index).element;
   }
   const T & subset_at(unsigned subset_index) const
   {
       auto index = m_subset_indices.at(subset_index);
       return m_entries.at(index).element;
   }

private:
   std::vector<Entry> m_entries;
   std::vector<unsigned> m_subset_indices;
};
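
For the third task (picking a random element of the subset), something along these lines should work. This is only a sketch: pick_random is a helper name I made up, and it assumes nothing beyond the subset_size() and subset_at() members shown above and a non-empty subset. It uses <random> rather than rand() % size to avoid modulo bias.

```cpp
#include <cstddef>
#include <random>

// Hypothetical helper (not part of SubsetVector): returns a reference to a
// uniformly chosen element of the marked subset in O(1).
// `Subset` is any type that exposes subset_size() and subset_at(i).
// Precondition: s.subset_size() > 0.
template <class Subset, class Rng>
auto pick_random(Subset & s, Rng & rng) -> decltype(s.subset_at(0))
{
    std::uniform_int_distribution<std::size_t> dist(0, s.subset_size() - 1);
    return s.subset_at(static_cast<unsigned>(dist(rng)));
}

// Possible use with the question's tiles (assumes the tile struct and the
// constants from the question):
//
//   SubsetVector<tile> tiles(TOTAL_TILES);
//   std::mt19937 rng(std::random_device{}());
//   tiles[5].color = BLUE;  tiles.insert_in_subset(5);   // RED  -> BLUE
//   pick_random(tiles, rng).color = GREEN;               // random blue tile
//   tiles[5].color = RED;   tiles.erase_from_subset(5);  // BLUE -> RED
```

Keeping the color field and the subset membership in sync is left to the caller in this sketch; wrapping both updates in one function would make that harder to get wrong.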

Data structure that allows changes via iteration and random selection from a subset (C++)
