Question

This question is language-agnostic, but I am specifically looking for a solution that uses C++ STL containers. I have a struct like this.

struct User {
   int query_count;
   std::string user_id;
};

std::multiset<User> users; //currently using

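I use a multiset with a comparator that orders by query_count. This lets me keep multiple entries with the same query_count in sorted order. Now, if I want to avoid duplicates on user_id, I have to scan the data, delete the existing entry, and create a new one, which takes O(n). I am trying to think of a way to do this in sublinear time. I was considering a solution based on a map ordered by user_id, but then I would have to scan all the data whenever I want to find the largest query_count.

Edit: The requirements are insert, delete, update (delete/insert), get the highest query_count, and find a user_id, all in sublinear time.

I would prefer to use standard STL containers, but simple modifications are fine. Is there a way to meet my requirements?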

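Summary:

The gist of the answers is that, for an out-of-the-box solution, I can use a Boost bidirectional map.

If I stick with the STL, it has to be a combination of two maps used together, with both updated carefully every time a user is inserted.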

Answer 1

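This sounds like a job for Boost multi_index: http://www.boost.org/doc/libs/1_57_0/libs/multi_index/doc/tutorial/

You can set up one index keyed on the user ID to easily prevent duplicates (inserts go through this index), and a second, sorted index on the query count to easily find the maximum.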

Answer 2

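multi_index from Boost is the way to go. But if you want to build your own data structure from basic STL containers, I suggest creating a class that holds two containers internally.

Keep an iterator into the SortedContainer in the map. That way you can delete and access the sorted entry in O(1) (the same cost as the unordered_map lookup).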


#include <list>
#include <string>
#include <unordered_map>
#include <utility>

struct User {
    int query_count;
    std::string user_id;
};


class UserQueryCountSomething
{
    // A std::list only stays sorted if every insert first searches for the right
    // position, which is O(n). A std::multiset<int> is probably a better fit here:
    // it inserts in O(log n) and its iterators also stay valid until erased.
    typedef std::list<int> SortedContainer;
    SortedContainer sortedQueryCount; // keep the query_counts sorted here, highest first.
    typedef std::pair<User, SortedContainer::iterator> UserPosition_T; // a pair of the User struct and its iterator into the list.
    typedef std::unordered_map<std::string, UserPosition_T> Map_T; // keep the User struct and the iterator in this map, keyed by user_id.

    Map_T map_;

    public:

    void Insert(const User& u)
    {
        // insert into map_ and also into sortedQueryCount.
    }

    int getHighestQueryCount() const
    {
        // return the first element of sortedQueryCount (which must not be empty).
        return sortedQueryCount.front();
    }

    void Delete(const std::string& user_id)
    {
        // find the user in map_.
        // get the iterator from the map's value type.
        // erase from sortedQueryCount using the iterator, then erase from map_.
    }
};

This can serve as your starting point. Let me know if you need more details.
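To make the skeleton above concrete, here is a minimal sketch of one way to fill it in. It is not the answerer's code: it assumes a std::multiset<int> in place of the list so that inserts stay logarithmic, and the class name UserQueryCounts plus the Find and Size helpers are hypothetical additions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <unordered_map>
#include <utility>

struct User {
    int query_count;
    std::string user_id;
};

// Hypothetical name; a sketch of the "map plus iterator into a sorted container" idea.
class UserQueryCounts {
    // Counts sorted highest first. Multiset iterators stay valid until their own
    // element is erased, so they can safely be stored in the map.
    typedef std::multiset<int, std::greater<int>> SortedContainer;
    typedef std::pair<User, SortedContainer::iterator> UserPosition;

    SortedContainer sorted_counts_;
    std::unordered_map<std::string, UserPosition> users_;

public:
    // Inserts the user, replacing any existing entry with the same user_id.
    void Insert(const User& u) {
        Delete(u.user_id);
        SortedContainer::iterator pos = sorted_counts_.insert(u.query_count);
        users_.emplace(u.user_id, UserPosition(u, pos));
    }

    // Removes the user if present; returns whether anything was removed.
    bool Delete(const std::string& user_id) {
        auto it = users_.find(user_id);
        if (it == users_.end()) return false;
        sorted_counts_.erase(it->second.second);  // erase by iterator, no search needed
        users_.erase(it);
        return true;
    }

    // Returns a pointer to the stored user, or nullptr if the id is unknown.
    const User* Find(const std::string& user_id) const {
        auto it = users_.find(user_id);
        return it == users_.end() ? nullptr : &it->second.first;
    }

    // Precondition: at least one user is stored.
    int HighestQueryCount() const {
        assert(!sorted_counts_.empty());
        return *sorted_counts_.begin();
    }

    std::size_t Size() const { return users_.size(); }
};
```

With this layout, an update is simply an Insert with an existing user_id: the old count is erased through its stored iterator and the new one is inserted in O(log n).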

Answer 3

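If we only need the highest count, and not the other ranks of counts, then one approach is to track it explicitly. We could do something like

std::unordered_map<UserId, QueryCount> user_count_map;
int max_query_count;

Unfortunately, some operations, such as deleting the user with the maximum query count, require the maximum to be recomputed. Note that deleting any other user, whose query count is not the maximum, does not require recomputing max_query_count. The recomputation, when it does happen, is O(N), which does not meet the "sublinear" requirement. This may still be good enough for many use cases, since the user with the maximum query count may not be deleted often.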
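A minimal sketch of this explicit-maximum idea follows. The class name MaxTrackingCounts and its method names are hypothetical, and the sketch assumes query counts are non-negative so that 0 can stand for "no users".

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical name; tracks the maximum explicitly and rescans only when it may be lost.
class MaxTrackingCounts {
public:
    typedef std::string UserId;
    typedef int QueryCount;

    // Inserts a user or overwrites an existing user's count.
    void Set(const UserId& id, QueryCount count) {
        auto it = counts_.find(id);
        const bool was_max = it != counts_.end() && it->second == max_query_count_;
        counts_[id] = count;
        if (count >= max_query_count_) {
            max_query_count_ = count;  // O(1): the new value is the maximum
        } else if (was_max) {
            RecomputeMax();            // O(N): the previous maximum was lowered
        }
    }

    // Removes a user; rescans only if that user held the maximum.
    void Remove(const UserId& id) {
        auto it = counts_.find(id);
        if (it == counts_.end()) return;
        const bool was_max = it->second == max_query_count_;
        counts_.erase(it);
        if (was_max) RecomputeMax();
    }

    // Returns 0 when no users are stored (assumes non-negative counts).
    QueryCount MaxQueryCount() const { return max_query_count_; }

private:
    void RecomputeMax() {
        max_query_count_ = 0;
        for (const auto& entry : counts_)
            max_query_count_ = std::max(max_query_count_, entry.second);
    }

    std::unordered_map<UserId, QueryCount> counts_;
    QueryCount max_query_count_ = 0;
};
```

The O(N) rescan is confined to RecomputeMax, which runs only when the entry that held the maximum is removed or lowered.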

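However, if we absolutely want to avoid the O(N) recomputation, we can introduce another container

std::multimap<QueryCount, UserId>

that maps a given query count to the set of users with that count.

With this approach, any mutating operation (add, delete, update) may need to update both containers. That is a bit of a pain, but in return each such update is expected to be logarithmic, i.e. O(lg N), which is sublinear.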
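As a rough illustration of keeping the two containers in step, the sketch below pairs the multimap with a hash map from user ID to that user's multimap entry. Storing the iterator is an assumption made here so that removal does not have to scan users sharing a count; the class and method names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical name; every mutation touches both containers.
class TwoContainerCounts {
public:
    typedef std::string UserId;
    typedef int QueryCount;

    // Inserts the user, or moves an existing user to a new count: O(log N).
    void Set(const UserId& id, QueryCount count) {
        Remove(id);
        by_user_[id] = by_count_.emplace(count, id);
    }

    // Removes the user if present: hashed lookup, then erase by iterator.
    void Remove(const UserId& id) {
        auto it = by_user_.find(id);
        if (it == by_user_.end()) return;
        by_count_.erase(it->second);
        by_user_.erase(it);
    }

    bool Contains(const UserId& id) const { return by_user_.count(id) != 0; }

    // Precondition: the user exists (at() throws std::out_of_range otherwise).
    QueryCount CountOf(const UserId& id) const { return by_user_.at(id)->first; }

    // Precondition: not empty. The multimap is sorted ascending, so the maximum is last.
    QueryCount MaxQueryCount() const {
        assert(!by_count_.empty());
        return by_count_.rbegin()->first;
    }

private:
    typedef std::multimap<QueryCount, UserId> CountIndex;
    CountIndex by_count_;
    std::unordered_map<UserId, CountIndex::iterator> by_user_;
};
```

Because Set always calls Remove first, a user ID can never appear twice, while several users may still share the same count.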

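Updated with a code sketch. Note that I use an unordered_map of unordered_sets instead of a multimap for the count-to-users mapping. Since we do not really need ordering by count, this should be fine; if it is not, the unordered_map can simply be changed to a map.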

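#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>

class UserQueryCountTracker {
 public:
  typedef std::string UserId;
  typedef int QueryCount;

  // Records one query for 'id': a new user starts at 1, an existing user is incremented.
  void AddUser(const UserId& id) {
    int new_count = -1;
    auto it = user_count_map_.find(id);
    if (it == user_count_map_.end()) {  // id does not exist
      new_count = 1;
      user_count_map_[id] = new_count;
      count_user_map_[new_count].insert(id);
    }
    else {                              // id exists
      const int old_count = it->second;
      new_count = old_count + 1;
      it->second = new_count;
      // move 'id' from old count to new count
      count_user_map_[old_count].erase(id);
      count_user_map_[new_count].insert(id);
    }
    assert(new_count != -1);
    if (new_count > max_query_count_) {
      max_query_count_ = new_count;
    }
  }

  const std::unordered_set<UserId>& UsersWithMaxCount() const {
    // operator[] is not usable in a const method, so look the bucket up with find();
    // return an empty set while no users have been added.
    static const std::unordered_set<UserId> kNoUsers;
    auto it = count_user_map_.find(max_query_count_);
    return it == count_user_map_.end() ? kNoUsers : it->second;
  }

 private:
  std::unordered_map<UserId, QueryCount> user_count_map_{};
  int max_query_count_{0};
  std::unordered_map<QueryCount, std::unordered_set<UserId>> count_user_map_{};
};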

Answer 4

Use a bidirectional map, with the user ID as the key and the query count as the value.

#include <map>
#include <utility>
#include <functional>
template
<
    typename K, // key
    typename V, // value
    typename P = std::less<V>  // predicate
>
class value_ordered_map
{
private:
    std::map<K, V>         key_to_value_;
    std::multimap<V, K, P> value_to_key_;

public:
    typedef typename std::map<K, V>::iterator         by_key_iterator;
    typedef typename std::multimap<V, K, P>::iterator by_value_iterator;

    // Note: operator[] inserts a default-constructed value if 'key' is missing.
    const V& value(const K& key) {
        return key_to_value_[key];
    }

    std::pair<by_value_iterator, by_value_iterator> keys(const V& value) {
        return value_to_key_.equal_range(value);
    }

    void set(const K& key, const V& value) {
        by_key_iterator it = key_to_value_.find(key);
        if (key_to_value_.end() != it) {
            // Remove the old (value, key) entry. Entries in the range all share the
            // old value, so match on the key; this scan is linear in the number of
            // keys that share that value.
            std::pair<by_value_iterator, by_value_iterator> it_pair = value_to_key_.equal_range(it->second);
            while (it_pair.first != it_pair.second)
                if (it_pair.first->second == key) {
                    value_to_key_.erase(it_pair.first);
                    break;
                } else ++it_pair.first;
        }
        key_to_value_[key] = value;
        value_to_key_.insert(std::make_pair(value, key));
    }
};

Sorted data structure that allows duplicates on the sort key, but replaces duplicates on another key in sublinear time

4 answers: