Question

我遇到以下问题：单个值，可以与两个不同的键相关联。例如：

uint64 key1 - ＆gt;值
uint32 key2 - ＆gt;值

所以查询可以是双重的：

table.find（uint64 key1）或
table.find（uint32 key2）

key1和key2完全独立。是否有可能实现一个表通过两个键访问而不重复项目？

一种可能的解决方案（psedocode）：

class TwoKeyHashTable {
  Value find(uint64);
  Value find(uint32);

  insert(key1, key2, value) {
    insert_into_table(key1, value);
    insert_into_table(key2, value);
  }

  struct Item {
    uint64 key1;
    uint32 key2;
    Value value;
  } *table;
};

但是，此解决方案使表格中的项目数量翻倍。我有数以亿计的项目，我想将整个表保留在内存中，所以我想问是否存在更高效的内存？

Answer 1

哇，我很惊讶周围没有想法...： - / 所以我实现了表，并复制了所有项目，如下所示：

class TwoKeysHashTable {
 public:
  struct Item {
    uint64 key1;
    uint32 key2;
    int32 value;

    Item() : key1(0), key2(0) { }

    Item(uint64 k1, uint32 k2, int val)
       : key1(k1), key2(k2), value(val) { }
  };

  ...

  Item *Find(uint64 key) const;
  Item *Find(uint32 key) const;
  int Insert(uint64 key1, uint32 key2, int value);

 private:
  Item *FindKey1(uint64 key, int *return_index) const;
  Item *FindKey2(uint32 key, int *return_index) const;

  int GetNextIndex(int start_index, int counter) const;
  void Rehash();

  Item *array_;
  int number_key1_;
  int number_key2_;
  int allocated_items_;
};

为了不复制（真实）数据，它存储在压缩数组中，'Item :: value'是该数组的索引。 'Insert'调用'FindKey1'和'FindKey2'来查询表中是否已存在键，如果没有，则会在返回的索引处插入新的Items。我使用开放散列来保持阵列尽可能紧凑。尽管付出了很多努力，但是我的数据仍然使用了超过8GB的内存（而且我没有计算实际数据，'值'指向）。

任何想法如何更有效地实现内存？感谢...

如何使用2个密钥实现哈希表？

1 个答案: