Question

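My Java application needs a hash table for its calculations, and it has to perform millions of lookups against that table. The table must load from disk into a HashTable utility very quickly. The data in the hash table is static, so no inserts or deletes are needed.

Can you suggest an existing library for this?

In addition, the data is smaller than 200MB.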

Answer 1

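If your data is static, why not use a plain old array and look up by index? Whatever key you intended to use, just give it an index property. Of course, if you exceed the maximum possible array length, you will need to shard across multiple arrays.

I'd say no hash function can beat direct random access. The cost of assigning indices across your key set (a "perfect hash function") is paid up front, during initialisation, rather than on every lookup.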
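As a rough illustration of this idea, the sketch below gives each key an index once, at start-up, and then reads values straight out of an array. The class `IndexedLookup`, the `Key` type, and the sample data are all hypothetical names invented for this example, not part of any library.

```java
public class IndexedLookup {

    /** Hypothetical key type that carries its own array index. */
    static final class Key {
        final String name;
        final int index; // assigned once, during initialisation

        Key(String name, int index) {
            this.name = name;
            this.index = index;
        }
    }

    /** Assigns consecutive indices to the keys; done once, up front. */
    static Key[] assignIndices(String[] names) {
        Key[] keys = new Key[names.length];
        for (int i = 0; i < names.length; i++) {
            keys[i] = new Key(names[i], i);
        }
        return keys;
    }

    /** A lookup is a plain array read: no hashing, no collision handling. */
    static double lookup(double[] values, Key key) {
        return values[key.index];
    }

    public static void main(String[] args) {
        String[] names = {"alpha", "beta", "gamma"}; // sample data (assumed)
        double[] values = {1.5, 2.5, 3.5};           // values[i] belongs to key i

        Key[] keys = assignIndices(names);
        Key beta = keys[1];
        System.out.println(beta.name + " -> " + lookup(values, beta));
    }
}
```

This only works if the code doing the lookup already holds the `Key` object (or its index). If lookups start from raw key values, some mapping from value to index is still needed.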

Answer 2

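If it doesn't need to be human readable, you could (gasp) resort to just making sure your data implements the Serializable interface and serializing the HashMap with an ObjectOutputStream. It's ugly, but it gets the job done.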
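A minimal sketch of the serialization approach, assuming a `HashMap<Long, String>` (both `Long` and `String` are already Serializable). The class name `SerializedMapDemo`, the helper names, and the temp-file location are assumptions made for this example.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

public class SerializedMapDemo {

    /** Writes the whole map to the file in one writeObject call. */
    static void save(HashMap<Long, String> map, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeObject(map);
        }
    }

    /** Reads the map back; the cast is unchecked because of type erasure. */
    @SuppressWarnings("unchecked")
    static HashMap<Long, String> load(File file)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            return (HashMap<Long, String>) in.readObject();
        }
    }

    /** Saves to a temporary file and loads it again (for demonstration only). */
    static HashMap<Long, String> roundTrip(HashMap<Long, String> map) {
        try {
            File file = File.createTempFile("table", ".ser");
            file.deleteOnExit();
            save(map, file);
            return load(file);
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        HashMap<Long, String> map = new HashMap<>();
        map.put(1L, "one");
        map.put(2L, "two");
        System.out.println(roundTrip(map).equals(map));
    }
}
```

Whether this loads fast enough for a table approaching 200MB is something to measure rather than assume.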

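Another option is DataInputStream and DataOutputStream. These let you read and write structured binary data.

Assuming you have a HashMap<Long, String>, you can write it like this: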

// realOutputStream should probably be a BufferedOutputStream
DataOutputStream output = new DataOutputStream( realOutputStream );
for (Map.Entry<Long, String> entry : map.entrySet()) {
    // Write the key
    output.writeLong(entry.getKey().longValue());
    // Encode the value (not the entry itself) as UTF-8
    byte bytes[] = entry.getValue().getBytes("UTF-8");
    // Writing the string requires writing the length and then the bytes
    output.writeInt(bytes.length);
    output.write(bytes, 0, bytes.length);
}
// Push any buffered bytes through to the underlying stream
output.flush();



// realInputStream should probably be a BufferedInputStream
DataInputStream input = new DataInputStream ( realInputStream );
Map<Long, String> map = new HashMap<Long, String>();
while ( true ) {
   try {
     // read the key
     long key = input.readLong();
     // read the string length in bytes
     int strlen = input.readInt();
     // read exactly that many bytes into an array
     byte buf[] = new byte[strlen];
     input.readFully(buf, 0, strlen);
     // Create the map entry.
     map.put(Long.valueOf(key), new String(buf,"UTF-8"));
   }
   catch (EOFException e) {
     // input is exhausted
     break;
   }
}

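Bear in mind that this assumes you want to store and read the strings as UTF-8. You could just as easily omit the character set and use the JVM default encoding, as long as the writer and the reader end up using the same one. Also note that anything of variable length, such as a String, requires you to write its length before the actual data. That way you know how many bytes to read in order to reconstruct the string.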

Fast static persistent hash table
