Question

I'm having trouble understanding the following problem:

Create a hash index on text.txt, with the ids as keys and the full-text records as data.

 text.txt
 000000010:<status> <id>000000010</id> <created_at>2012/03/11</created_at> <text>@joerogan Played as Joe Savage Rogan in Undisputed3 Career mode, won Pride GP, got UFC title shot against Shields, lost 3 times, and retired</text> <retweet_count>0</retweet_count> <user> <name>Siggi Eggertsson</name> <location>Berlin, Germany</location> <description></description> <url>http://www.siggieggertsson.com</url> </user> </status>
 000000011:<status> <id>000000011</id> <created_at>2012/03/11</created_at> <text>Cat and Metronome: http://t.co/3Z7Aq8Dn</text> <retweet_count>3</retweet_count> <user> <name>Siggi Eggertsson</name> <location>Berlin, Germany</location> <description></description> <url>http://www.siggieggertsson.com</url> </user> </status>
 ...

I'm not sure what I'm supposed to do.

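Should I create another txt file to store the hash index? The id looks unique for each line, in which case I wouldn't even need to hash. Can I do this with the db_load command?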

Thanks in advance for your help!

Answer 1

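The purpose of an index is to speed up lookups into a set of data. So in this case, I would expect you to be able to use your index to quickly find a record in the text file. Let's say the index consists of tuples, each made up of a record id and the offset in the file at which the corresponding record begins.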
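As a rough illustration of the id-to-offset idea, here is a minimal Python sketch. It assumes every record sits on one line of the form `id:record` (as in the sample shown in the question), and it uses a plain `dict`, which is Python's built-in hash table, as the index. The function names `build_index` and `lookup` are made up for this example.

```python
def build_index(path):
    """Map each record id to the byte offset where its line starts."""
    index = {}
    with open(path, "rb") as f:  # binary mode, so offsets are exact byte counts
        offset = 0
        for line in f:
            stripped = line.lstrip()
            # Assumption: a record line looks like b"<id>:<record>"; skip anything else.
            if b":" in stripped:
                record_id = stripped.split(b":", 1)[0].decode("ascii")
                index[record_id] = offset
            offset += len(line)
    return index


def lookup(path, index, record_id):
    """Seek straight to the record and return its line, or None if the id is unknown."""
    offset = index.get(record_id)
    if offset is None:
        return None
    with open(path, "rb") as f:
        f.seek(offset)
        return f.readline().decode("utf-8").strip()
```

With this in place, something like `lookup("text.txt", build_index("text.txt"), "000000011")` would read only the one matching line instead of scanning the whole file.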

It would be best to store the index in a separate file; you could give it a name that matches the file being indexed (e.g. text.idx).
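One possible way to keep the index in its own file is sketched below in Python. It assumes the index is held in memory as a `dict` mapping each record id to an integer byte offset, and it writes one tab-separated `id<TAB>offset` line per record. The on-disk layout and the names `save_index` and `load_index` are assumptions for illustration, not something the assignment prescribes.

```python
def save_index(index, idx_path):
    """Write one 'id<TAB>offset' line per record to the index file."""
    with open(idx_path, "w", encoding="ascii") as f:
        for record_id, offset in index.items():
            f.write(f"{record_id}\t{offset}\n")


def load_index(idx_path):
    """Read the index file back into a dict (Python's built-in hash table)."""
    index = {}
    with open(idx_path, encoding="ascii") as f:
        for line in f:
            record_id, offset = line.rstrip("\n").split("\t")
            index[record_id] = int(offset)
    return index
```

For example, `save_index(index, "text.idx")` after building the index, and `load_index("text.idx")` on later runs, so that the text file does not have to be rescanned each time.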
