Question

I have a huge file of space-separated values in the following format:

key1 0.553 1.45 0.666
key2 2.66 1.77 0.001
...

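I would like to drive this file as a dictionary using shelve (or any other module you consider a better fit). That way I can use the first column as the key and get all the following values back as a list, i.e.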

In [1]: with shelve.open("file") as db:
   ...:    print(db["key2"])
   ...:
Out [1]: [2.66, 1.77, 0.001]

Thank you very much for your support.

Answer 1

Comment: ... is there an efficient way to retrieve items that are near the end of the file?

Add an offset parameter. This can be automated if you implement the logic in class DictFloatReader.

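def __getitem__(self, item):
    # Accept either db[key] or db[key, offset].
    offset = 0
    if isinstance(item, tuple):
        item, offset = item

    # Start scanning at offset instead of at the beginning of the file.
    # The offset is expected to point at the start of a line.
    self.fh.seek(offset)
    for line in self.fh:
        fields = line.split()
        if fields and fields[0] == item:
            return [float(f) for f in fields[1:]]
    raise KeyError(item)

# Usage
print(db["key2", 300*1024])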
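One way the offset could be automated is to scan the file once and remember where each line starts. The following is only a sketch under that assumption: the helper names build_offset_index and lookup are made up for illustration and are not part of the answer's class. Note that the index itself (keys plus integers) lives in memory, so this only helps if the keys alone fit.

```python
def build_offset_index(path):
    """Scan the file once and map each key to the byte offset of its line."""
    index = {}
    # Binary mode keeps tell() and seek() working on plain byte offsets.
    with open(path, "rb") as fh:
        while True:
            offset = fh.tell()
            line = fh.readline()
            if not line:
                break  # end of file
            fields = line.decode("ascii").split()
            if fields:
                index[fields[0]] = offset
    return index


def lookup(path, index, key):
    """Seek straight to the recorded offset and parse that single line."""
    with open(path, "rb") as fh:
        fh.seek(index[key])
        fields = fh.readline().decode("ascii").split()
    return [float(f) for f in fields[1:]]
```

With the index built once, every lookup reads exactly one line regardless of where the key sits in the file. A missing key raises KeyError from the index lookup.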

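If your keys are presorted, for example 1, 2, 3, 4 or a, b, c, you can use a B-tree or binary search. This gives nearly the same lookup time for every key.
Switch to a real database file format that provides indexing and random access.
Hold it in memory, but you have already stated:
"Keeping it in memory is not an option"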
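As a rough illustration of the "real database file format" option, the text file could be converted once into a shelf, which is what the question originally asked about. This is a sketch, not a tuned implementation: the function name build_shelf and the file names used with it are assumptions, and it relies on shelve.open working as a context manager (Python 3.4 or later).

```python
import shelve


def build_shelf(text_path, shelf_path):
    """Read the text file line by line and store each row in a shelf."""
    # flag="n" always starts from a new, empty database.
    with open(text_path) as src, shelve.open(shelf_path, flag="n") as db:
        for line in src:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            db[fields[0]] = [float(f) for f in fields[1:]]
```

The conversion reads one line at a time, so the whole file never has to fit in memory. After a single call such as build_shelf("file", "file.shelf"), the shelf can be opened with shelve.open("file.shelf", flag="r") and queried as db["key2"] without rescanning the text file.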

This will do what you want, for example:

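class DictFloatReader(object):
    """Read-only, dict-like view of a space-separated file on disk."""

    def __init__(self, fpath):
        self.fpath = fpath
        self.fh = None

    def __enter__(self):
        self.fh = open(self.fpath)
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.fh.close()

    def __getitem__(self, item):
        # Linear scan from the start of the file on every lookup.
        self.fh.seek(0)
        for line in self.fh:
            fields = line.split()
            # Compare the whole first column so "key2" does not match "key20".
            if fields and fields[0] == item:
                return [float(f) for f in fields[1:]]
        raise KeyError(item)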

Usage

with DictFloatReader('file') as db:
    print(db["key2"])
    print(db["key1"])
    print(db["key2"])

Output
  [2.66, 1.77, 0.001]
  [0.553, 1.45, 0.666]
  [2.66, 1.77, 0.001]

Tested with Python: 3.4.2 and 2.7.9
