Question

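In my scenario, I need to allow random access to individual items serialized with msgpack. That is, given a binary file and an item index, I want to jump to the exact position of that item in the file and deserialize only this item.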

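To get the byte offset of each item, I use the `unpack` function of `msgpack.Unpacker`. In msgpack 0.5, `msgpack.Unpacker.unpack` accepts an optional argument `write_bytes`, a hook that is called on the raw data string of an item before it is deserialized. Computing the `len` of this string gives the size of the item in bytes, from which the byte offsets can be accumulated.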
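The accumulation step itself does not depend on the hook. Because msgpack items are self-delimiting and stored back to back, the offset of item i is the sum of the packed sizes of all items before it. The following minimal sketch computes those sizes at write time with `msgpack.packb` rather than reading them through `write_bytes`; the helper name `offsets_from_sizes` is hypothetical, and msgpack 1.0 or later is assumed.

```python
import msgpack

def offsets_from_sizes(items):
    """Hypothetical helper: byte offset of each item when packed back to back."""
    offsets = []
    position = 0
    for item in items:
        offsets.append(position)
        # the packed size is exactly the number of bytes this item occupies
        position += len(msgpack.packb(item))
    return offsets

items = [1, "abc", [1, 2, 3]]
data = b"".join(msgpack.packb(item) for item in items)
offsets = offsets_from_sizes(items)  # → [0, 1, 5]

# Deserialize only the third item by starting at its offset.
unpacker = msgpack.Unpacker()
unpacker.feed(data[offsets[2]:])
third = unpacker.unpack()  # → [1, 2, 3]
```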

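Since msgpack 0.6, the `write_bytes` argument is no longer accepted, and I have not found any replacement that gives me the raw input string or the number of bytes consumed after reading an item.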

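Here is the function I use to create the index. It returns the index as a list of byte offsets: each entry `index[i]` contains the byte offset of item `i`. The crucial part is the `unpacker.unpack(write_bytes=hook)` call, which no longer accepts any keyword arguments.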

import msgpack

def index_from_recording(filename):
    # create empty index
    index = []

    # hook that keeps track of the byte offset of the `msgpack.Unpacker`
    hook = ByteOffsetHook()

    with open(filename, "rb") as f:
        # create the `msgpack.Unpacker`
        unpacker = msgpack.Unpacker(f)
        try:
            while True:
                # add current offset to index
                index.append(hook.offset)

                # unpack (and discard) next item.
                # The `hook` keeps track of the read bytes
                unpacker.unpack(write_bytes=hook)  # <== `write_bytes` not accepted since 0.6
        except msgpack.OutOfData:
            pass

    return index

The `ByteOffsetHook` is defined as follows. The hook simply computes the `len` of the raw input string and accumulates it.

class ByteOffsetHook(object):
    def __init__(self):
        self.offset = 0

    def __call__(self, data):
        self.offset += len(data)

For debugging, you can use this function to produce a dummy recording.
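A dummy recording for such debugging could be produced along the lines of the following minimal sketch. It assumes that a recording is nothing more than msgpack items written back to back into one file; the function name `create_dummy_recording` and the record layout are hypothetical, and msgpack 1.0 or later is assumed.

```python
import os
import tempfile

import msgpack

def create_dummy_recording(filename, num_items=10):
    """Hypothetical generator: write num_items packed dicts back to back."""
    with open(filename, "wb") as f:
        for i in range(num_items):
            # hypothetical record layout; the payload grows with i,
            # so the items have different sizes and the offsets are uneven
            item = {"index": i, "payload": "x" * i}
            f.write(msgpack.packb(item))

# Usage: create a recording with 5 items and read it back sequentially.
path = os.path.join(tempfile.mkdtemp(), "dummy.msgpack")
create_dummy_recording(path, num_items=5)
with open(path, "rb") as f:
    items = list(msgpack.Unpacker(f))
```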

Answer 1

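I found that the `tell` method returns the current byte offset of the `Unpacker`. However, this behavior is not described in the latest documentation I could find. Likewise, I could not find any announcement that the `write_bytes` parameter was removed, other than the commit in which it was removed.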

The working function for creating the index now looks like this:

import msgpack

def index_from_recording(filename):
    # create empty index
    index = []

    with open(filename, "rb") as f:
        # create the `msgpack.Unpacker`
        unpacker = msgpack.Unpacker(f)
        try:
            while True:
                # byte offset at which the next item starts
                offset = unpacker.tell()

                # unpack (and discard) next item;
                # raises OutOfData once the end of the file is reached
                unpacker.unpack()

                # the item was read completely, so add its offset to the index
                index.append(offset)
        except msgpack.OutOfData:
            pass

    return index
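With such an index, random access to a single item is a seek followed by a single unpack. The minimal sketch below assumes msgpack 1.0 or later. While building the index it uses `Unpacker.skip()`, which reads and discards one item, instead of `unpack()`, and it records an offset only after an item was read completely, so the index holds exactly one entry per item. The helper names `build_index` and `read_item` are hypothetical.

```python
import os
import tempfile

import msgpack

def build_index(filename):
    """Hypothetical helper: byte offset of every complete item in the file."""
    index = []
    with open(filename, "rb") as f:
        unpacker = msgpack.Unpacker(f)
        while True:
            offset = unpacker.tell()  # start of the next item
            try:
                unpacker.skip()  # read and discard one item
            except msgpack.OutOfData:
                break
            index.append(offset)
    return index

def read_item(filename, index, i):
    """Hypothetical helper: seek to item i and deserialize only that item."""
    with open(filename, "rb") as f:
        f.seek(index[i])
        # a fresh Unpacker starts reading at the file's current position
        return msgpack.Unpacker(f).unpack()

# Usage: write a small recording, index it, then read item 2 directly.
path = os.path.join(tempfile.mkdtemp(), "recording.msgpack")
with open(path, "wb") as f:
    for item in [{"a": 1}, "hello", [1, 2, 3], 42]:
        f.write(msgpack.packb(item))

index = build_index(path)  # → [0, 4, 10, 14]
third = read_item(path, index, 2)  # → [1, 2, 3]
```

Opening the file and creating a new `Unpacker` for every lookup keeps the sketch simple; for many lookups on the same file, reusing one open file handle would avoid the repeated opens.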

Get byte offset in msgpack 0.6
