Getting the real size of a protocol buffer object

Posted: 2019-08-19 19:25:05

Tags: python protocol-buffers

I am trying to calculate the exact size of a protocol buffer object.

I went through the following links: How do I determine the size of an object in Python? and https://goshippo.com/blog/measure-real-size-any-python-object/

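However, a protocol buffer object has no __dict__ in dir(object), presumably because a __dict__ would let people break the object by trying to add attributes to it manually. That is my understanding, though it may be incomplete or incorrect.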
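To illustrate why a __dict__-based walker finds nothing, here is a minimal pure-Python sketch. `SlottedInner` is a hypothetical class, not part of protobuf: because it declares `__slots__`, its instances have no `__dict__`, so the `hasattr(obj, '__dict__')` branch of a generic size function never applies and the fields have to be enumerated explicitly. Whether a given protobuf backend uses `__slots__` or a C extension type is an assumption here; in both cases there is no instance `__dict__` to follow.

```python
import sys


class SlottedInner:
    """Hypothetical stand-in for a message class: attributes live in slots, not a dict."""

    __slots__ = ("inner_id", "inner_name", "inner_value")

    def __init__(self, inner_id, inner_name, inner_value):
        self.inner_id = inner_id
        self.inner_name = inner_name
        self.inner_value = inner_value


obj = SlottedInner(b"\x00" * 16, "ok1", 1)
print(hasattr(obj, "__dict__"))  # False: nothing for a __dict__-based walker to follow
print(sys.getsizeof(obj))        # shallow size only; the field values are not included

# The fields must be enumerated explicitly: via __slots__ here,
# or via ListFields() on a real protobuf message.
fields = {name: getattr(obj, name) for name in SlottedInner.__slots__}
print(fields)
```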

So I started with this protocol buffer message definition:

syntax = "proto2";

package test;

message Inner {
  optional bytes inner_id = 1;
  optional string inner_name = 2;
  optional int64 inner_value = 3;
}

message Outer {
  optional bytes uuid = 1;
  optional string name = 2;
  enum Test {
    kOne = 1;
    kTwo = 2;
  }
  optional Test testing = 3;
  repeated Inner inner_list = 4;
}

Here is an example usage:

import uuid
from test_pb2 import Inner, Outer

x = Outer()
x.uuid = uuid.uuid4().bytes
x.name = "test"
x.testing = Outer.kOne
x.inner_list.add(inner_id=uuid.uuid4().bytes, inner_name="ok1", inner_value=1)
x.inner_list.add(inner_id=uuid.uuid4().bytes, inner_name="ok2", inner_value=2)
x.inner_list.add(inner_id=uuid.uuid4().bytes, inner_name="ok3", inner_value=3)

print(id(x.inner_list))
print(id(x.inner_list[0].inner_id))
print(id(x.inner_list[1].inner_id))
print(id(x.inner_list[2].inner_id))
print(id(x.inner_list[0].inner_name))
print(id(x.inner_list[1].inner_name))
print(id(x.inner_list[2].inner_name))
print(id(x.inner_list[0].inner_value))
print(id(x.inner_list[1].inner_value))
print(id(x.inner_list[2].inner_value))

The ids of inner_id, inner_name and inner_value are the same, even though they belong to different list elements and have different values.
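One possible explanation, offered tentatively: CPython's `id()` is only guaranteed to be unique among objects that exist at the same time, and two objects with non-overlapping lifetimes may get the same id. If each field access such as `x.inner_list[0].inner_name` builds a fresh Python object (an assumption consistent with the output above, and dependent on the protobuf backend in use), that object is freed right after `print`, and the next access can reuse its memory address. The sketch below uses a hypothetical `FreshField` class to mimic that behaviour without protobuf: temporaries may share an id, while objects kept alive together never do.

```python
class FreshField:
    """Hypothetical stand-in: every read of inner_name returns a newly built str."""

    def __init__(self, suffix):
        self._suffix = suffix

    @property
    def inner_name(self):
        # Build a new str on each access, as a C-backed message might.
        return "".join(["ok", self._suffix])


items = [FreshField("1"), FreshField("2"), FreshField("3")]

# Temporaries: each string is freed as soon as id() returns, so the ids
# printed here may (but are not guaranteed to) repeat.
for item in items:
    print(id(item.inner_name))

# Keeping references alive: these strings exist at the same time,
# so CPython must give them distinct ids.
alive = [item.inner_name for item in items]
print(len({id(s) for s in alive}) == len(alive))  # True
```

Holding a reference to every object whose id you record is therefore enough to make id-based de-duplication reliable.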

As a result, my adaptation of the code from the links above does not work as expected:

import sys

from google.protobuf.descriptor import FieldDescriptor


def get_size(obj, seen=None):
    """Recursively finds size of objects"""
    size = sys.getsizeof(obj)
    if seen is None:
        seen = set()
    obj_id = id(obj)
    if obj_id in seen:
        return 0
    # Important mark as seen *before* entering recursion to gracefully handle
    # self-referential objects
    seen.add(obj_id)
    if isinstance(obj, dict):
        size += sum(get_size(v, seen) for v in obj.values())
        size += sum(get_size(k, seen) for k in obj.keys())
    elif hasattr(obj, '__dict__'):
        size += get_size(obj.__dict__, seen)
    elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
        size += sum(get_size(i, seen) for i in obj)
    else:
        # Protobuf messages have no __dict__, so walk their set fields instead.
        try:
            for desc, _ in obj.ListFields():
                if desc.label == FieldDescriptor.LABEL_REPEATED:
                    size += sum(get_size(i, seen) for i in getattr(obj, desc.name))
                else:
                    size += get_size(getattr(obj, desc.name), seen)
        except AttributeError:
            # Plain scalars (int, float, ...) have no ListFields().
            pass
    return size

It trips on the id check (see obj_id), so it does not count the memory needed by, for example, both "ok1" and "ok2".
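A sketch of one way around this, assuming the cause is id reuse between short-lived objects: store each visited object in `seen` (a dict keyed by id) instead of only its id, so nothing visited can be freed, and its id reused, before the walk ends. It also takes values directly from the `(descriptor, value)` pairs returned by `ListFields()` rather than calling `getattr` again (which, under the same assumption, would create yet another temporary), so it needs no `FieldDescriptor` import. Note that this measures the Python objects produced when the message is read, which is not necessarily the memory the underlying C++/upb storage occupies; if the serialized size is what matters, protobuf messages also provide `ByteSize()`.

```python
import sys


def get_size(obj, seen=None):
    """Approximate the recursive in-memory size of obj, including protobuf-like messages.

    `seen` maps id -> object. Storing the object itself (not just its id) keeps
    every visited temporary alive until the walk finishes, so CPython cannot
    hand the same id to a different object later in the same walk.
    """
    if seen is None:
        seen = {}
    obj_id = id(obj)
    if obj_id in seen:
        return 0
    # Mark as seen before recursing to handle self-referential objects.
    seen[obj_id] = obj
    size = sys.getsizeof(obj)

    if hasattr(obj, "ListFields"):
        # Protobuf-style message: ListFields() yields (descriptor, value) pairs
        # for set fields; repeated fields come back as iterable containers.
        for _, value in obj.ListFields():
            size += get_size(value, seen)
    elif isinstance(obj, dict):
        for key, value in obj.items():
            size += get_size(key, seen) + get_size(value, seen)
    elif hasattr(obj, "__dict__"):
        size += get_size(obj.__dict__, seen)
    elif hasattr(obj, "__iter__") and not isinstance(obj, (str, bytes, bytearray)):
        for item in obj:
            size += get_size(item, seen)
    return size
```

With the message from the question this would be called as `get_size(x)`; the result depends on the protobuf backend and version, and has not been checked against a specific one.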

Can anyone explain why the ids are the same, and how to correctly calculate the size of a protocol buffer object?

Thanks.

0 answers
