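I am trying to calculate the exact in-memory size of a protocol buffer object.
I went through the following links: How do I determine the size of an object in Python? and https://goshippo.com/blog/measure-real-size-any-python-object/
However, a protocol buffer object does not list __dict__ in dir(object), presumably because exposing it would let people corrupt the message by adding attributes to it by hand. That is my understanding, and it may be incomplete or incorrect.
So, I started with this protocol buffer message definition: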
syntax = "proto2";
package test;

message Inner {
    optional bytes inner_id = 1;
    optional string inner_name = 2;
    optional int64 inner_value = 3;
}

message Outer {
    optional bytes uuid = 1;
    optional string name = 2;
    enum Test {
        kOne = 1;
        kTwo = 2;
    }
    optional Test testing = 3;
    repeated Inner inner_list = 4;
}
Here is a sample usage:
import uuid
from test_pb2 import Inner, Outer

x = Outer()
x.uuid = uuid.uuid4().bytes
x.name = "test"
x.testing = Outer.kOne
x.inner_list.add(inner_id=uuid.uuid4().bytes, inner_name="ok1", inner_value=1)
x.inner_list.add(inner_id=uuid.uuid4().bytes, inner_name="ok2", inner_value=2)
x.inner_list.add(inner_id=uuid.uuid4().bytes, inner_name="ok3", inner_value=3)

# Print the id of the repeated field, then of each field of each element
print(id(x.inner_list))
print(id(x.inner_list[0].inner_id))
print(id(x.inner_list[1].inner_id))
print(id(x.inner_list[2].inner_id))
print(id(x.inner_list[0].inner_name))
print(id(x.inner_list[1].inner_name))
print(id(x.inner_list[2].inner_name))
print(id(x.inner_list[0].inner_value))
print(id(x.inner_list[1].inner_value))
print(id(x.inner_list[2].inner_value))
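The ids of inner_id, inner_name and inner_value are the same across the three elements, even though they belong to different list entries and hold different values.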
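One possible explanation, stated here as an assumption rather than something confirmed for protobuf: if reading a field builds a new Python object on every access, that object is freed as soon as id() returns, and CPython is allowed to give the next temporary the same address. The Python documentation notes that two objects with non-overlapping lifetimes may have the same id(). The sketch below reproduces this with plain Python only; Wrapper is a hypothetical stand-in and not part of the protobuf API.

```python
class Wrapper(object):
    """Hypothetical stand-in for an object that copies a value out of
    internal storage every time an attribute is read."""

    def __init__(self, payload):
        self._raw = bytearray(payload)

    @property
    def value(self):
        # bytes(...) builds a new object on each access
        return bytes(self._raw)


wrappers = [Wrapper(b"ok1"), Wrapper(b"ok2"), Wrapper(b"ok3")]

# Each temporary is freed right after id() returns, so the interpreter may
# reuse the same address: these ids may (but need not) coincide.
temp_ids = [id(w.value) for w in wrappers]
print(temp_ids)

# Holding references keeps all three objects alive at the same time,
# which guarantees three distinct ids.
kept = [w.value for w in wrappers]
kept_ids = [id(v) for v in kept]
print(kept_ids)
```

If this is what happens here, the matching ids say nothing about the values being shared; they only show that the objects being compared did not exist at the same time.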
Because the ids coincide, my modification of the code from the links above does not work as expected:
import sys

from google.protobuf.descriptor import FieldDescriptor


def get_size(obj, seen=None):
    """Recursively finds size of objects"""
    size = sys.getsizeof(obj)
    if seen is None:
        seen = set()
    obj_id = id(obj)
    if obj_id in seen:
        return 0
    # Important mark as seen *before* entering recursion to gracefully handle
    # self-referential objects
    seen.add(obj_id)
    if isinstance(obj, dict):
        size += sum([get_size(v, seen) for v in obj.values()])
        size += sum([get_size(k, seen) for k in obj.keys()])
    elif hasattr(obj, '__dict__'):
        size += get_size(obj.__dict__, seen)
    elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
        size += sum([get_size(i, seen) for i in obj])
    else:
        try:
            # Protobuf messages: walk only the fields that are set
            for desc, _ in obj.ListFields():
                if desc.label == FieldDescriptor.LABEL_REPEATED:
                    size += sum([get_size(i, seen) for i in getattr(obj, desc.name)])
                else:
                    # Read the field from obj (not from the global x)
                    size += get_size(getattr(obj, desc.name), seen)
        except Exception:
            # Most likely not a protobuf message; keep only the object's own size
            pass
    return size
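It fails because it trips on the id check (see obj_id): values that share an id are counted only once, so the memory needed for, say, "ok1" versus "ok2" is not accounted for separately.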
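If the collisions do come from short-lived temporaries (an assumption, not something verified against protobuf), one way to keep the id check from tripping is to store a reference to every visited object, so that no later temporary can be allocated at the same address. The sketch below shows the idea on plain Python objects only; get_size_keepalive and FreshItems are hypothetical names, and it does not address whether sys.getsizeof reflects the memory a protobuf message really uses.

```python
import sys


def get_size_keepalive(obj, seen=None):
    """Recursive size estimate that keeps every visited object alive."""
    if seen is None:
        seen = {}
    obj_id = id(obj)
    if obj_id in seen:
        return 0
    # Store the object itself, not only its id: while it is referenced here,
    # its address cannot be reused by a later temporary.
    seen[obj_id] = obj
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(get_size_keepalive(k, seen) + get_size_keepalive(v, seen)
                    for k, v in obj.items())
    elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
        size += sum(get_size_keepalive(i, seen) for i in obj)
    return size


class FreshItems(object):
    """Hypothetical container that builds a new bytes object per item."""

    def __init__(self, payloads):
        self._raw = [bytearray(p) for p in payloads]

    def __iter__(self):
        for raw in self._raw:
            yield bytes(raw)


items = FreshItems([b"ok1", b"ok2", b"ok3"])
total = get_size_keepalive(items)
# All three yielded items are counted, because none of them is freed
# (and its address reused) before the traversal finishes.
expected = sys.getsizeof(items) + 3 * sys.getsizeof(b"ok1")
print(total == expected)
```

The trade-off is that every visited object stays referenced until the top-level call returns, so the traversal temporarily holds on to more memory than the set-of-ids version.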
Can anyone explain why the ids are the same, and how to correctly calculate the size of a protocol buffer object?
Thanks.