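What is a more efficient way to split a KLV string into a list/tuple whose elements are key, length, and value?
For a bit of background: the first 3 digits are the key, and the next 2 digits give the length of the value.
I can already solve the problem with the code below, but I don't think my code and logic are the most efficient way to do the job.
So I'd like to hear other opinions so that I can improve.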
result = []
def klv_split(ss):
    while True:
        group1 = ss[:3]
        group2 = ss[3:5]
        print(group2)
        group3 = ss[5 : 5 + int(group2)]
        result.append([group1, group2, group3])
        try:
            klv_split(ss[5 + int(group2) :])
        except ValueError:
            break
        break
    return result
klv_string = "0021571583400000026400412000000000200026047299049000850025003ADV25110Blahbleble25304677225400255002560204"
klv_split(klv_string)
print(result)
The expected output is a list of small [key, length, value] lists, as follows.
[['002', '15', '715834000000264'], ['004', '12', '000000000200'], ['026', '04', '7299'], ['049', '00', ''], ['085', '00', ''], ['250', '03', 'ADV'], ['251', '10', 'Blahbleble'], ['253', '04', '6772'], ['254', '00', ''], ['255', '00', ''], ['256', '02', '04']]
Answer 0 (score: 1)
The other answers create an iterative version of your recursive function. Since Python does not optimize tail call recursion, the iterative version will be faster.
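To illustrate the tail-call point, here is a minimal sketch (not the asker's code; the names `klv_split_recursive`, `klv_split_iterative`, and the `many_records` test input are made up for this example). Because Python keeps one stack frame per call even for a tail call, a recursive splitter fails once the number of records exceeds the recursion limit, while a loop does not:

```python
import sys

def klv_split_recursive(ss, acc=None):
    """Tail-recursive splitter: still uses one stack frame per record."""
    acc = [] if acc is None else acc
    if not ss:
        return acc
    end = 5 + int(ss[3:5])
    acc.append((ss[:3], ss[3:5], ss[5:end]))
    # This is a tail call, but Python does not eliminate it.
    return klv_split_recursive(ss[end:], acc)

def klv_split_iterative(ss):
    """Loop-based splitter: constant stack depth."""
    result = []
    idx = 0
    while idx < len(ss):
        end = idx + 5 + int(ss[idx + 3:idx + 5])
        result.append((ss[idx:idx + 3], ss[idx + 3:idx + 5], ss[idx + 5:end]))
        idx = end
    return result

# Hypothetical input: more empty records than the recursion limit allows.
many_records = "04900" * (sys.getrecursionlimit() + 100)

try:
    klv_split_recursive(many_records)
    recursion_failed = False
except RecursionError:
    recursion_failed = True

records = klv_split_iterative(many_records)
print(recursion_failed, len(records))
```

For short strings like the one in the question both versions work; the difference only shows up with many records.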
I will focus on the case where you have a huge binary file to parse:
>>> def klvs(f):
...     while True:
...         k = f.read(3)
...         if not k:
...             return
...
...         k_length = f.read(2)
...         assert len(k_length) == 2
...         k_length = int(k_length)
...         value = f.read(k_length)
...         assert len(value) == k_length
...         yield (k, k_length, value)
...
Creating an iterator is more convenient (though probably not faster). I used bytes because that is what you usually get for KLV data:
>>> klv_bytes = b"0021571583400000026400412000000000200026047299049000850025003ADV25110Blahbleble25304677225400255002560204"
>>> import io
>>> f = io.BytesIO(klv_bytes)
>>> list(klvs(f))
[(b'002', 15, b'715834000000264'), (b'004', 12, b'000000000200'), (b'026', 4, b'7299'), (b'049', 0, b''), (b'085', 0, b''), (b'250', 3, b'ADV'), (b'251', 10, b'Blahbleble'), (b'253', 4, b'6772'), (b'254', 0, b''), (b'255', 0, b''), (b'256', 2, b'04')]
You may want to get an element by key or by index without creating all the tuples:
>>> import os
>>> def get(f, to_search):
...     i = 0
...     while True:
...         k = f.read(3)
...         if not k:
...             return None
...
...         k_length = f.read(2)
...         assert len(k_length) == 2
...         k_length = int(k_length)
...         if to_search(i, k):
...             value = f.read(k_length)
...             assert len(value) == k_length
...             return (k, k_length, value)
...         else:
...             f.seek(k_length, os.SEEK_CUR)
...         i += 1
...
>>> f = io.BytesIO(klv_bytes)
>>> get(f, lambda _, k: k==b"004")
(b'004', 12, b'000000000200')
>>> f = io.BytesIO(klv_bytes)
>>> get(f, lambda _, k: k=="foo") is None
True
>>> f = io.BytesIO(klv_bytes)
>>> get(f, lambda i, _: i==10)
(b'256', 2, b'04')
>>> f = io.BytesIO(klv_bytes)
>>> get(f, lambda i, _: i==11) is None
True
Note that the get function is O(n); if you need to look up several elements, building a list or a dict will be faster.
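As a rough sketch of the dict approach: parse the stream once, then look keys up in constant time. This assumes keys are unique (a repeated key would overwrite the earlier value), and it swaps the asserts for `ValueError` so the checks survive `python -O`. The name `by_key` is just for illustration.

```python
import io

def klvs(f):
    """Yield (key, length, value) tuples from a binary file-like object."""
    while True:
        k = f.read(3)
        if not k:
            return
        k_length = f.read(2)
        if len(k_length) != 2:
            raise ValueError("truncated length field")
        k_length = int(k_length)
        value = f.read(k_length)
        if len(value) != k_length:
            raise ValueError("truncated value")
        yield (k, k_length, value)

klv_bytes = b"0021571583400000026400412000000000200026047299049000850025003ADV25110Blahbleble25304677225400255002560204"

# One pass over the data; assumes each key appears at most once.
by_key = {k: v for k, _, v in klvs(io.BytesIO(klv_bytes))}

print(by_key[b"004"])      # b'000000000200'
print(by_key[b"250"])      # b'ADV'
print(by_key.get(b"999"))  # None
```

If you need lookup by position instead, `list(klvs(f))` gives the same one-pass cost with index access.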
Answer 1 (score: 0)
Use the length information to drive the loop:
def klv_split(ss):
    result = []
    while len(ss) != 0:
        group1 = ss[:3]
        group2 = ss[3:5]
        up_to = 5 + int(group2)
        group3 = ss[5:up_to]
        result.append((group1, group2, group3))
        ss = ss[up_to:]
    return result
Result:
[('002', '15', '715834000000264'), ('004', '12', '000000000200'), ('026', '04', '7299'), ('049', '00', ''), ('085', '00', ''), ('250', '03', 'ADV'), ('251', '10', 'Blahbleble'), ('253', '04', '6772'), ('254', '00', ''), ('255', '00', ''), ('256', '02', '04')]
Here is a live example
Answer 2 (score: 0)
You can use an index in the while loop instead of a while True loop.
klv_string = "0021571583400000026400412000000000200026047299049000850025003ADV25110Blahbleble25304677225400255002560204"
def klv_split(ss):
    idx = 0
    result = []
    # Run while the index is less than the length of the string
    while idx < len(ss):
        # Extract the groups using indexes
        group1 = ss[idx:idx+3]
        group2 = ss[idx+3:idx+5]
        group3 = ss[idx+5:idx+5 + int(group2)]
        result.append([group1, group2, group3])
        # Advance the index past this record
        idx += 5+int(group2)
    return result
print(klv_split(klv_string))
The output will be
[['002', '15', '715834000000264'],
['004', '12', '000000000200'],
['026', '04', '7299'],
['049', '00', ''],
['085', '00', ''],
['250', '03', 'ADV'],
['251', '10', 'Blahbleble'],
['253', '04', '6772'],
['254', '00', ''],
['255', '00', ''],
['256', '02', '04']]
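One thing to be aware of with the index approach: slicing past the end of a string does not raise, so a truncated value is returned silently. Below is a minimal sketch of a generator variant that checks bounds. It assumes you want malformed or truncated input to raise `ValueError`; the name `klv_iter` and the error messages are illustrative only.

```python
def klv_iter(ss):
    """Yield (key, length, value) from a KLV string, validating bounds."""
    idx = 0
    n = len(ss)
    while idx < n:
        header_end = idx + 5
        if header_end > n:
            raise ValueError(f"truncated header at offset {idx}")
        key = ss[idx:idx + 3]
        length_field = ss[idx + 3:header_end]
        # Accept only two ASCII digits, so "-1" or "+1" is rejected.
        if not (length_field.isascii() and length_field.isdigit()):
            raise ValueError(f"bad length {length_field!r} at offset {idx}")
        value_end = header_end + int(length_field)
        if value_end > n:
            raise ValueError(f"truncated value for key {key!r} at offset {idx}")
        yield key, length_field, ss[header_end:value_end]
        idx = value_end

print(list(klv_iter("00203ADV04900")))
# [('002', '03', 'ADV'), ('049', '00', '')]
```

Being a generator, it also lets you stop early (for example with `next(...)`) without parsing the rest of the string.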