Split a key-length-value (KLV) string into small lists of key, length, value

Time: 2019-05-06 09:17:52

Tags: python

What is a more efficient way to split a KLV string into lists/tuples whose elements are the key, the length, and the value?

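For some background: the first 3 digits of each record are the key, and the next 2 digits give the length of the value that follows.
I can already solve the problem with the code below, but I don't think my code and logic are the most efficient way to do it, so I'd like to hear other opinions so that I can improve.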

result = []

def klv_split(ss):
    while True:
        group1 = ss[:3]
        group2 = ss[3:5]
        print(group2)
        group3 = ss[5 : 5 + int(group2)]
        result.append([group1, group2, group3])
        try:
            klv_split(ss[5 + int(group2) :])
        except ValueError:
            break
        break

    return result


klv_string = "0021571583400000026400412000000000200026047299049000850025003ADV25110Blahbleble25304677225400255002560204"
klv_split(klv_string)
print(result)

The expected output is a list of small key, length, value lists, like this:

[['002', '15', '715834000000264'], ['004', '12', '000000000200'], ['026', '04', '7299'], ['049', '00', ''], ['085', '00', ''], ['250', '03', 'ADV'],
['251', '10', 'Blahbleble'], ['253', '04', '6772'], ['254', '00', ''], ['255', '00', ''], ['256', '02', '04']]

3 answers:

Answer 0 (score: 1)

The other answers give iterative versions of your recursive function. Since Python does not optimize tail call recursion, the iterative versions will be faster.
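Beyond speed, the lack of tail call optimization also means a recursive splitter uses one stack frame per record. The sketch below is only an illustration: `klv_split_recursive` and `klv_split_iterative` are hypothetical names, not the asker's code, and the input is an artificial string of empty records built to exceed the default recursion limit.

```python
import sys

def klv_split_recursive(ss, result=None):
    """Hypothetical recursive splitter: one Python call per record."""
    if result is None:
        result = []
    if not ss:
        return result
    length = int(ss[3:5])
    result.append([ss[:3], ss[3:5], ss[5:5 + length]])
    # This is a tail call, but CPython still keeps a frame for every level.
    return klv_split_recursive(ss[5 + length:], result)

def klv_split_iterative(ss):
    """Hypothetical iterative splitter: constant stack depth."""
    result = []
    idx = 0
    while idx < len(ss):
        length = int(ss[idx + 3:idx + 5])
        result.append([ss[idx:idx + 3], ss[idx + 3:idx + 5],
                       ss[idx + 5:idx + 5 + length]])
        idx += 5 + length
    return result

# Artificial input: more empty records than the recursion limit allows.
record_count = sys.getrecursionlimit() + 100
many_records = "04900" * record_count

try:
    klv_split_recursive(many_records)
    recursion_failed = False
except RecursionError:
    recursion_failed = True

records = klv_split_iterative(many_records)
print(recursion_failed, len(records) == record_count)
```

With a short string like the one in the question, both versions work; the difference only shows up once the number of records approaches the interpreter's recursion limit.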

I'll focus on the case where you have a huge binary file to parse:

>>> def klvs(f):
...     while True:
...         k = f.read(3)
...         if not k:
...             return
...
...         k_length = f.read(2)
...         assert len(k_length) == 2
...         k_length = int(k_length)
...         value = f.read(k_length)
...         assert len(value) == k_length
...         yield (k, k_length, value)
...

Creating an iterator is more convenient (though probably not faster). I used bytes, because that is how you usually receive KLV data:

>>> klv_bytes = b"0021571583400000026400412000000000200026047299049000850025003ADV25110Blahbleble25304677225400255002560204"
>>> import io
>>> f = io.BytesIO(klv_bytes)
>>> list(klvs(f))
[(b'002', 15, b'715834000000264'), (b'004', 12, b'000000000200'), (b'026', 4, b'7299'), (b'049', 0, b''), (b'085', 0, b''), (b'250', 3, b'ADV'), (b'251', 10, b'Blahbleble'), (b'253', 4, b'6772'), (b'254', 0, b''), (b'255', 0, b''), (b'256', 2, b'04')]

You may want to fetch an element by key or by index without creating all the tuples:

>>> import os
>>> def get(f, to_search):
...     i = 0
...     while True:
...         k = f.read(3)
...         if not k:
...             return None
...
...         k_length = f.read(2)
...         assert len(k_length) == 2
...         k_length = int(k_length)
...         if to_search(i, k):
...             value = f.read(k_length)
...             assert len(value) == k_length
...             return (k, k_length, value)
...         else:
...             f.seek(k_length, os.SEEK_CUR)
...         i += 1
...
>>> f = io.BytesIO(klv_bytes)
>>> get(f, lambda _, k: k==b"004")
(b'004', 12, b'000000000200')
>>> f = io.BytesIO(klv_bytes)
>>> get(f, lambda _, k: k==b"foo") is None
True
>>> f = io.BytesIO(klv_bytes)
>>> get(f, lambda i, _: i==10)
(b'256', 2, b'04')
>>> f = io.BytesIO(klv_bytes)
>>> get(f, lambda i, _: i==11) is None
True

Note that get is O(n); if you need to look up several elements, building a list or a dict first will be faster.
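As a minimal sketch of the dict approach, the snippet below repeats the klvs generator from this answer so that it runs on its own, then builds the dict in a single pass. The name `by_key` is introduced here for illustration, and the sketch assumes keys are unique; a repeated key would overwrite the earlier value.

```python
import io

def klvs(f):
    """Yield (key, length, value) tuples from a binary file-like object."""
    while True:
        k = f.read(3)
        if not k:
            return
        k_length = f.read(2)
        assert len(k_length) == 2
        k_length = int(k_length)
        value = f.read(k_length)
        assert len(value) == k_length
        yield (k, k_length, value)

klv_bytes = b"0021571583400000026400412000000000200026047299049000850025003ADV25110Blahbleble25304677225400255002560204"

# One O(n) pass over the data; every later lookup is a dict access.
# Assumption: keys are unique, otherwise later records overwrite earlier ones.
by_key = {k: value for k, _, value in klvs(io.BytesIO(klv_bytes))}

print(by_key[b"004"])      # b'000000000200'
print(by_key.get(b"foo"))  # None
```

If lookups by position are needed as well, `list(klvs(f))` gives the same records in order.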

Answer 1 (score: 0)

Use the length information to drive the loop:

def klv_split(ss):
    result = []
    while len(ss) != 0:
        group1 = ss[:3]
        group2 = ss[3:5]
        up_to = 5 + int(group2)
        group3 = ss[5:up_to]
        result.append((group1, group2, group3))
        ss = ss[up_to:]
    return result

Result:

[('002', '15', '715834000000264'), ('004', '12', '000000000200'), ('026', '04', '7299'), ('049', '00', ''), ('085', '00', ''), ('250', '03', 'ADV'), ('251', '10', 'Blahbleble'), ('253', '04', '6772'), ('254', '00', ''), ('255', '00', ''), ('256', '02', '04')]

Here is a live example

Answer 2 (score: 0)

You can use an index in the while loop condition instead of a while True loop.

klv_string = "0021571583400000026400412000000000200026047299049000850025003ADV25110Blahbleble25304677225400255002560204"

def klv_split(ss):
    idx = 0
    result = []
    #Run till index is less than length of string
    while idx < len(ss):
        #Extract various groups using indexes
        group1 = ss[idx:idx+3]
        group2 = ss[idx+3:idx+5]
        group3 = ss[idx+5:idx+5 + int(group2)]
        result.append([group1, group2, group3])

        #Increment the index
        idx += 5+int(group2)
    return result

print(klv_split(klv_string))

The output will be

[['002', '15', '715834000000264'], 
['004', '12', '000000000200'], 
['026', '04', '7299'], 
['049', '00', ''], 
['085', '00', ''], 
['250', '03', 'ADV'], 
['251', '10', 'Blahbleble'], 
['253', '04', '6772'], 
['254', '00', ''], 
['255', '00', ''], 
['256', '02', '04']]
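The index-based loop above assumes well-formed input: a truncated string would silently produce a short final group, and a non-numeric length would raise from int(). If that matters, one possible variant adds explicit checks. This is only a sketch; `klv_split_checked` is a hypothetical name and the error messages are made up for illustration.

```python
def klv_split_checked(ss):
    """Index-based KLV split that rejects malformed or truncated records."""
    idx = 0
    result = []
    while idx < len(ss):
        key = ss[idx:idx + 3]
        length_field = ss[idx + 3:idx + 5]
        # The 5-character header must be complete and the length must be 2 ASCII digits.
        if len(length_field) != 2 or not (length_field.isascii() and length_field.isdigit()):
            raise ValueError(f"malformed header at offset {idx}: {ss[idx:idx + 5]!r}")
        length = int(length_field)
        value = ss[idx + 5:idx + 5 + length]
        # Slicing never fails on short input, so compare lengths explicitly.
        if len(value) != length:
            raise ValueError(f"truncated value for key {key!r} at offset {idx}")
        result.append([key, length_field, value])
        idx += 5 + length
    return result

print(klv_split_checked("00203ADV04900"))
```

For the klv_string in the question this should return the same list as klv_split, since every record there is complete.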