Efficiently converting a string of comma-separated values to bytes

Asked: 2016-06-15 23:01:11

Tags: performance python-3.x encode

My Python 3 program receives data from elsewhere as a string in the following format (the ... stands for more data that I have left out):

data = "0,12,145,234;1,0,0,128;2,255,255,255;...;909,100,100,100;"

I want to convert this into packed binary data, ignoring the , and ; characters. Currently I am doing the following:

import struct

splitData = data.split(';')[:-1] # ignore the last ';'
buff = []
for item in splitData:
    addr, R, G, B = item.split(',')
    addr = int(addr) # two bytes
    R    = int(R)    # one byte
    G    = int(G)    # one byte
    B    = int(B)    # one byte
    packed = struct.pack('HBBB', addr, R, G, B)
    buff.append(packed)
dataBytes = b''.join(buff)

For the example data above, this gives me the following:

dataBytes = b'\x00\x00\x0c\x91\xea\x01\x00\x00\x00\x80...\x8d\x03ddd'

This is what I want (about one third the size of the original string).
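For reference, the record layout described by the 'HBBB' format can be checked with struct.calcsize (or a Struct object's size) and round-tripped with iter_unpack. This is a minimal sketch, assuming little-endian byte order written explicitly as '<HBBB', which matches the output above; the values are the first two records of the example data.

```python
import struct

# Assumption: explicit little-endian layout, 2-byte addr plus three 1-byte channels.
RECORD = struct.Struct('<HBBB')

packed = RECORD.pack(0, 12, 145, 234) + RECORD.pack(1, 0, 0, 128)

print(RECORD.size)                        # bytes used per record
print(packed)                             # raw packed bytes
print(list(RECORD.iter_unpack(packed)))   # decode back into (addr, R, G, B) tuples
```

Each record takes 5 bytes, compared with up to 16 characters in the text form.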

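However, this process takes about 0.002 seconds. I need to run it 33 times per frame, which adds up to roughly 0.05 seconds of computation and limits me to about 20 frames per second. I would like to speed this up if possible.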
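A per-call figure like this can be measured outside IPython with the standard timeit module. The following is only a sketch: pack_loop is a hypothetical wrapper around the loop shown earlier, and the 910-record payload is synthetic because the real data is elided in the question. Absolute numbers will vary by machine.

```python
import struct
import timeit

def pack_loop(data):
    """Hypothetical wrapper around the split-and-pack loop from the question."""
    buff = []
    for item in data.split(';')[:-1]:  # drop the empty piece after the trailing ';'
        addr, r, g, b = item.split(',')
        buff.append(struct.pack('HBBB', int(addr), int(r), int(g), int(b)))
    return b''.join(buff)

# Synthetic stand-in payload (assumption): addresses 0..909 with a fixed colour.
sample = ''.join('%d,100,100,100;' % i for i in range(910))

runs = 200
per_call = timeit.timeit(lambda: pack_loop(sample), number=runs) / runs
print('%.6f s per call, %.4f s for 33 calls' % (per_call, 33 * per_call))
```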

Is there a faster way to convert the string data to bytes than the method above?

1 Answer:

Answer 0: (score: 2)

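Using itertools (Python 2), replacing each ; with a comma, splitting once, mapping the pieces to int, and finally zipping the values into groups of four is about 25% faster: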

In [82]: data = "0,12,145,234;1,0,0,128;2,255,255,255;909,100,100,100;" * 1000
In [83]: from itertools import imap, izip  # Python 2 only
In [84]: %%timeit
splitData = data.split(';')[:-1] # ignore the last ';'
buff = []
for item in splitData:
    addr, R, G, B = item.split(',')
    addr = int(addr) # two bytes
    R    = int(R)    # one byte
    G    = int(G)    # one byte
    B    = int(B)    # one byte
    packed = struct.pack('HBBB', addr, R, G, B)
    buff.append(packed)
dataBytes = b''.join(buff)
   ....: 
100 loops, best of 3: 8.61 ms per loop

In [85]: %%timeit     
mapped = imap(int, data[:-1].replace(";", ",").split(","))
b"".join([struct.pack('HBBB', *sub) for sub in izip(mapped, mapped, mapped, mapped)])
   ....: 
100 loops, best of 3: 6.27 ms per loop

With Python 3, just use the built-in map and zip:

In [4]: %%timeit
mapped = map(int, data[:-1].replace(";", ",").split(","))
b"".join([struct.pack('HBBB', *sub) for sub in zip(mapped, mapped, mapped, mapped)])
   ...: 
100 loops, best of 3: 3.61 ms per loop
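The zip(mapped, mapped, mapped, mapped) call groups the values because all four arguments are the same iterator, so each output tuple consumes four consecutive items. A small sketch of the idiom on its own, using a shortened version of the sample string:

```python
data = "0,12,145,234;1,0,0,128;"

# One flat iterator of ints: replacing ';' with ',' means a single split is enough.
values = map(int, data[:-1].replace(";", ",").split(","))

# zip draws from the same iterator four times per tuple, giving (addr, R, G, B).
records = list(zip(values, values, values, values))
print(records)  # [(0, 12, 145, 234), (1, 0, 0, 128)]
```

This relies on map returning a lazy iterator, which is the Python 3 behaviour; with a list, zip would repeat the same element four times instead.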

In [5]: %%timeit        
splitData = data.split(';')[:-1] # ignore the last ';'
buff = []
for item in splitData:
    addr, R, G, B = item.split(',')
    addr = int(addr) # two bytes
    R    = int(R)    # one byte
    G    = int(G)    # one byte
    B    = int(B)    # one byte
    packed = struct.pack('HBBB', addr, R, G, B)
    buff.append(packed)
dataBytes = b''.join(buff)
   ...: 
100 loops, best of 3: 4.89 ms per loop
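One further variation, which is not part of the answer above and has not been timed here: because every record has the same layout, the whole payload could be packed with a single struct.pack call by repeating the format string. This is a sketch under assumptions: pack_all is a hypothetical helper name, and the explicit little-endian prefix '<' is used because native mode would insert alignment padding between repeated records.

```python
import struct

def pack_all(data):
    """Hypothetical helper: pack every 'addr,R,G,B;' record with one struct.pack call."""
    values = [int(v) for v in data[:-1].replace(';', ',').split(',')]
    count = len(values) // 4
    # '<' disables alignment padding, so each repeated record stays 5 bytes long.
    return struct.pack('<' + 'HBBB' * count, *values)

data = "0,12,145,234;1,0,0,128;2,255,255,255;909,100,100,100;"
print(pack_all(data))
```

Whether this is actually faster than the per-record version would need to be measured with %%timeit on the real data.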