Reading binary data in Python

Time: 2009-10-20 00:51:47

Tags: python binary-data

A urllib2 request receives a binary response like the following:

00 00 00 01 00 04 41 4D 54 44 00 00 00 00 02 41
97 33 33 41 99 5C 29 41 90 3D 71 41 91 D7 0A 47
0F C6 14 00 00 01 16 6A E0 68 80 41 93 B4 05 41
97 1E B8 41 90 7A E1 41 96 8F 57 46 E6 2E 80 00
00 01 16 7A 53 7C 80 FF FF

Its structure is:

DATA, TYPE, DESCRIPTION

00 00 00 01, 4 bytes, Symbol Count =1

00 04, 2 bytes, Symbol Length = 4

41 4D 54 44, 4 bytes, Symbol = AMTD

00, 1 byte, Error code = 0 (OK)

00 00 00 02, 4 bytes, Bar Count =  2

FIRST BAR

41 97 33 33, 4 bytes, Close = 18.90

41 99 5C 29, 4 bytes, High = 19.17

41 90 3D 71, 4 bytes, Low = 18.03

41 91 D7 0A, 4 bytes, Open = 18.23

47 0F C6 14, 4 bytes, Volume = 3,680,608

00 00 01 16 6A E0 68 80, 8 bytes, Timestamp = November 23, 2007

SECOND BAR

41 93 B4 05, 4 bytes, Close = 18.4629

41 97 1E B8, 4 bytes, High = 18.89

41 90 7A E1, 4 bytes, Low = 18.06

41 96 8F 57, 4 bytes, Open = 18.82

46 E6 2E 80, 4 bytes, Volume = 2,946,325

00 00 01 16 7A 53 7C 80, 8 bytes, Timestamp = November 26, 2007

TERMINATOR

FF FF, 2 bytes,

How can I read binary data like this?

Thanks in advance.

Update

I tried using the struct module on the first 6 bytes:

struct.unpack('ih', response.read(6))

(16777216,1024)

But it should output (1, 4). I looked at the manual but can't tell what went wrong.

6 answers:

Answer 0: (score: 10)

So here's my best interpretation of the data you gave...:

import datetime
import struct

class Printable(object):
  specials = ()
  def __str__(self):
    resultlines = []
    for pair in self.__dict__.items():
      if pair[0] in self.specials: continue
      resultlines.append('%10s %s' % pair)
    return '\n'.join(resultlines)

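# The sample symbol is 4 bytes (not 6) and the bar count is a 4-byte unsigned int.
# Note: this hard-codes the symbol length rather than using the length field.
head_fmt = '>IH4sBI'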
head_struct = struct.Struct(head_fmt)
class Header(Printable):
  specials = ('bars',)
  def __init__(self, symbol_count, symbol_length,
               symbol, error_code, bar_count):
    self.__dict__.update(locals())
    self.bars = []
    del self.self

bar_fmt = '>5fQ'
bar_struct = struct.Struct(bar_fmt)
class Bar(Printable):
  specials = ('header',)
  def __init__(self, header, close, high, low,
               open, volume, timestamp):
    self.__dict__.update(locals())
    self.header.bars.append(self)
    del self.self
    self.timestamp /= 1000.0
    self.timestamp = datetime.date.fromtimestamp(self.timestamp)

def showdata(data):
  terminator = '\xff' * 2
  assert data[-2:] == terminator
  head_data = head_struct.unpack(data[:head_struct.size])
  try:
    assert head_data[4] * bar_struct.size + head_struct.size == \
           len(data) - len(terminator)
  except AssertionError:
    print 'data length is %d' % len(data)
    print 'head struct size is %d' % head_struct.size
    print 'bar struct size is %d' % bar_struct.size
    print 'number of bars is %d' % head_data[4]
    print 'head data:', head_data
    print 'terminator:', terminator
    print 'so, something is wrong, since',
    print head_data[4] * bar_struct.size + head_struct.size, '!=',
    print len(data) - len(terminator)
    raise

  head = Header(*head_data)
  for i in range(head.bar_count):
    bar_substr = data[head_struct.size + i * bar_struct.size:
                      head_struct.size + (i+1) * bar_struct.size]
    bar_data = bar_struct.unpack(bar_substr)
    Bar(head, *bar_data)
  assert len(head.bars) == head.bar_count
  print head
  for i, x in enumerate(head.bars):
    print 'Bar #%s' % i
    print x

datas = '''
00 00 00 01 00 04 41 4D 54 44 00 00 00 00 02 41
97 33 33 41 99 5C 29 41 90 3D 71 41 91 D7 0A 47
0F C6 14 00 00 01 16 6A E0 68 80 41 93 B4 05 41
97 1E B8 41 90 7A E1 41 96 8F 57 46 E6 2E 80 00
00 01 16 7A 53 7C 80 FF FF
'''

data = ''.join(chr(int(x, 16)) for x in datas.split())
showdata(data)

This emits:

symbol_count 1
 bar_count 2
    symbol AMTD
error_code 0
symbol_length 4
Bar #0
    volume 36806.078125
 timestamp 2007-11-22
      high 19.1700000763
       low 18.0300006866
     close 18.8999996185
      open 18.2299995422
Bar #1
    volume 29463.25
 timestamp 2007-11-25
      high 18.8899993896
       low 18.0599994659
     close 18.4629001617
      open 18.8199901581

...which seems pretty close to what you're after, apart from some output formatting details. Hope this helps!-)
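The answer's code is Python 2 (print statements, byte strings as str). For comparison, here is a minimal Python 3 sketch of the same layout that reads the symbol using the length field instead of a fixed width. The names parse_response, Bar and BAR are made up for this illustration, and it assumes a response with exactly one symbol block and millisecond UTC timestamps, which is what the sample suggests but is not confirmed by the question.

```python
import struct
from collections import namedtuple
from datetime import datetime, timezone

Bar = namedtuple("Bar", "close high low open volume timestamp")
BAR = struct.Struct(">5fQ")  # 5 big-endian floats, then a 64-bit unsigned integer


def parse_response(data):
    """Parse a response holding a single symbol block, as in the sample."""
    symbol_count, symbol_length = struct.unpack_from(">IH", data, 0)
    if symbol_count != 1:
        raise ValueError("this sketch only handles one symbol")
    offset = 6
    symbol = data[offset:offset + symbol_length].decode("ascii")
    offset += symbol_length
    error_code, bar_count = struct.unpack_from(">BI", data, offset)
    offset += 5
    bars = []
    for _ in range(bar_count):
        *prices_and_volume, millis = BAR.unpack_from(data, offset)
        # Assumption: the timestamp is milliseconds since the Unix epoch, in UTC.
        when = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
        bars.append(Bar(*prices_and_volume, when))
        offset += BAR.size
    if data[offset:offset + 2] != b"\xff\xff":
        raise ValueError("expected FF FF terminator at offset %d" % offset)
    return symbol, error_code, bars


hex_dump = """
00 00 00 01 00 04 41 4D 54 44 00 00 00 00 02 41
97 33 33 41 99 5C 29 41 90 3D 71 41 91 D7 0A 47
0F C6 14 00 00 01 16 6A E0 68 80 41 93 B4 05 41
97 1E B8 41 90 7A E1 41 96 8F 57 46 E6 2E 80 00
00 01 16 7A 53 7C 80 FF FF
"""
sample = bytes.fromhex("".join(hex_dump.split()))
symbol, error_code, bars = parse_response(sample)
print(symbol, error_code)
for bar in bars:
    print(bar)
```

Using unpack_from with a running offset avoids slicing arithmetic like the nested index expressions above, and it keeps working if the symbol is not 4 bytes long.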

Answer 1: (score: 6)

>>> data
'\x00\x00\x00\x01\x00\x04AMTD\x00\x00\x00\x00\x02A\x9733A\x99\\)A\x90=qA\x91\xd7\nG\x0f\xc6\x14\x00\x00\x01\x16j\xe0h\x80A\x93\xb4\x05A\x97\x1e\xb8A\x90z\xe1A\x96\x8fWF\xe6.\x80\x00\x00\x01\x16zS|\x80\xff\xff'
>>> from struct import unpack, calcsize
>>> scount, slength = unpack("!IH", data[:6])
>>> assert scount == 1
>>> symbol, error_code = unpack("!%dsb" % slength, data[6:6+slength+1])
>>> assert error_code == 0
>>> symbol
'AMTD'
>>> bar_count = unpack("!I", data[6+slength+1:6+slength+1+4])
>>> bar_count
(2,)
>>> bar_format = "!5fQ"                                                         
>>> from collections import namedtuple
>>> Bar = namedtuple("Bar", "Close High Low Open Volume Timestamp")             
>>> b = Bar(*unpack(bar_format, data[6+slength+1+4:6+slength+1+4+calcsize(bar_format)]))
>>> b
Bar(Close=18.899999618530273, High=19.170000076293945, Low=18.030000686645508, Open=18.229999542236328, Volume=36806.078125, Timestamp=1195794000000L)
>>> import time
>>> time.ctime(b.Timestamp//1000)
'Fri Nov 23 08:00:00 2007'
>>> int(b.Volume*100 + 0.5)
3680608
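Two details in the session above are easy to miss: time.ctime formats the time in the local zone of whatever machine runs it, and the Volume float only matches the question's 3,680,608 after multiplying by 100. A small Python 3 sketch of both conversions follows. Reading the timestamp as UTC and the volume as hundreds of shares are assumptions inferred from the sample values, not something the data source documents here.

```python
from datetime import datetime, timezone

raw_timestamp_ms = 1195794000000  # Timestamp field of the first bar
raw_volume = 36806.078125         # Volume field of the first bar

# Convert milliseconds to seconds and pin the zone so the result is machine-independent.
utc_time = datetime.fromtimestamp(raw_timestamp_ms // 1000, tz=timezone.utc)

# Assumption: the float holds hundreds of shares, so scale and round.
shares = round(raw_volume * 100)

print(utc_time.isoformat())  # 2007-11-23T05:00:00+00:00
print(shares)                # 3680608
```

05:00 UTC is midnight US Eastern time, which would explain why different machines show November 22 or November 23 for the same bar.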

Answer 2: (score: 5)

>>> struct.unpack('ih', response.read(6))
(16777216, 1024)

You are unpacking big-endian data on a little-endian machine. Try this instead:

>>> struct.unpack('!IH', response.read(6))
(1L, 4)

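This tells unpack to treat the data as network order (big-endian). Also, counts and lengths can't be negative, so you should use the unsigned variants in the format string.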
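To make the difference concrete, here is a short Python 3 sketch that unpacks the same six bytes with explicit byte-order prefixes. The explicit "<" stands in for the native order of a little-endian machine, so the result does not depend on where it runs.

```python
import struct

first_six = bytes([0x00, 0x00, 0x00, 0x01, 0x00, 0x04])

little = struct.unpack("<ih", first_six)   # explicit little-endian, signed
big = struct.unpack(">IH", first_six)      # explicit big-endian, unsigned
network = struct.unpack("!IH", first_six)  # "!" means network order, same as ">"

print(little)   # (16777216, 1024)
print(big)      # (1, 4)
print(network)  # (1, 4)
```

16777216 is 0x01000000 and 1024 is 0x0400, which are exactly the bytes of 1 and 4 read in reverse.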

Answer 3: (score: 2)

Take a look at struct.unpack in the struct module.

Answer 4: (score: 1)

Use the pack/unpack functions from the "struct" package. More information: http://docs.python.org/library/struct.html

Bye!
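As a quick illustration of pack and unpack working as inverses, this Python 3 sketch builds the 15-byte header from the question and reads it back. The format string ">IH4sBI" is an assumption that fits the sample (a 4-byte symbol); a real parser would take the symbol width from the length field.

```python
import struct

HEADER_FORMAT = ">IH4sBI"  # count, symbol length, symbol, error code, bar count

# Pack the values from the question's table into bytes.
header = struct.pack(HEADER_FORMAT, 1, 4, b"AMTD", 0, 2)
print(header.hex())  # 000000010004414d54440000000002

# Unpacking the same bytes returns the original values.
fields = struct.unpack(HEADER_FORMAT, header)
print(fields)  # (1, 4, b'AMTD', 0, 2)
```

Packing known values like this is also a handy way to produce test input for a parser without a live request.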

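Answer 5: (score: 0)

As already mentioned, struct is the module you need to use.

Please read its documentation to learn about byte ordering and so on.

In your example, you need to do the following (since your data is big-endian and unsigned):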

>>> import struct
>>> x = '\x00\x00\x00\x01\x00\x04'
>>> struct.unpack('>IH', x)
(1, 4)
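The same format string can be applied directly to a file-like response instead of a string. Below is a hedged Python 3 sketch; read_exact and read_header are hypothetical helper names, and io.BytesIO stands in for the HTTP response object. The length check is there because some file-like objects may return fewer bytes than requested.

```python
import io
import struct


def read_exact(stream, size):
    """Read exactly `size` bytes or raise if the stream ends early."""
    chunk = stream.read(size)
    if len(chunk) != size:
        raise EOFError("wanted %d bytes, got %d" % (size, len(chunk)))
    return chunk


def read_header(stream):
    """Read the header fields in order, using the length field for the symbol."""
    symbol_count, symbol_length = struct.unpack(">IH", read_exact(stream, 6))
    symbol = read_exact(stream, symbol_length)
    error_code, bar_count = struct.unpack(">BI", read_exact(stream, 5))
    return symbol_count, symbol, error_code, bar_count


# BytesIO is a stand-in for the real response object.
stream = io.BytesIO(bytes.fromhex("000000010004414d54440000000002"))
result = read_header(stream)
print(result)  # (1, b'AMTD', 0, 2)
```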