How to read binary data after an ASCII header in Python

Asked: 2011-02-05 00:13:09

Tags: python file-io binary ascii

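I have some imaging data stored in a file that contains an ASCII text header, terminated by a null character, followed by binary data. The ASCII headers vary in length, and I'm wondering what the best way is (in Python) to open the file, read the header up to the null character, and then load the binary data.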

Thanks for your help,
James

3 answers:

Answer 0: (score: 1)

Would something like this work:

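with open('some_file', 'rb') as f:
    # Split once at the first null byte and keep everything after it.
    # The bytes literal b'\0' is needed on Python 3, where f.read() returns bytes.
    binary_data = f.read().split(b'\0', 1)[1]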
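
If the header text is needed as well as the payload, the same whole-file approach can be written with bytes.partition. This is only a sketch for Python 3: the function name split_header and the demo header contents are made up for illustration, and it assumes the file is small enough to read into memory at once.

```python
import os
import tempfile


def split_header(path):
    """Return (header_text, binary_payload) for a file laid out as
    ASCII header + null byte + binary data."""
    with open(path, 'rb') as f:
        raw = f.read()
    # partition() splits at the first null byte only; sep is b'' if none exists.
    header, sep, payload = raw.partition(b'\0')
    if not sep:
        raise ValueError('no null terminator found in %r' % path)
    return header.decode('ascii'), payload


# Demo on a throwaway file with a hypothetical header.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'WIDTH=2 HEIGHT=1\0\x01\x02\x00\x03')
header, payload = split_header(tmp.name)
os.remove(tmp.name)
print(header)
print(len(payload))
```

Null bytes inside the payload are left alone, because only the first one is treated as the terminator.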

Answer 1: (score: 1)

You should probably start with something like this.

with open('some file', 'rb') as f:  # "f" avoids shadowing the built-in input()
    aByte = f.read(1)
    # Stop at end of file (empty read) or at the null terminator.
    while aByte and ord(aByte) != 0:
        aByte = f.read(1)
    # At this point, what's left is the binary data.

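The Python version matters for this kind of thing, because the result of the read function differs. In Python 2, read(1) returns a one-character string. In Python 3, a file opened in 'rb' mode returns a bytes object, whose elements are integers when indexed. ord(aByte) accepts a length-1 value of either type, so the loop above works on both.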
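
On Python 3 the same scan can be done a chunk at a time instead of a byte at a time. The following is a rough sketch, not a tested drop-in: the name read_header and the chunk_size parameter are made up here, and it assumes the file object is seekable so it can be repositioned just past the terminator.

```python
import io


def read_header(f, chunk_size=4096):
    """Read up to the first null byte and leave f positioned just after it."""
    start = f.tell()
    parts = []
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            raise EOFError('no null terminator before end of file')
        idx = chunk.find(b'\0')
        if idx != -1:
            parts.append(chunk[:idx])
            # Header length plus one byte for the terminator itself.
            consumed = sum(len(p) for p in parts) + 1
            f.seek(start + consumed)
            return b''.join(parts).decode('ascii')
        parts.append(chunk)


# Demo with an in-memory stream; a tiny chunk size forces several reads.
stream = io.BytesIO(b'TYPE=uint8\0\x10\x20\x30')
header = read_header(stream, chunk_size=4)
payload = stream.read()  # whatever is left is the binary data
print(header, len(payload))
```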

Answer 2: (score: 1)

Others have already answered your direct question, but I thought I'd add this.

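When working with binary data, I often find it useful to subclass file and add various convenience methods for reading and writing packed binary data.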

It's overkill for anything simple, but if you find yourself parsing lots of binary file formats, it's worth the extra effort to avoid repeating yourself.

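If nothing else, hopefully it serves as a useful example of how to use struct. Also, this is pulled from older code and is very much Python 2.x. Python 3.x handles this differently (particularly strings vs. bytes).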

import struct
import array

class BinaryFile(file):
    """
    Automatically packs or unpacks binary data according to a format
    when reading or writing.
    """
    def __init__(self, *args, **kwargs):
        """
        Initialization is the same as a normal file object
        %s""" % file.__doc__
        # super() already binds self, so it must not be passed a second time
        super(BinaryFile, self).__init__(*args, **kwargs)

    def read_binary(self,fmt):
        """
        Read and unpack a binary value from the file based
        on string fmt (see the struct module for details).
        This will strip any trailing null characters if a string format is
        specified. 
        """
        size = struct.calcsize(fmt)
        data = self.read(size)
        # Reading beyond the end of the file just returns ''
        if len(data) != size:
            raise EOFError('End of file reached')
        data = struct.unpack(fmt, data)

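        # Strip trailing null padding from string fields. The tuple has to be
        # rebuilt: rebinding a loop variable would leave data unchanged.
        data = tuple(item.rstrip('\x00') if isinstance(item, str) else item
                     for item in data)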

        # Unpack the tuple if it only has one value
        if len(data) == 1: 
            data = data[0]

        return data

    def write_binary(self, fmt, dat):
        """Pack and write data to the file according to string fmt."""
        # Try expanding input arguments (struct.pack won't take a tuple)
        try: 
            dat = struct.pack(fmt, *dat) 
        except (TypeError, struct.error): 
            # If it's not a sequence (TypeError), or if it's a 
            # string (struct.error), don't expand.
            dat = struct.pack(fmt, dat) 
        self.write(dat)

    def read_header(self, header):
        """
        Reads a defined structure "header" consisting of a sequence of (name,
        format) strings from the file. Returns a dict with keys of the given
        names and values unpacked according to the given format for each item in
        "header".
        """
        header_values = {}
        for key, format in header:
            header_values[key] = self.read_binary(format)
        return header_values

    def read_nullstring(self):
        """
        Reads a null-terminated string from the file. This is not implemented
        in an efficient manner for long strings!
        """
        output_string = ''
        char = self.read(1)
        while char != '\x00':
            output_string += char
            char = self.read(1)
            if len(char) == 0:
                break
        return output_string

    def read_array(self, type, number):
        """
        Read data from the file and return an array.array of the given
        "type" with "number" elements
        """
        size = struct.calcsize(type)
        data = self.read(size * number)
        return array.array(type, data)
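
Since the built-in file type no longer exists in Python 3, the class above cannot be used there as written. As a rough Python 3 sketch of the same idea, two of the helpers can be written as plain functions that take any binary file object. The function names mirror the methods above, but the demo header text and the '<HHf' field layout (width, height, scale) are hypothetical.

```python
import io
import struct


def read_binary(f, fmt):
    """Read and unpack one struct-formatted record from binary file f."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    # A short read means the file ended before the record was complete.
    if len(data) != size:
        raise EOFError('End of file reached')
    values = struct.unpack(fmt, data)
    # Unpack the tuple if it only has one value.
    return values[0] if len(values) == 1 else values


def read_nullstring(f):
    """Read bytes up to (not including) the next null byte or end of file."""
    out = bytearray()
    while True:
        char = f.read(1)
        if char in (b'', b'\0'):
            break
        out += char
    return bytes(out)


# Demo: a null-terminated header followed by three little-endian fields.
stream = io.BytesIO(b'demo header\0' + struct.pack('<HHf', 640, 480, 1.5))
header = read_nullstring(stream)
width, height, scale = read_binary(stream, '<HHf')
print(header, width, height, scale)
```

Taking the file object as an argument rather than subclassing means the helpers also work on io.BytesIO and other file-like objects.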