Question

I have a binary file that consists of text header lines followed by binary data. ftp://n5eil01u.ecs.nsidc.org/SAN/GLAS/GLA06.034/2003.02.21/GLA06_634_1102_001_0079_3_01_0001.DAT

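I need to know how much space the header lines occupy. How can I determine that in advance, so that I can put the right value in the call below and skip the header section?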

import numpy as np    
fname = 'GLA06_634_1102_001_0079_3_01_0001.DAT' 

with open(fname,'rb') as fi:
    fi.seek(176, 0)  # HERE I have to put the header size

Answer 1

Assuming a blank line separates the text from the binary data:

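fname = 'GLA06_634_1102_001_0079_3_01_0001.DAT'

offset = 0  # byte offset just past the blank line
with open(fname, 'rb') as fi:
    for line in fi:
        # seek() takes a byte offset, not a line count, so add up line lengths
        offset += len(line)
        if line == b'\n':
            break

    fi.seek(offset, 0)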

Answer 2

FWIW, a hexdump of your file shows that the "binary data" appears to start at 0x35c0:

00001a20  39 3b 0a 67 41 53 50 5f  74 31 3d 20 39 39 30 37  |9;.gASP_t1= 9907|
00001a30  39 32 30 30 2e 30 30 30  30 30 30 30 3b 0a 67 6c  |9200.0000000;.gl|
00001a40  6f 62 41 76 53 72 66 50  72 65 73 32 3d 20 38 39  |obAvSrfPres2= 89|
00001a50  30 35 38 2e 39 35 32 33  36 33 37 3b 0a 67 41 53  |058.9523637;.gAS|
00001a60  50 5f 74 32 3d 20 39 39  31 30 30 38 30 30 2e 30  |P_t2= 99100800.0|
00001a70  30 30 30 30 30 30 3b 0a  20 20 20 20 20 20 20 20  |000000;.        |
00001a80  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
*
00001ae0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000035c0  01 53 05 b0 05 e8 11 30  00 01 0a de 08 0b 00 00  |.S.....0........|
000035d0  ff ff ff 52 00 00 61 a8  00 00 c3 50 00 01 24 f8  |...R..a....P..$.|
000035e0  00 01 86 a0 00 01 e8 48  00 02 49 f0 00 02 ab 98  |.......H..I.....|
000035f0  00 03 0d 40 00 03 6e e8  00 03 d0 90 00 04 32 38  |...@..n.......28|
00003600  00 04 93 e0 00 04 f5 88  00 05 57 30 00 05 b8 d8  |..........W0....|
00003610  00 06 1a 80 00 06 7c 28  00 06 dd d0 00 07 3f 78  |......|(......?x|
00003620  00 07 a1 20 00 08 02 c8  00 08 64 70 00 08 c6 18  |... ......dp....|

Apparently, the binary data is preceded by a run of 0x00 bytes. As a heuristic, we could try to find that section:

fname = 'GLA06_634_1102_001_0079_3_01_0001.DAT'

with open(fname,'rb') as fi:
    while fi.read(1) != b'\x00': # skip text part
        pass
    while fi.read(1) == b'\x00': # skip 0x00
        pass

    # rewind 1 byte
    fi.seek(fi.tell()-1)

    print("Binary data starts at", fi.tell())

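Some caveats:

You should definitely add some error handling here.
This is rather fragile, since I know nothing about this format.
For a more robust solution, can you find a specification or documentation for the file format?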
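
As a rough illustration of the error handling mentioned above, the same heuristic can be wrapped in a function that raises instead of looping forever when the expected 0x00 padding is missing. This is only a sketch: the name `find_binary_start` is made up for this example, and it assumes (from the hexdump alone, not from any format specification) that the text header contains no 0x00 byte and that the binary data does not itself begin with 0x00.

```python
def find_binary_start(path):
    """Return the offset of the first non-zero byte after the first 0x00 run.

    Assumption (from the hexdump only): a text header without 0x00 bytes is
    followed by 0x00 padding, and the binary data starts right after it.
    """
    with open(path, 'rb') as fi:
        data = fi.read()  # simple sketch: loads the whole file into memory

    pad_start = data.find(b'\x00')
    if pad_start == -1:
        raise ValueError("no 0x00 padding found after the text header")

    # Drop the leading 0x00 bytes; whatever remains is the binary data.
    remainder = data[pad_start:].lstrip(b'\x00')
    if not remainder:
        raise ValueError("file ends inside the 0x00 padding")

    return len(data) - len(remainder)
```

Calling `find_binary_start(fname)` and passing the result to `fi.seek()` would replace the two `while` loops. The heuristic itself is still as fragile as before; only the failure modes are made explicit.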

Answer 3

From the provided read file routine:

n_headers = long( read_header( i_file, 'NUMHEAD', error=error) )
recl= long( read_header( i_file, 'RECL', error=error) )
offset=long(recl*n_headers)
print,'offset=',offset
print,'recl   n_headers = ',recl,n_headers
str_vers = 'pv'+strtrim(string(ver1),2)+'_'+ $
             strtrim(string(ver2),2)
print, 'version=',str_vers

The header size appears to be recl*n_headers, where these two values are the first two header entries. So:

fname = 'GLA06_634_1102_001_0079_3_01_0001.DAT'

with open(fname, 'rb') as fi:
    recl = None
    numhead = None

    # Loop in case the required headers are not the first two
    # and/or are in the wrong order
    for line in fi:
        # the file is opened in binary mode, so compare against bytes
        if line.startswith(b'Recl='):
            recl = int(line[5:-2])  # drop 'Recl=' and the trailing ';\n'
        if line.startswith(b'Numhead='):
            numhead = int(line[8:-2])

        if recl is not None and numhead is not None:
            break

    offset = recl * numhead

    print("Binary data starts at", offset)
    fi.seek(offset)
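
To tie this back to the question, here is a minimal sketch that packages the same calculation as reusable functions and then reads everything after the header. The names `header_offset` and `read_payload` are hypothetical, and the `Key=value;` line layout is inferred from the hexdump and the code above rather than taken from a format specification.

```python
def header_offset(path):
    """Return recl * numhead, parsed from the text header.

    Assumes each header entry is a line of the form b'Key=value;\n'.
    """
    values = {}
    with open(path, 'rb') as fi:
        for line in fi:
            key, sep, rest = line.partition(b'=')
            if sep and key in (b'Recl', b'Numhead'):
                # drop surrounding whitespace and the trailing ';'
                values[key] = int(rest.strip().rstrip(b';'))
            if len(values) == 2:
                return values[b'Recl'] * values[b'Numhead']
    raise ValueError("Recl and/or Numhead not found in the header")


def read_payload(path):
    """Return the raw bytes that follow the header (record layout unknown)."""
    with open(path, 'rb') as fi:
        fi.seek(header_offset(path))
        return fi.read()
```

The bytes returned by `read_payload(fname)` could then be handed to `np.frombuffer` with whatever record dtype the format documentation defines; that dtype is not known from this thread, so it is left out here.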

Getting the number of header lines in advance
