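I have a binary file that consists of header lines followed by binary data. ftp://n5eil01u.ecs.nsidc.org/SAN/GLAS/GLA06.034/2003.02.21/GLA06_634_1102_001_0079_3_01_0001.DAT
I need to know how many lines the header occupies. How can I find this out in advance, so that I can put the right value below to skip the header section?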
import numpy as np
fname = 'GLA06_634_1102_001_0079_3_01_0001.DAT'
with open(fname, 'rb') as fi:
    fi.seek(176, 0)  ## HERE I HAVE TO PUT
Answer 0 (score: 2)
Assuming that an empty line separates the text from the binary data:
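offset = 0  # header size in bytes; seek() expects a byte offset, not a line count
with open(fname, 'rb') as fi:
    for line in fi:
        offset += len(line)
        if line == b'\n':  # the empty line ends the text header
            break
    # position the file just after the empty line
    fi.seek(offset, 0)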
Answer 1 (score: 2)
FWIW, a hexdump of your file shows that the "binary data" seems to start at 0x35c0:
00001a20 39 3b 0a 67 41 53 50 5f 74 31 3d 20 39 39 30 37 |9;.gASP_t1= 9907|
00001a30 39 32 30 30 2e 30 30 30 30 30 30 30 3b 0a 67 6c |9200.0000000;.gl|
00001a40 6f 62 41 76 53 72 66 50 72 65 73 32 3d 20 38 39 |obAvSrfPres2= 89|
00001a50 30 35 38 2e 39 35 32 33 36 33 37 3b 0a 67 41 53 |058.9523637;.gAS|
00001a60 50 5f 74 32 3d 20 39 39 31 30 30 38 30 30 2e 30 |P_t2= 99100800.0|
00001a70 30 30 30 30 30 30 3b 0a 20 20 20 20 20 20 20 20 |000000;.        |
00001a80 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 |                |
*
00001ae0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000035c0 01 53 05 b0 05 e8 11 30 00 01 0a de 08 0b 00 00 |.S.....0........|
000035d0 ff ff ff 52 00 00 61 a8 00 00 c3 50 00 01 24 f8 |...R..a....P..$.|
000035e0 00 01 86 a0 00 01 e8 48 00 02 49 f0 00 02 ab 98 |.......H..I.....|
000035f0 00 03 0d 40 00 03 6e e8 00 03 d0 90 00 04 32 38 |...@..n.......28|
00003600 00 04 93 e0 00 04 f5 88 00 05 57 30 00 05 b8 d8 |..........W0....|
00003610 00 06 1a 80 00 06 7c 28 00 06 dd d0 00 07 3f 78 |......|(......?x|
00003620 00 07 a1 20 00 08 02 c8 00 08 64 70 00 08 c6 18 |... ......dp....|
Apparently, the binary data is preceded by a run of 0x00 bytes. As a heuristic, we could try to locate that section:
fname = 'GLA06_634_1102_001_0079_3_01_0001.DAT'
with open(fname, 'rb') as fi:
    # skip the text part: read up to the first 0x00 (or stop at EOF)
    while fi.read(1) not in (b'\x00', b''):
        pass
    # skip the 0x00 padding
    while fi.read(1) == b'\x00':
        pass
    # rewind 1 byte so the first non-zero byte is read again
    fi.seek(fi.tell() - 1)
    print("Binary data starts at", fi.tell())
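Some caveats apply to this heuristic; for example, it breaks if the binary data itself begins with a 0x00 byte.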
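The same heuristic can also be written against an in-memory buffer, which avoids one `read(1)` call per byte. This is only a sketch: `find_binary_start` is a hypothetical helper name, and the sample bytes are synthetic, not taken from the real file.

```python
def find_binary_start(buf):
    """Return the offset of the first non-zero byte that follows the
    first run of 0x00 bytes in buf, or None if there is no such byte."""
    pos = buf.find(b'\x00')  # end of the text part
    if pos == -1:
        return None  # no 0x00 padding at all
    # walk over the 0x00 padding
    while pos < len(buf) and buf[pos:pos + 1] == b'\x00':
        pos += 1
    return pos if pos < len(buf) else None


# Synthetic example: 5 bytes of text, 3 spaces, 3 zero bytes, then data.
sample = b'a=1;\n   \x00\x00\x00\x01S'
start = find_binary_start(sample)
```

For the real file, the buffer could come from `fi.read(n)` with `n` large enough to cover the header; how large that needs to be is an assumption you would have to check.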
Answer 2 (score: 2)
From the provided read file routine:
n_headers = long( read_header( i_file, 'NUMHEAD', error=error) )
recl= long( read_header( i_file, 'RECL', error=error) )
offset=long(recl*n_headers)
print,'offset=',offset
print,'recl n_headers = ',recl,n_headers
str_vers = 'pv'+strtrim(string(ver1),2)+'_'+ $
strtrim(string(ver2),2)
print, 'version=',str_vers
The header size appears to be recl*n_headers, where these two values come from the first two header entries. So:
fname = 'GLA06_634_1102_001_0079_3_01_0001.DAT'
with open(fname, 'rb') as fi:
    recl = None
    numhead = None
    # Loop in case the required headers are not the first two
    # and/or appear in a different order
    for line in fi:
        # the file is opened in binary mode, so compare against bytes;
        # the [..:-2] slices drop the trailing ';' and newline
        if line.startswith(b'Recl='):
            recl = int(line[5:-2])
        if line.startswith(b'Numhead='):
            numhead = int(line[8:-2])
        if recl is not None and numhead is not None:
            break
    offset = recl * numhead
    print("Binary data starts at", offset)
    fi.seek(offset)
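The same idea can be wrapped in a small function that fails loudly when either entry is missing. This is a sketch under assumptions: `header_size` is a hypothetical name, header entries are assumed to look like `Name=value;` on one line each, and the sample stream below is synthetic rather than the real GLA06 file.

```python
import io


def header_size(stream):
    """Return Recl * Numhead parsed from 'Name=value;' header lines."""
    wanted = {b'Recl': None, b'Numhead': None}
    for line in stream:
        name, sep, value = line.partition(b'=')
        name = name.strip()
        if sep and name in wanted:
            # drop surrounding whitespace and the trailing ';'
            wanted[name] = int(value.strip().rstrip(b';'))
        if None not in wanted.values():
            return wanted[b'Recl'] * wanted[b'Numhead']
    raise ValueError('Recl and/or Numhead not found in header')


# Synthetic stand-in: a 200-byte header (2 records of 100 bytes), then data.
header = b'Numhead=2;\nRecl=100;\n'.ljust(200, b' ')
sample = io.BytesIO(header + b'\x01\x53\x05\xb0')
offset = header_size(sample)
sample.seek(offset)
first_bytes = sample.read(4)
```

With a real file, `stream` would be the object returned by `open(fname, 'rb')`, followed by `fi.seek(offset)` as in the code above.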