Question

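I have a binary file with a 4 KB header, followed by 28 bytes of data, followed by 24 bytes, all of which I want to read. How can I loop over each 28- and 24-byte record and read (or extract) the first 8 bytes of each one? In Python I did something like the following, but I don't know how to handle the varying record length.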

import sys
import struct
f = open(sys.argv[1],"rb")
f.seek(4096)
byte = f.read(28)
while byte:  # read() returns an empty result at end of file
    ticks = struct.unpack("<ll",byte[:8]) #not sure how to read 8 bytes 
    byte = f.read(28)
f.close()

This is what the data looks like after the header.

Length (bytes)   Field Name
8   TS_INCR
4   SEQID
2   OP
2   LUN
4   NBLKS
8   LBA


Length (bytes)   Field Name
8   TS_INCR
4   SEQID
2   OP
2   LUN
4   LATENCY_TICKS
2   HOST_ID
2   HOST_LUN

I'd appreciate any help with this. Python or Perl, either is fine. Thanks!

Answer 1

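The endianness of the data you are reading matters here. You appear to be unpacking the 8 octets as two longs stored in little-endian order. Are you sure it isn't a single 64-bit quantity (which would make the q or Q format more appropriate)? Unfortunately, I am on a 32-bit machine, so my perl does not support Q.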

However, the following should point you in the right direction:

#!/usr/bin/env perl

use strict; use warnings;
use autodie;

use Fcntl qw(:seek);
use List::Util qw( sum );

my ($input_file) = @ARGV;
die "Need input file\n" unless defined $input_file;

my $HEADER_SIZE = 4_096;

my @typedef = (
    {
        fields => [
            qw(
                TS_INCR_LO
                TS_INCR_HI
                SEQID
                OP
                LUN
                NBLKS
                LBA_LO
                LBA_HI
            )
        ],
        tmpl => 'LLLSSLLL',
        start => 0,
        size => 28,
    },
    {
        fields => [
            qw(
                TS_INCR_LO
                TS_INCR_HI
                SEQID
                OP
                LUN
                LATENCY_TICKS
                HOST_ID
                HOST_LUN
            )
        ],
        tmpl => 'LLLSSLSS',
        start => 28,
        size => 24,
    },
);

open my $input, '<:raw', $input_file;

seek $input, $HEADER_SIZE, SEEK_SET;

my $BLOCK_SIZE = sum map $_->{size}, @typedef;
read $input, my($buffer), $BLOCK_SIZE;

my @structs;

for my $t ( @typedef ) {
    my %struct;
    @struct{ @{ $t->{fields}} } = unpack(
        $t->{tmpl},
        substr($buffer, $t->{start}, $t->{size})
    );
    push @structs, \%struct;
}

use Data::Dumper;
print Dumper \@structs;
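
For comparison, here is a rough Python sketch of the same table-driven idea. It assumes the file is little-endian with no padding, and that TS_INCR and LBA are single unsigned 64-bit values rather than two 32-bit halves; the question confirms neither. The names REC_A, REC_B, RecordA, RecordB, parse_block and join_halves are made up for illustration.

```python
import struct
from collections import namedtuple

# Assumed layout: little-endian ("<"), standard sizes, no padding.
REC_A = struct.Struct("<QIHHIQ")   # 8 + 4 + 2 + 2 + 4 + 8 = 28 bytes
REC_B = struct.Struct("<QIHHIHH")  # 8 + 4 + 2 + 2 + 4 + 2 + 2 = 24 bytes

RecordA = namedtuple("RecordA", "ts_incr seqid op lun nblks lba")
RecordB = namedtuple(
    "RecordB", "ts_incr seqid op lun latency_ticks host_id host_lun"
)


def parse_block(buf):
    """Split one 52-byte block into its 28-byte and 24-byte records."""
    a = RecordA._make(REC_A.unpack_from(buf, 0))
    b = RecordB._make(REC_B.unpack_from(buf, REC_A.size))
    return a, b


def join_halves(lo, hi):
    """Combine two little-endian 32-bit halves into one 64-bit value."""
    return (hi << 32) | lo
```

If the 64-bit fields turn out to be signed, "q" would replace "Q" in the format strings. join_halves shows how the _LO/_HI pairs from the Perl version relate to a single 64-bit field.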

Answer 2

I think I would read 52 bytes per loop iteration (24+28==52) and simply index into the bytes you care about. It would look something like this:

byte = f.read(52)
while len(byte) == 52:  # stops at end of file or on a truncated final block
    ticks = struct.unpack("<ll",byte[0:8])
    tocks = struct.unpack("<ll",byte[28:36])
    byte = f.read(52)

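Note that I don't know whether this style of while loop is the idiomatic form for this case. I am only suggesting that you read larger blocks and parse just the bytes you are interested in. read() calls are comparatively slow, so halving their number could make the application up to roughly twice as fast. If you switch to reading even larger chunks you can certainly get a bigger speedup, but that would probably require more rewriting than this small change.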
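
The block-at-a-time approach described above could be sketched as follows. This is only an illustration: it assumes the 28-byte and 24-byte records strictly alternate after the 4096-byte header, and that the leading 8 bytes of each record form one unsigned little-endian 64-bit value. The function name read_timestamps and the constants are hypothetical.

```python
import struct
from functools import partial

HEADER_SIZE = 4096
FIRST_SIZE = 28                 # size of the first record in each block
BLOCK_SIZE = FIRST_SIZE + 24    # one 28-byte record plus one 24-byte record
TS = struct.Struct("<Q")        # assumption: unsigned 64-bit, little-endian


def read_timestamps(path):
    """Yield (ticks, tocks) for every complete 52-byte block after the header."""
    with open(path, "rb") as f:
        f.seek(HEADER_SIZE)
        # iter() with a sentinel calls f.read(BLOCK_SIZE) until it returns b"".
        for block in iter(partial(f.read, BLOCK_SIZE), b""):
            if len(block) < BLOCK_SIZE:
                break  # ignore a truncated trailing block
            ticks, = TS.unpack_from(block, 0)
            tocks, = TS.unpack_from(block, FIRST_SIZE)
            yield ticks, tocks
```

Calling list(read_timestamps(sys.argv[1])) would collect every pair; iterating over the generator instead avoids holding the whole file in memory.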

How do I read blocks from a binary file and extract structs using unpack in Python or Perl?
