Splitting binary data into a byte array in Perl

Time: 2012-10-31 13:22:14

Tags: perl split binary-data unpack

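I basically want to convert a binary string into an array/list of bytes (so that I can index into it and avoid substr, whose syntax I find confusing), and I came up with the following MWE: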

#!/usr/bin/env perl

use warnings;
use strict;

# use open ':raw';      # Unknown PerlIO layer class ':raw'
use open IO => ':raw';

binmode(STDIN);
binmode(STDOUT);

# Create original 8-bit byte array/list
my @atmp = (0x80, 0x23, 0x14, 0x0d, 0x0a, 0x00, 0x00, 0x80, 0x43, 0x00, 0x00);

# Make a copy of portion
my @atmp2 = (0) x 2;
@atmp2[0..1] = @atmp[7..8];

# Print output
print "Copied atmp2 contents as hex: " . join(", ", unpack("H2"x2, pack("C"x2,@atmp2))) . "\n";
print "Copied atmp2 as ushort (16bit) int: " . unpack("S", pack("C"x2, @atmp2));
# doublecheck value by routing through printf with format specifier:
printf(" [%d]\n", unpack("S", pack("C"x2, @atmp2)));


# Now, the same data as string:
my $indata = "\x80\x23\x14\x0d\x0a\x00\x00\x80\x43\x00\x00";

# Create byte array (by converting string $indata to array/list with `split`)
my @btmp = split('',$indata);
print "lastindex: " . $#btmp . "\n";

# Make a copy of portion
my @btmp2 = (0) x 2;
@btmp2[0..1] = @btmp[7..8];

# Print output
print "Copied btmp2 contents as hex: " . join(", ", unpack("H2"x2, pack("C"x2,@btmp2))) . "\n";
print "Copied btmp2 as ushort (16bit) int: " . unpack("S", pack("C"x2, @btmp2));
# doublecheck value by routing through printf with format specifier:
printf(" [%d]\n", unpack("S", pack("C"x2, @btmp2)));

Running this code produces:

$ perl test.pl
Copied atmp2 contents as hex: 80, 43
Copied atmp2 as ushort (16bit) int: 17280 [17280]
lastindex: 10
Argument "M-\0" isn't numeric in pack at test.pl line 38.
Argument "C" isn't numeric in pack at test.pl line 38.
Copied btmp2 contents as hex: 00, 00
Copied btmp2 as ushort (16bit) int: 0 [0]

How can I make the second part (btmp2) behave the same as the first part (atmp2)?

1 Answer:

Answer 0: (score: 3)

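It turns out that split does create an array holding the same bytes as the original string, but each element is a one-character string rather than a number. pack("C", ...) expects integer values, so it tries to convert strings such as "\x80" and "C" (0x43) to numbers, warns "Argument ... isn't numeric", and packs 0 in their place. That is why the btmp2 output is all zeros.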
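To make the difference visible, here is a minimal sketch (the two-byte sample string and the variable names @chars and @bytes are made up for illustration). It shows that split yields characters, and that applying ord to each one gives the numeric byte values pack("C", ...) wants:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Two-byte sample string (illustrative, taken from bytes 7..8 of the question's data)
my $indata = "\x80\x43";

# split returns one-character strings, not integers
my @chars = split //, $indata;

# ord() turns each one-character string into its numeric byte value
my @bytes = map { ord } @chars;

print join(", ", map { sprintf "0x%02x", $_ } @bytes), "\n";   # 0x80, 0x43

# With numeric values, pack("C2", ...) reproduces the original string
my $packed = pack "C2", @bytes;
print $packed eq $indata ? "round-trip ok\n" : "mismatch\n";
```

So keeping split and adding map { ord } would also work, although unpack does the same job in one step.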

The fix is simply to replace the split line with one that uses unpack instead:

- my @btmp = split('',$indata);
+ my @btmp = unpack('C*',$indata);

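After that, everything works as expected (both printouts are identical). Interestingly, in both cases "lastindex" (for the array derived from the string) is shown as 10, which made me think binmode might be the problem; that is why all those statements are in the code.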
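For reference, a condensed sketch of the string-based half of the script with the unpack fix applied. Two things here are my assumptions rather than part of the original code: it uses the "v" template (unsigned 16-bit, little-endian) instead of the native-endian "S" so the result does not depend on the machine, and the final "x7 v" variant is only an illustration of reading the value straight from the string without substr or an intermediate array.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $indata = "\x80\x23\x14\x0d\x0a\x00\x00\x80\x43\x00\x00";

# One integer (0..255) per byte of the string
my @btmp = unpack 'C*', $indata;
print "lastindex: $#btmp\n";    # lastindex: 10

# An array slice replaces substr
my @btmp2 = @btmp[7 .. 8];

my $hex = join ", ", map { sprintf "%02x", $_ } @btmp2;

# "v" = unsigned 16-bit little-endian (assumption; the question uses native "S")
my $ushort = unpack "v", pack("C2", @btmp2);

print "hex: $hex\n";            # hex: 80, 43
print "ushort: $ushort\n";      # ushort: 17280

# Same value read directly from the string: skip 7 bytes, then read one "v"
my ($direct) = unpack "x7 v", $indata;
print "direct: $direct\n";      # direct: 17280
```

On a little-endian machine "S" and "v" give the same 17280 shown in the question's output; on a big-endian machine "S" would read the two bytes in the opposite order.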