Question

I am parsing a text file that looks like this:

ABCD
EFGH
IJKL

MNOP
QRST
UVWX

Is it possible to parse this in Perl in a way that results in two 4x3 arrays? For example, array1[2][2] = K and array2[0][1] = N. Code:

#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;

open(FH, '<', 'gwas.txt') or die "Couldn't open file $!";

while(<FH>) {

    #parse file into 2 arrays
}
close(FH);

Answer 1

The procedure explained in the comments, condensed:

my @matrix = map { [ split '', $_ ]  } <$fh>;

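In list context the diamond operator <$fh> returns all lines of the file (see I/O operators). Each line is processed by the block in map, and the resulting list is assigned to @matrix.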

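Inside the block, split breaks each line ($_) into characters (the pattern ''), and an anonymous array is built from that list ([...]). Since split works on $_ by default, this can also be written as map { [ split '' ] }. Note that without chomp the newline at the end of each line becomes the last element of its row.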
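
As a small sketch of the condensed form with chomp added: the sample text is supplied through an in-memory filehandle, which is an assumption made here only to keep the example self-contained and stands in for the real file.

```perl
use strict;
use warnings;

# Sample text stands in for the real file (assumption for illustration)
my $text = "ABCD\nEFGH\nIJKL\n";
open my $fh, '<', \$text or die "Can't open in-memory file: $!";

# Remove the newline from each line, then split the line into characters
my @matrix = map { chomp; [ split '', $_ ] } <$fh>;
close $fh;

print "Row 2, column 2: $matrix[2][2]\n";                    # K
print "Columns in row 0: ", scalar @{ $matrix[0] }, "\n";    # 4
```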

It is better to always use lexical filehandles:

my $file = 'gwas.txt';
open my $fh, '<', $file or die "Couldn't open $file: $!";

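As noted in the comments, this reads the whole file into a single array. To get two blocks of text, each in its own array, we first write it out as a loop (and then use the blank line to tell the blocks apart).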

my @matrix;
my $index = 0;    
while (<$fh>) {
    $matrix[$index++] = [ split '', $_ ];
}

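This builds an anonymous array [ ... ] from the characters of the line and assigns it to element $index of @matrix (incrementing the index afterwards). An alternative, inside the loop body, is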

my @row = split '', $_;
$matrix[$index++] = \@row;

which constructs a new array on each iteration and stores a reference to it.
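
Why the array has to be declared inside the loop can be seen in a short sketch (the sample rows and the names @good, @bad and @shared are made up for illustration). With my @row in the loop body every iteration gets a fresh array, whereas a single array declared outside the loop leaves every stored reference pointing at the same data.

```perl
use strict;
use warnings;

my @lines = ('ABCD', 'EFGH');   # illustrative sample rows

# Fresh array on every iteration: each stored reference is distinct
my @good;
for my $line (@lines) {
    my @row = split '', $line;
    push @good, \@row;
}

# One shared array: every stored reference points to the same array
my (@bad, @shared);
for my $line (@lines) {
    @shared = split '', $line;
    push @bad, \@shared;
}

print "good: $good[0][0] $good[1][0]\n";   # A E
print "bad:  $bad[0][0] $bad[1][0]\n";     # E E
```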

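Next we need to use the blank line to separate the blocks. We also need to manage the two arrays, which is conveniently done by keeping references to the arrays (matrices) in another data structure, for example an array.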

use warnings;
use strict;
use Data::Dump qw(dd);

my $matrices;  # will be an arrayref, for references to matrices

my $file = 'matrices.txt';
open my $fh, '<', $file or die "Can't open $file: $!";

my @matrix;
my $index = 0;   
while (<$fh>) {
    chomp;

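    if (/^\s*$/) {                     # blank line, done with one matrix
        next if not @matrix;           # skip leading or repeated blank lines
        push @$matrices, [ @matrix ];  # store anonymous array for @matrix
        @matrix = ();                  # clear rows so none leak into the next matrix
        $index  = 0;                   # reset index
    }
    else {
        $matrix[$index] = [ split '', $_ ];   # scalar element, not a slice
        ++$index;
    }
}
push @$matrices, [ @matrix ] if @matrix;      # the last one in the file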

close $fh;

print "Spot check: \$matrices->[0][2][2]: $matrices->[0][2][2]\n";
dd($matrices);

This makes assumptions about the data, mainly that it has exactly the expected format.
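
Since the code relies on that assumption, a small check can make a violation visible. The sketch below uses a hypothetical helper, check_rectangular, which is not part of the original answer; it only verifies that every row of a matrix has the same number of columns as the first row.

```perl
use strict;
use warnings;

# Hypothetical helper: true if every row has as many columns as the first row
sub check_rectangular {
    my ($matrix) = @_;
    return 1 if !@$matrix;               # treat an empty matrix as valid
    my $width = @{ $matrix->[0] };
    for my $row (@$matrix) {
        return 0 if @$row != $width;
    }
    return 1;
}

my $ok  = [ [qw(A B C D)], [qw(E F G H)] ];   # illustrative data
my $bad = [ [qw(A B C D)], [qw(E F)] ];

print "ok:  ", (check_rectangular($ok)  ? "rectangular" : "ragged"), "\n";
print "bad: ", (check_rectangular($bad) ? "rectangular" : "ragged"), "\n";
```

In the program above, such a check could be called on @matrix just before it is pushed onto @$matrices.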

See the tutorial on references, perlreftut, and the data structures cookbook, perldsc.

Also see the answer by xxfelixxx, which does all of this in a slightly different way.

There are many other ways to do this.

Answer 2

Split the data into an array of paragraphs.
Split each paragraph into an array of lines.
Split each line into an array of characters.

This can be done with either of the following:

my @arrays;
{
   local $/ = "";  # Paragraph mode
   @arrays = map { [ map { [ split // ] } split /\n/ ] } <>;
}

or

my @arrays;
{
   local $/ = "";  # Paragraph mode
   push @arrays, [ map { [ split // ] } split /\n/ ] while <>;
}

Both produce:

$VAR1 = [
          [
            [ 'A', 'B', 'C', 'D' ],
            [ 'E', 'F', 'G', 'H' ],
            [ 'I', 'J', 'K', 'L' ]
          ],
          [
            [ 'M', 'N', 'O', 'P' ],
            [ 'Q', 'R', 'S', 'T' ],
            [ 'U', 'V', 'W', 'X' ]
          ]
        ];

So,

say $arrays[0][2][2];  # K
say $arrays[1][0][1];  # N
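
To get the two separately named arrays that the question asks for, the references can simply be unpacked. A minimal sketch, with the sample text read through an in-memory filehandle instead of <> (an assumption made so the example is self-contained); $array1 and $array2 are illustrative names:

```perl
use strict;
use warnings;

# Sample text stands in for the real file (assumption for illustration)
my $text = "ABCD\nEFGH\nIJKL\n\nMNOP\nQRST\nUVWX\n";
open my $fh, '<', \$text or die "Can't open in-memory file: $!";

my @arrays;
{
    local $/ = "";   # paragraph mode: each read returns one block
    while (my $para = <$fh>) {
        # Lines of the block, each split into its characters
        push @arrays, [ map { [ split // ] } split /\n/, $para ];
    }
}
close $fh;

my ($array1, $array2) = @arrays;   # one array reference per block
print "$array1->[2][2]\n";   # K
print "$array2->[0][1]\n";   # N
```

Because $array1 and $array2 are references, elements are reached with the arrow syntax, $array1->[2][2], rather than $array1[2][2].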

Answer 3

#!/usr/bin/env perl
use strict;
use warnings;

my $arrays = [];
my $count = 0;
my $row = 0;

# Read data and store in $arrays
while(<DATA>) {
    if (my ($letters) = m/^(\w+)\s*$/) {
        # Store letters
        $arrays->[$count]->[$row] = [ split //, $letters ];
        $row++;
    } else {
        # Next batch
        $count++;
        $row = 0;
    }
}

# Print it out with indices
for my $iarray ( 0 .. $#$arrays ) {    # only matrices actually stored, even after a trailing blank line
    print "------ Matrix $iarray ------\n";
    my @rows = @{ $arrays->[$iarray] };
    for my $irow ( 0 .. $#rows) {
        my @cols = @{ $rows[$irow] };
        for my $icol ( 0 .. $#cols ) {
            print "($irow,$icol) -> " . $cols[$icol] . "\n";
        }
    }
}

__DATA__
ABCD
EFGH
IJKL

MNOP
QRST
UVWX

Output

------ Matrix 0 ------
(0,0) -> A
(0,1) -> B
(0,2) -> C
(0,3) -> D
(1,0) -> E
(1,1) -> F
(1,2) -> G
(1,3) -> H
(2,0) -> I
(2,1) -> J
(2,2) -> K
(2,3) -> L
------ Matrix 1 ------
(0,0) -> M
(0,1) -> N
(0,2) -> O
(0,3) -> P
(1,0) -> Q
(1,1) -> R
(1,2) -> S
(1,3) -> T
(2,0) -> U
(2,1) -> V
(2,2) -> W
(2,3) -> X

How do I split a text file into two arrays?
