Parsing a text file with Perl

Asked: 2014-03-19 05:30:11

Tags: regex perl parsing text-files

I have a text file that looks like this:

    Parameter 0:
    Field 1           : 100
    Field 2           : 0
    Field 3           : 4

    Parameter 1:
    Field 1           : 873
    Field 2           : 23
    Field 3           : 89

I want to write a Perl script that parses this file and prints it in the following format:

     Parameter Field1 Field2 Field3
       0          100     0      4
       1          873     23     89

Can anyone help me? Any help would be greatly appreciated. This is what I have tried so far:

my %hash = ();
my $file = "sample.txt";

open (my $fh, "<", $file) or die "Can't open the file $file: $!";

while (my $line =<$fh>)
{
    chomp ($line);
    my($key) = split(" : ", $line);
    $hash{$key} = 1;
}

foreach my $key (sort keys %hash)
{
    print "$key\n";
}

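4 Answers:

Answer 0 (score: 2)

This Perl program does what you ask. It allows any number of fields per parameter (although every parameter must have the same number of fields) and takes the column header labels from the data itself.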

use strict;
use warnings;

my $file = 'sample.txt';

open my $fh, '<', $file or die qq{Can't open "$file" for input: $!};

my %data;
my @params;
my @fields;

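while (<$fh>) {
  next unless /\S/;    # skip blank lines
  chomp;

  # A "Parameter N:" line has nothing after the colon, so $val is undefined there
  my ($key, $val) = split /\s*:\s*/;
  if (defined $val and $val =~ /\S/) {
    # Take the header labels from the first parameter's fields only
    push @fields, $key if @params == 1;
    push @{ $data{$params[-1]} }, $val if @params;
  }
  else {
    die qq{Unexpected parameter format "$key"} unless $key =~ /parameter\s+(\d+)/i;
    push @params, $1;
  }
}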

my @headers = ('Parameter', @fields);
my @widths = map length, @headers;
my $format = join(' ', map "%${_}s", @widths) . "\n";

printf $format, @headers;
for my $param (@params) {
  printf $format, $param, @{ $data{$param} };
}

Output:

Parameter Field 1 Field 2 Field 3
        0     100       0       4
        1     873      23      89
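The program above tells heading lines from field lines by what `split` returns: `split` discards trailing empty fields, so a line such as `Parameter 0:` yields a key and no value. A minimal sketch of that distinction follows. `classify_line` is a hypothetical helper introduced only for illustration, and it also trims leading whitespace, on the assumption that the input file may be indented.

```perl
use strict;
use warnings;

# classify_line is a hypothetical helper, not part of the answer above.
# Returns ('param', $number), ('field', $label, $value), or nothing at all.
sub classify_line {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;        # trim surrounding whitespace
    return unless length $line;     # blank line

    # split drops trailing empty fields, so "Parameter 0:" gives one element
    my ($key, $val) = split /\s*:\s*/, $line;
    return unless defined $key;
    return ('field', $key, $val) if defined $val;
    return ('param', $1) if $key =~ /^parameter\s+(\d+)$/i;
    return;
}

for my $line ('Parameter 0:', '    Field 1           : 100', '') {
    my @parts = classify_line($line);
    print @parts ? join('|', @parts) : '(skipped)', "\n";
}
```

Checking `defined $val` before matching against it is what keeps `use warnings` quiet on heading lines.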

Answer 1 (score: 0)

use warnings; use strict;

my $file = "sample.txt";
open (my $fh, "<", $file) or die "Can't open the file $file: $!";

print "Parameter Field1 Field2 Field3\n";

while (my $line=<$fh>) {

  process_parameter($1) if $line =~ /Parameter (\d+):/;

}

sub process_parameter {

  my $parameter = shift;

  my ($field_1) = (<$fh> =~ /(\d+) *$/);
  my ($field_2) = (<$fh> =~ /(\d+) *$/);
  my ($field_3) = (<$fh> =~ /(\d+) *$/);

  printf "  %-2d         %-6d  %-6d %-6d\n", $parameter, $field_1, $field_2, $field_3;
}

Answer 2 (score: 0)

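#!/usr/bin/perl
use strict;
use warnings;

my %hash = ();
my %fields;

my $param;

# Test the read itself rather than chomp's return value: chomp returns the
# number of characters removed, which is 0 for a final line with no newline.
while ( my $line = <DATA> ) {
    chomp $line;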
    if ( $line =~ /Parameter (\d+):/ ) {
        $param = $1;
    }
    next unless ( defined $param );

    if ( my ( $f, $v ) = $line =~ /(Field \d+)\s*: (\d+)/ ) {
        $hash{$param} ||= {};

        $hash{$param}->{$f} = $v;

        $fields{$f} ||= 1;
    }

}

my @fields = sort keys %fields;
print join( ',', 'Parameter', @fields ), "\n";

foreach my $param ( sort { $a <=> $b } keys %hash ) {
    print join( ',', $param, @{ $hash{$param} }{@fields} ), "\n";
}

__DATA__
Parameter 0:
Field 1           : 100
Field 2           : 0
Field 3           : 4

Parameter 1:
Field 1           : 873
Field 2           : 23
Field 3           : 89

Answer 3 (score: 0)

Here is a way that accepts any number of fields for each parameter:

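use strict;
use warnings;
use feature 'say';    # say and the // operator need Perl 5.10 or later

my $par;
my %out;
my $max = 0;
while(<DATA>) {
    chomp;
    next if /^\s*$/;
    if (/Parameter\s*(\d+)/) {
        $par = $1;
        next;
    }
    my ($k, $v) = $_ =~/Field\s+(\d+)\s*:\s*(\d+)/;
    # Skip lines that are not fields, or fields seen before any parameter
    next unless defined $k and defined $par;
    $out{$par}[$k] = $v;
    $max = $k if $k > $max;
}
my $cols = 'Param';
$cols .= "\tField $_" for (1..$max);
say $cols;
# Sort numerically so that parameter 10 comes after parameter 9
foreach my $par (sort { $a <=> $b } keys %out)  {
    my $out = $par;
    $out .= "\t".($out{$par}[$_]//' ') for (1..$max);
    say $out;
}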

__DATA__
Parameter 0:
    Field 1           : 100
    Field 2           : 0
    Field 3           : 4
    Field 5 :18

    Parameter 1:
    Field 1           : 873
    Field 2           : 23
    Field 3           : 89
    Field 4     : 123

Output:

Param   Field 1 Field 2 Field 3 Field 4 Field 5
0       100     0       4               18
1       873     23      89      123
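The output above is tab-separated. If fixed-width columns like those in the question are preferred, `sprintf` can pad each cell instead. A rough sketch, assuming the same data layout as this answer (`$out{$par}[$field_number] = $value`); `format_row` and the column width of 9 are hypothetical choices made for illustration, not part of the answer.

```perl
use strict;
use warnings;

# format_row is a hypothetical helper: it left-justifies every cell to a
# fixed width and prints an empty cell where a field is missing (undef).
sub format_row {
    my ($width, @cells) = @_;
    return join ' ', map { sprintf '%-*s', $width, $_ // '' } @cells;
}

# Assumed data layout, as in the answer: $out{$par}[$field_number] = $value
my %out = (
    0 => [ undef, 100, 0,  4,  undef, 18 ],
    1 => [ undef, 873, 23, 89, 123 ],
);
my $max = 5;

print format_row(9, 'Param', map "Field $_", 1 .. $max), "\n";
for my $par (sort { $a <=> $b } keys %out) {
    # The array slice yields undef for fields this parameter does not have
    print format_row(9, $par, @{ $out{$par} }[1 .. $max]), "\n";
}
```

A fixed width keeps the sketch short; computing each column's width from the longest cell, as Answer 0 does from its headers, would be the more general choice.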