Extracting specific records from a text file

Time: 2015-01-23 08:04:55

Tags: perl

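I am trying to extract specific records from the text file below. I only want to select particular values from the file, not everything in it.

Input file: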

Record 0:
[record
  InputData [record
              RecType            "001"
              MyData            [record
                                RefTable "001" 
                                RefTableDesc    "Metadata "]
              MyAdd             NULL
              MyType            NULL
              MyRole            NULL]]
Record 1:
[record
  InputData [record
              RecType            "001"
              MyData            [record
                                RefTable "002" 
                                RefTableDesc    "Metadata "]
              MyAdd             NULL
              MyType            NULL
              MyRole            NULL]]

Record 2:
[record
  InputData [record
              RecType            "002"
              MyData            NULL
              MAdd             [record
                                   MY_ADD_CD       "00 "
                                   MY_ADD_SHORT_NM "MY Specific"
                                   MY_ADD_NM       "My Specific Addendum"
                                   MY_ADD_TYPE_CD  "01 "]
              MyType           NULL
              MyRole           NULL]]
Record 3:
[record
  InputData [record
              RecType            "002"
              MyData            NULL
              MAdd             [record
                                   MY_ADD_CD       "001"
                                   MY_ADD_SHORT_NM "MY Specific"
                                   MY_ADD_NM       "My Specific Addendum"
                                   MY_ADD_TYPE_CD  "01 "]
              MyType           NULL
              MyRole           NULL]]

Here is my Perl script:

#!/usr/bin/perl
use strict;
use warnings;

my $fn = shift || 'dump.txt';
my $word1 = shift || 'RecType';
my $word2 = shift || 'RefTable';
my $word3 = shift || 'RefTableDesc';
my $word4 = shift || 'MY_ADD_CD';
my $word5 = shift || 'MY_ADD_SHORT_NM';
my $word6 = shift || 'MY_ADD_NM';
my $word7 = shift || 'MY_ADD_TYPE_CD';

my @output;
open my $fh, '<', $fn or die "Could not open file '$fn': $!";

while (<$fh>) {
    if ($. = /\b$word1\b/i) {
        push @output, split;
    }
    elsif ($. = /\b$word2\b/i) {
        push @output, split;
    }
    elsif ($. = /\b$word3\b/i) {
        push @output, split;
    }
    elsif ($. = /\b$word4\b/i) {
        push @output, split;
    }
    elsif ($. = /\b$word5\b/i) {
        push @output, split;
    }
    elsif ($. = /\b$word6\b/i) {
        push @output, split;
    }
    elsif ($. = /\b$word7\b/i) {
        push @output, split;
        print "@output\n";
        @output = ();
    }
}
close ($fh);

Here is the output I get:

RecType "001" RefTable "001" RefTableDesc "Metadata " RecType "001" RefTable "002" RefTableDesc "Metadata " RecType "002" MY_ADD_CD "00 " MY_ADD_SHORT_NM "MY Specific" MY_ADD_NM "My Specific Addendum " MY_ADD_TYPE_CD "01 "
RecType "002" MY_ADD_CD "001" MY_ADD_SHORT_NM "MY Specific" MY_ADD_NM "My Specific Addendum " MY_ADD_TYPE_CD "01 "

Expected output:

"001"  "001"  "Metadata " 
"001"  "002"  "Metadata " 
"002"  "00 "  "MY Specific"  "My Specific Addendum "  "01 "
"002"  "001"  "MY Specific"  "My Specific Addendum "  "01 "

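Please suggest whether there is a way to achieve this.

2 answers:

Answer 0: (score: 1)

Here is a parser for the records, which you can use to generate the output: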

#!/usr/bin/perl
use strict;
use warnings;

my $fn = shift || 'dump.txt';
open my $fh, '<', $fn or die "Could not open file '$fn': $!";

sub read_record { 
    my %record;
    my $end;
    while (<$fh>) {
        chomp;
        (my $key, my $value,$end) = /\s*(\w+)\s+([^\]]*)(\]*)\s*$/;
        $end = length($end);
        if ( $value && $value =~ /\[record/ ) { 
            ($record{$key}, $end) = read_record();
        } elsif ( $value =~ /"(.*?)\s*"/ ) { 
            $record{$key} = $1;
        } elsif ( $value =~ /NULL/ ) {
            $record{$key} = undef;
        }
        last if $end;
    }
    return wantarray ? (\%record, --$end) : \%record;
}

my @records;

while (<$fh>) {
    if ( /^Record (\d+):/ ) { 
        <$fh>; # toss the [record line
        $records[$1] = read_record();
    } 
}
close ($fh);

use Data::Dumper;
print Dumper \@records;

Output:

$VAR1 = [
          {
            'InputData' => {
                             'MyAdd' => undef,
                             'MyType' => undef,
                             'MyRole' => undef,
                             'MyData' => {
                                           'RefTable' => '001',
                                           'RefTableDesc' => 'Metadata'
                                         },
                             'RecType' => '001'
                           }
          },
          {
            'InputData' => {
                             'MyData' => {
                                           'RefTable' => '002',
                                           'RefTableDesc' => 'Metadata'
                                         },
                             'RecType' => '001',
                             'MyAdd' => undef,
                             'MyType' => undef,
                             'MyRole' => undef
                           }
          },
          {
            'InputData' => {
                             'RecType' => '002',
                             'MyData' => undef,
                             'MyRole' => undef,
                             'MyType' => undef,
                             'MAdd' => {
                                         'MY_ADD_SHORT_NM' => 'MY Specific',
                                         'MY_ADD_TYPE_CD' => '01',
                                         'MY_ADD_CD' => '00',
                                         'MY_ADD_NM' => 'My Specific Addendum'
                                       }
                           }
          },
          {
            'InputData' => {
                             'MyData' => undef,
                             'RecType' => '002',
                             'MyRole' => undef,
                             'MyType' => undef,
                             'MAdd' => {
                                         'MY_ADD_NM' => 'My Specific Addendum',
                                         'MY_ADD_CD' => '001',
                                         'MY_ADD_TYPE_CD' => '01',
                                         'MY_ADD_SHORT_NM' => 'MY Specific'
                                       }
                           }
          }
        ];
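To get from that data structure to one line per record, the fields have to be read out in a fixed order, because hash keys come back in no particular order. The following is only a minimal sketch: the sample records are copied from the dump above, and `format_record`, `@data_fields` and `@add_fields` are names made up for the illustration. Because the parser strips trailing whitespace inside the quotes, the values print as "Metadata" rather than "Metadata ".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two records in the shape the parser produces (sample data from the dump).
my @records = (
    { InputData => { RecType => '001',
                     MyData  => { RefTable => '001', RefTableDesc => 'Metadata' },
                     MyAdd   => undef, MyType => undef, MyRole => undef } },
    { InputData => { RecType => '002',
                     MyData  => undef,
                     MAdd    => { MY_ADD_CD       => '00',
                                  MY_ADD_SHORT_NM => 'MY Specific',
                                  MY_ADD_NM       => 'My Specific Addendum',
                                  MY_ADD_TYPE_CD  => '01' },
                     MyType  => undef, MyRole => undef } },
);

# Hash keys are unordered, so the column order is spelled out explicitly.
my @data_fields = qw(RefTable RefTableDesc);
my @add_fields  = qw(MY_ADD_CD MY_ADD_SHORT_NM MY_ADD_NM MY_ADD_TYPE_CD);

# Hypothetical helper: builds one output line from one parsed record.
sub format_record {
    my ($record) = @_;
    my $in   = $record->{InputData};
    my @cols = ($in->{RecType});

    # Only descend into a sub-record when it was parsed as a hash (not NULL).
    push @cols, @{ $in->{MyData} }{@data_fields} if ref $in->{MyData};
    push @cols, @{ $in->{MAdd} }{@add_fields}    if ref $in->{MAdd};

    return join '  ', map { qq("$_") } @cols;
}

print format_record($_), "\n" for @records;
```

If the trailing spaces matter, the `\s*` in the parser's value regex would have to go, which is a change to the parser and not to this sketch.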

However, if you only want the output and don't care about the records themselves, the problem is much simpler:

#!/usr/bin/perl
use strict;
use warnings;

my $fn = shift || 'dump.txt';
open my $fh, '<', $fn or die "Could not open file '$fn': $!";

while (<$fh>) {
    print "$1 " if /("[^"]*")/;
    print "\n" if /\]\]/;
}

close ($fh);

Output:

"001" "001" "Metadata " 
"001" "002" "Metadata " 
"002" "00 " "MY Specific" "My Specific Addendum" "01 " 
"002" "001" "MY Specific" "My Specific Addendum" "01 " 

Answer 1: (score: 0)

Oh, man. $. is the current line number in the file. Try this:

use strict; 
use warnings; 
use 5.016;
use Data::Dumper;

my $fname = shift || 'dump.txt';

open my $INFILE, '<', $fname 
    or die "Could not open file '$fname': $!";

while (my $line  = <$INFILE>) {
    say $.;
}

--output:--
1
2
3
...
...
45
46

From perlvar:

"You can adjust the counter by assigning to $., but this will not actually move the seek pointer."

What does that mean, exactly? Let's try it:

use strict; 
use warnings; 
use 5.016;
use Data::Dumper;

my $fname = shift || 'dump.txt';

open my $INFILE, '<', $fname 
    or die "Could not open file '$fname': $!";

while (my $line  = <$INFILE>) {
    say $.;

    if ($. == 1) {
        $. = 10;
    }
}

--output:--
1
11
12
13
...
...
54
55

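So assigning to $. only changes the number that $. counts up from; it does not change which line is read next.

In your code, you have a series of if/elsif statements that look like this: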

    if ($. = /\b$word1\b/i) {

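In scalar context, which is the context you create when you assign something to a scalar variable (i.e. a variable whose name starts with the $ sigil), the match operator returns 1 if there is a match and a false value if there is no match (the empty string, which counts as 0 when used as a number).

So sometimes your if statement assigns 0 to $.: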

if ($. = 0) {

and sometimes your if statement assigns 1 to $.:

if ($. = 1) {

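That is all well and good, except that you never use the value of $. after the assignment, so it is a useless assignment. You just keep assigning new values to $. as the if/elsif branches execute.

Since your code does not depend on the value you assign to $., you should remove the assignment: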

if (/\b$word1\b/i)

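Next, an if condition is a boolean context, i.e. a true/false context, and boolean context is a kind of scalar context (you just have to memorize that). So now you know: an if condition is a scalar context. As described above, the match operator in scalar context returns 1 when there is a match and 0 when there is no match. As a result, the if statement: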

if (/\b$word1\b/i)

...is equivalent to:

if( 0 )  #when there is no match

...或:

if ( 1 ) #when there is a match

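Finally, in boolean context, 0 is considered false and 1 is considered true. So when there is a match, the if/elsif block executes; when there is no match, the block is skipped.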
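The scalar-context behaviour is easy to check for yourself. This is just a small sketch; the sample line is taken from the dump file, and `$hit` and `$miss` are names invented for the demonstration.

```perl
use strict;
use warnings;

my $line = '              RecType            "001"';

# Assigning to a scalar puts the match operator in scalar context.
my $hit  = $line =~ /\bRecType\b/i;    # pattern is present in the line
my $miss = $line =~ /\bRefTable\b/i;   # pattern is absent from the line

# %d shows the numeric view of both results.
printf "hit=%d miss=%d\n", $hit, $miss;

# An if condition is already a boolean (scalar) context,
# so no assignment is needed to test for a match.
print "matched RecType\n" if $line =~ /\bRecType\b/i;
```

Nothing here touches $., which is the point: the condition works the same way without assigning the result anywhere.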

What in the world made you assign the value to $.? Perl has lots of global variables, so how did you pick $.? And I wonder why you didn't write:

my $x;

if ($x = /\b$word1\b/i) 

Assigning to $x is just as useless as assigning to $., but at least you would not be messing with one of perl's global variables.

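The next problem: your code dumps all the data into one array, which means you don't know where the data from one match ends and the data from the next match begins.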
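One way around that, sketched here under some assumptions, is to keep only the quoted value from each matching line and to flush the array whenever a line contains "]]", which closes a record in this dump format. The sample text is a cut-down copy of the question's input, and `extract_rows` is a hypothetical helper name; it reads from an in-memory filehandle so the example is self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Cut-down sample of the dump file from the question.
my $dump = <<'END';
Record 0:
[record
  InputData [record
              RecType            "001"
              MyData            [record
                                RefTable "001"
                                RefTableDesc    "Metadata "]
              MyAdd             NULL
              MyRole            NULL]]
Record 2:
[record
  InputData [record
              RecType            "002"
              MyData            NULL
              MAdd             [record
                                   MY_ADD_CD       "00 "
                                   MY_ADD_SHORT_NM "MY Specific"]
              MyRole           NULL]]
END

# Hypothetical helper: returns one string per record, holding the quoted
# values of the wanted keys in the order they appear in the file.
sub extract_rows {
    my ($text, @words) = @_;
    my $keys = join '|', map { quotemeta($_) } @words;

    open my $fh, '<', \$text or die "Could not open in-memory file: $!";

    my (@rows, @output);
    while (my $line = <$fh>) {
        # Keep just the quoted value that follows a wanted key.
        push @output, $1 if $line =~ /\b(?:$keys)\b\s+("[^"]*")/;

        # "]]" ends a record: store the row and start a fresh one.
        if ($line =~ /\]\]/) {
            push @rows, join('  ', @output);
            @output = ();
        }
    }
    close $fh;
    return @rows;
}

my @rows = extract_rows($dump,
    qw(RecType RefTable RefTableDesc MY_ADD_CD MY_ADD_SHORT_NM));
print "$_\n" for @rows;
```

Resetting @output at the record boundary, instead of only after the last keyword, is what keeps records of type "001" from running into the next record.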