Question

I am trying to extract specific records from the text file below. I only need to select certain fields from each record in the file.

Input file:

Record 0:
[record
  InputData [record
              RecType            "001"
              MyData            [record
                                RefTable "001" 
                                RefTableDesc    "Metadata "]
              MyAdd             NULL
              MyType            NULL
              MyRole            NULL]]
Record 1:
[record
  InputData [record
              RecType            "001"
              MyData            [record
                                RefTable "002" 
                                RefTableDesc    "Metadata "]
              MyAdd             NULL
              MyType            NULL
              MyRole            NULL]]

Record 2:
[record
  InputData [record
              RecType            "002"
              MyData            NULL
              MAdd             [record
                                   MY_ADD_CD       "00 "
                                   MY_ADD_SHORT_NM "MY Specific"
                                   MY_ADD_NM       "My Specific Addendum"
                                   MY_ADD_TYPE_CD  "01 "]
              MyType           NULL
              MyRole           NULL]]
Record 3:
[record
  InputData [record
              RecType            "002"
              MyData            NULL
              MAdd             [record
                                   MY_ADD_CD       "001"
                                   MY_ADD_SHORT_NM "MY Specific"
                                   MY_ADD_NM       "My Specific Addendum"
                                   MY_ADD_TYPE_CD  "01 "]
              MyType           NULL
              MyRole           NULL]]

Here is my Perl script:

#!/usr/bin/perl
use strict;
use warnings;

my $fn = shift || 'dump.txt';
my $word1 = shift || 'RecType';
my $word2 = shift || 'RefTable';
my $word3 = shift || 'RefTableDesc';
my $word4 = shift || 'MY_ADD_CD';
my $word5 = shift || 'MY_ADD_SHORT_NM';
my $word6 = shift || 'MY_ADD_NM';
my $word7 = shift || 'MY_ADD_TYPE_CD';

my @output;
open my $fh, '<', $fn or die "Could not open file '$fn': $!";

while (<$fh>) {
        if ($. = /\b$word1\b/i) {
   push @output, split;
}
elsif ($. = /\b$word2\b/i ){
        push @output, split;
}
elsif ($. = /\b$word3\b/i ){
        push @output, split;
}
elsif ($. = /\b$word4\b/i) {
   push @output, split;
}
elsif ($. = /\b$word5\b/i ){
        push @output, split;
}
elsif ($. = /\b$word6\b/i ){
        push @output, split;
}
elsif ($. = /\b$word7\b/i ){
        push @output, split;
    print "@output\n";
    @output = ();
         }
}
close ($fh);

This is the output I get:

RecType "001" RefTable "001" RefTableDesc "Metadata " RecType "001" RefTable "002" RefTableDesc "Metadata " RecType "002" MY_ADD_CD "00 " MY_ADD_SHORT_NM "MY Specific" MY_ADD_NM "My Specific Addendum " MY_ADD_TYPE_CD "01 "
RecType "002" MY_ADD_CD "001" MY_ADD_SHORT_NM "MY Specific" MY_ADD_NM "My Specific Addendum " MY_ADD_TYPE_CD "01 "

Expected output:

"001"  "001"  "Metadata " 
"001"  "002"  "Metadata " 
"002"  "00 "  "MY Specific"  "My Specific Addendum "  "01 "
"002"  "001"  "MY Specific"  "My Specific Addendum "  "01 "

Please suggest whether there is a way to achieve this.

Answer 1

Here is a parser for the records, which you can use to generate the output:

#!/usr/bin/perl
use strict;
use warnings;

my $fn = shift || 'dump.txt';
open my $fh, '<', $fn or die "Could not open file '$fn': $!";

sub read_record { 
    my %record;
    my $end;
    while (<$fh>) {
        chomp;
        (my $key, my $value,$end) = /\s*(\w+)\s+([^\]]*)(\]*)\s*$/;
        $end = length($end);
        if ( $value && $value =~ /\[record/ ) { 
            ($record{$key}, $end) = read_record();
        } elsif ( $value =~ /"(.*?)\s*"/ ) { 
            $record{$key} = $1;
        } elsif ( $value =~ /NULL/ ) {
            $record{$key} = undef;
        }
        last if $end;
    }
    return wantarray ? (\%record, --$end) : \%record;
}

my @records;

while (<$fh>) {
    if ( /^Record (\d+):/ ) { 
        <$fh>; # toss the [record line
        $records[$1] = read_record();
    } 
}
close ($fh);

use Data::Dumper;
print Dumper \@records;

Output:

$VAR1 = [
          {
            'InputData' => {
                             'MyAdd' => undef,
                             'MyType' => undef,
                             'MyRole' => undef,
                             'MyData' => {
                                           'RefTable' => '001',
                                           'RefTableDesc' => 'Metadata'
                                         },
                             'RecType' => '001'
                           }
          },
          {
            'InputData' => {
                             'MyData' => {
                                           'RefTable' => '002',
                                           'RefTableDesc' => 'Metadata'
                                         },
                             'RecType' => '001',
                             'MyAdd' => undef,
                             'MyType' => undef,
                             'MyRole' => undef
                           }
          },
          {
            'InputData' => {
                             'RecType' => '002',
                             'MyData' => undef,
                             'MyRole' => undef,
                             'MyType' => undef,
                             'MAdd' => {
                                         'MY_ADD_SHORT_NM' => 'MY Specific',
                                         'MY_ADD_TYPE_CD' => '01',
                                         'MY_ADD_CD' => '00',
                                         'MY_ADD_NM' => 'My Specific Addendum'
                                       }
                           }
          },
          {
            'InputData' => {
                             'MyData' => undef,
                             'RecType' => '002',
                             'MyRole' => undef,
                             'MyType' => undef,
                             'MAdd' => {
                                         'MY_ADD_NM' => 'My Specific Addendum',
                                         'MY_ADD_CD' => '001',
                                         'MY_ADD_TYPE_CD' => '01',
                                         'MY_ADD_SHORT_NM' => 'MY Specific'
                                       }
                           }
          }
        ];
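To get from that parsed structure to the lines the question asks for, something like the following might work. It is only a sketch: format_record, %fields_for and @nested_order are names made up for this illustration, and the field order is an assumption read off the sample input, since hash keys come back in no particular order. The two sample records stand in for what read_record would build.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed field order for each nested record (taken from the sample input).
my @nested_order = qw(MyData MAdd);
my %fields_for   = (
    MyData => [qw(RefTable RefTableDesc)],
    MAdd   => [qw(MY_ADD_CD MY_ADD_SHORT_NM MY_ADD_NM MY_ADD_TYPE_CD)],
);

# Turn one parsed record (a hash ref shaped like the dump above) into a line.
sub format_record {
    my ($record) = @_;
    my $input  = $record->{InputData};
    my @values = ($input->{RecType});
    for my $name (@nested_order) {
        my $nested = $input->{$name};
        next unless ref($nested) eq 'HASH';    # NULL members are undef, skip them
        push @values, map { $nested->{$_} // '' } @{ $fields_for{$name} };
    }
    return join ' ', map { qq("$_") } @values;
}

# Hand-written stand-ins for two entries of @records.
my @records = (
    { InputData => { RecType => '001', MyAdd => undef,
                     MyData  => { RefTable => '001', RefTableDesc => 'Metadata' } } },
    { InputData => { RecType => '002', MyData => undef,
                     MAdd    => { MY_ADD_CD       => '00',
                                  MY_ADD_SHORT_NM => 'MY Specific',
                                  MY_ADD_NM       => 'My Specific Addendum',
                                  MY_ADD_TYPE_CD  => '01' } } },
);

print format_record($_), "\n" for @records;
```

Note that the parser above trims whitespace before the closing quote, so values such as "Metadata " come out as "Metadata". If the trailing spaces matter, the capture in read_record would have to keep them.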

However, if you only want the output and don't care about the record structure, the problem is much simpler:

#!/usr/bin/perl
use strict;
use warnings;

my $fn = shift || 'dump.txt';
open my $fh, '<', $fn or die "Could not open file '$fn': $!";

while (<$fh>) {
    print "$1 " if /("[^"]*")/;
    print "\n" if /\]\]/;
}

close ($fh);

Output:

"001" "001" "Metadata " 
"001" "002" "Metadata " 
"002" "00 " "MY Specific" "My Specific Addendum" "01 " 
"002" "001" "MY Specific" "My Specific Addendum" "01 "

Answer 2

Oh, man. $. is the current line number in the file. Try this:

use strict; 
use warnings; 
use 5.016;
use Data::Dumper;

my $fname = shift || 'dump.txt';

open my $INFILE, '<', $fname 
    or die "Could not open file '$fname': $!";

while (my $line  = <$INFILE>) {
    say $.;
}

--output:--
1
2
3
...
...
45
46

From perlvar:

You can adjust the counter by assigning to $., but this will not actually move the seek pointer.

What does that mean, exactly? Let's try it:

use strict; 
use warnings; 
use 5.016;
use Data::Dumper;

my $fname = shift || 'dump.txt';

open my $INFILE, '<', $fname 
    or die "Could not open file '$fname': $!";

while (my $line  = <$INFILE>) {
    say $.;

    if ($. == 1) {
        $. = 10;
    }
}

--output:--
1
11
12
13
...
...
54
55

So assigning to $. only changes the number that $. continues counting from.

In your code, you have a series of if/elsif statements like this:

    if ($. = /\b$word1\b/i) {

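In scalar context, which is the context created when you assign something to a scalar variable (i.e. a variable whose name starts with the $ sigil), the match operator returns a false value if there is no match (treated as 0 here) and 1 if there is a match.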

So your if statement sometimes assigns 0 to $.:

if ($. = 0) {

And sometimes your if statement assigns 1 to $.:

if ($. = 1) {

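That is all fine, except that you never use the value of $. after the assignment, so it is a useless assignment. You just keep assigning new values to $. as the if/elsif branches execute.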

Because your code doesn't depend on the value you assign to $., you should remove the assignment:

if (/\b$word1\b/i)

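Next, an if condition is considered boolean context, i.e. true/false context, and boolean context is a scalar context (you just have to memorize that). So now you know: an if condition is a scalar context. As mentioned above, the match operator in scalar context returns 1 if there is a match and 0 if there is no match. As a result, the if statement: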

if (/\b$word1\b/i)

...is equivalent to:

if( 0 )  #when there is no match

...or:

if ( 1 ) #when there is a match

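Finally, in boolean context, 0 is considered false and 1 is considered true. So when there is a match, the if/elsif block executes; when there is no match, the block is skipped.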
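If you want to see the return values for yourself, here is a small check. The helper name match_in_scalar_context is invented for this example; the test line is taken from your input file.

```perl
use strict;
use warnings;

# Assigning the result to a scalar forces scalar context on the match.
sub match_in_scalar_context {
    my ($text, $word) = @_;
    my $result = $text =~ /\b\Q$word\E\b/i;
    return $result;
}

my $line = 'RecType "001"';

# %d shows the numeric value of each result.
printf "match:    %d\n", match_in_scalar_context($line, 'RecType');   # prints 1
printf "no match: %d\n", match_in_scalar_context($line, 'RefTable');  # prints 0

# The same test used directly as an if condition, with no assignment at all.
print "RecType found\n" if $line =~ /\bRecType\b/i;
```

Strictly speaking, the value returned for a failed match is Perl's special false value, which is the empty string when used as a string and 0 when used as a number, so thinking of it as 0 is good enough here.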

What in the world made you assign your value to $.? Perl has lots of global variables; how did you pick $.? And I wonder why you didn't write:

my $x;

if ($x = /\b$word1\b/i)

Assigning to $x is just as useless as assigning to $., but at least you wouldn't be messing with Perl's global variables.

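The next problem is that your code dumps all the data into a single array, which means you can't tell where the data for one match ends and the data for the next one begins.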
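One possible way around that, sketched here under a couple of assumptions: keep a separate array for the current record, store only the quoted value instead of the whole split line, and flush the array whenever a line contains "]]", which closes the outermost record in your sample. extract_rows is a made-up name, and the inline sample is just the first record from your file so the snippet runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Field names of interest, taken from the question's script.
my @words  = qw(RecType RefTable RefTableDesc
                MY_ADD_CD MY_ADD_SHORT_NM MY_ADD_NM MY_ADD_TYPE_CD);
my $wanted = join '|', map { quotemeta } @words;

# Return one output line per record found in the given lines.
sub extract_rows {
    my @lines = @_;
    my (@rows, @current);
    for my $line (@lines) {
        # Keep only the quoted value of a wanted field, not the field name.
        if ($line =~ /\b(?:$wanted)\b\s+("[^"]*")/i) {
            push @current, $1;
        }
        # Assumption: "]]" marks the end of a whole record.
        if ($line =~ /\]\]/) {
            push @rows, join('  ', @current) if @current;
            @current = ();
        }
    }
    return @rows;
}

my @sample = split /^/, <<'END';
Record 0:
[record
  InputData [record
              RecType            "001"
              MyData            [record
                                RefTable "001"
                                RefTableDesc    "Metadata "]
              MyAdd             NULL
              MyType            NULL
              MyRole            NULL]]
END

print "$_\n" for extract_rows(@sample);
```

To run it against the real file, open the file as in your script and pass the lines in, for example extract_rows(<$fh>). For the sample above it should print the three quoted values of Record 0 on one line, separated by two spaces.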

Extracting specific records from a text file
