Question

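I'm new to Perl and have a question about its syntax. I was given this code for parsing a file that contains specific information. What is the if (/DID/) part of the get_number subroutine doing? Is it using a regular expression? I'm not sure, because I thought a regular expression match looks like $_ =~ /some expression/. Finally, is the while loop in the get_number subroutine necessary?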

#!/usr/bin/env perl
use Scalar::Util qw/ looks_like_number /;
use WWW::Mechanize;

# store the name of all the OCR file names in an array
my @file_list=qw{
    blah.txt
};

# set the scalar index to zero
my $file_index=0;

# open the file titled 'outputfile.txt' and write to it
# (or indicate that the file can't be opened)
open(OUT_FILE, '>', 'outputfile.txt') or die "Can't open output file\n";

while($file_index < 1){
    # open the OCR file and store it in the filehandle IN_FILE
    open(IN_FILE, '<', "$file_list[$file_index]") or die "Can't read source file!\n";
    print "Processing file $file_list[$file_index]\n";
    while(<IN_FILE>){
        my $citing_pat=get_number();
        get_country($citing_pat);
    }
    $file_index=$file_index+1;
}
close IN_FILE;
close OUT_FILE;

get_number is defined as follows.

Answer 1

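Perl has a variable, $_, that is the default dumping ground for a lot of things.

In get_number, while(<IN_FILE>){ reads one line into $_, and the next line checks whether $_ matches the regular expression DID.

It is also common to see a bare chomp;, which likewise operates on $_ when no argument is given.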
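To make the implicit use of $_ concrete, here is a minimal sketch. The sample data and the names $data, $fh and @matched are my own assumptions for illustration (they are not from the question), and the script reads from an in-memory filehandle so it runs without any input file. Each comment shows the explicit form that the shorthand stands for.

```perl
use strict;
use warnings;

# Hypothetical sample data, read through an in-memory filehandle.
my $data = "alpha DID 1 42\nbeta foo 2 17\n";
open(my $fh, '<', \$data) or die "Can't open in-memory data: $!";

my @matched;
while (<$fh>) {          # shorthand for: while (defined($_ = <$fh>))
    chomp;               # shorthand for: chomp($_)
    if (/DID/) {         # shorthand for: if ($_ =~ /DID/)
        push @matched, $_;
    }
}
close $fh;

print "Matched: $_\n" for @matched;
```

Only the first sample line contains DID, so that is the only line collected in @matched.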

Answer 2

In this case, if (/DID/) searches the $_ variable by default, so it is correct. It is a rather loose regex, though, IMO.
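As a rough illustration of what "loose" means here: /DID/ matches those three letters anywhere in the line, including inside longer words. The input lines below are made up, and the word-boundary pattern is just one possible tightening. Whether it suits the real data depends on the input file, which we have not seen.

```perl
use strict;
use warnings;

# Hypothetical input lines, purely for illustration.
my @lines = (
    "rec1 DID 7 12345",
    "CANDIDATE list follows",
    "did not match case",
);

my @loose      = grep { /DID/ } @lines;       # substring match anywhere in the line
my @whole_word = grep { /\bDID\b/ } @lines;   # DID only as a standalone word

print "loose:      $_\n" for @loose;
print "whole word: $_\n" for @whole_word;
```

The loose pattern also picks up the CANDIDATE line, while the word-boundary version keeps only the first line. Both are case-sensitive, so the lowercase line matches neither.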

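The while loop in the sub may be necessary; it depends on what your input looks like. You should be aware that having two while loops will cause some lines to be skipped entirely.

The while loop in the main program reads a line and does nothing with it. Basically, this means that the first line in the file, and every line directly following a matching line (e.g., a line that contains "DID" and whose 4th field is a number), will be discarded.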
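The skipping effect is easy to reproduce. In this sketch next_number is a hypothetical stand-in for get_number (I am assuming it reads from the same handle as the outer loop), and the six input lines are invented so that every one of them would qualify as a match.

```perl
use strict;
use warnings;

# Hypothetical input: six lines, each containing DID and a numeric 4th field.
my $data = join '', map { "rec DID x $_\n" } 1 .. 6;
open(my $fh, '<', \$data) or die "Can't open in-memory data: $!";

# Assumed shape of the inner reader; not the asker's actual get_number.
sub next_number {
    my ($in) = @_;
    while (<$in>) {
        next unless /DID/;
        my $field = (split)[3];
        return $field if defined $field && $field =~ /^\d+$/;
    }
    return;
}

my @found;
while (<$fh>) {                       # outer read: this line is never inspected
    my $number = next_number($fh);    # inner read: starts at the following line
    push @found, $number if defined $number;
}
close $fh;

print "found: @found\n";
```

This prints "found: 2 4 6". Lines 1, 3 and 5 are consumed by the outer loop and never examined, even though each of them would have matched.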

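To answer that question properly, we would need to see the input file.

This code has a number of issues, and if it works as intended, it is probably down to a fair amount of luck.

Below is a cleaned-up version of the code. I kept the modules, since I don't know whether they are used elsewhere. I also kept the output file, since it might be used somewhere you have not shown. This code will not try to call get_country with an undefined value, and it simply does nothing if it cannot find a suitable number.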

use warnings;
use strict;
use Scalar::Util qw/ looks_like_number /;
use WWW::Mechanize;

my @file_list = qw{ blah.txt };

# Lexical filehandles; $! reports why an open failed.
open(my $out_file, '>', 'outputfile.txt') or die "Can't open output file: $!";

for my $file (@file_list) {
    open(my $in_file, '<', $file) or die "Can't read source file: $!";
    print "Processing file $file\n";
    # get_number is the only reader of the file, so no line is skipped.
    while (my $citing_pat = get_number($in_file)) {
        get_country($citing_pat);
    }
}
close $out_file;

# Return the 4th whitespace-separated field of the next line that
# contains DID and has an all-digit 4th field; undef at end of file.
sub get_number {
    my $fh = shift;
    while (<$fh>) {
        if (/DID/) {
            my $field = (split)[3];
            if (defined $field && $field =~ /^\d+$/) {
                return $field;
            }
        }
    }
    return undef;
}

Help with Perl code parsing a file
