Reading multiple .txt files as input in Perl

Date: 2017-02-26 21:41:36

Tags: html perl cgi

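I have an online Perl concordance that searches a specific text file for a target word and prints the sorted output. Right now the test code searches only a single text file for the keyword and prints the results. I would like to do the same for every text file in a folder, not just one. Any suggestions would be very helpful!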

Here is the code for my online concordance:

#!/usr/bin/perl -wT

# require
use strict;
use diagnostics;
use CGI;

# sanity check
my $q = new CGI;
my $target = $q->param("keyword");
my $radius = $q->param("span");
my $ordinal = $q->param("ord");
my $width = 2*$radius;
my $file    = 'DISS.G.HB.002.txt';
if ( ! $file or ! $target ) {

    print "Usage: $0 <file> <target>\n";
    exit;

}

# initialize
my $count   = 0;
my @lines   = ();
$/          = ""; # Paragraph read mode

# open the file, and process each line in it
open(FILE, " < $file") or die("Can not open $file ($!).\n");
while(<FILE>){

    # re-initialize
    my $extract = '';

    # normalize the data
    chomp;
    s/\n/ /g;        # Replace new lines with spaces
    s/\b--\b/ -- /g; # Add spaces around dashes

    # process each item if the target is found
    # (the capture group is what populates $1 below)
    while ( $_ =~ /\b($target\w*)/gi ){

        # find start position
        my $match = $1;
        my $pos   = pos;
        my $start = $pos - $radius - length($match);

        # extract the snippets
        if ($start < 0){
            $extract = substr($_, 0, $width+$start+length($match));
            $extract = (" " x -$start) . $extract;
        }else{
            $extract = substr($_, $start, $width+length($match));
            my $deficit = $width+length($match) - length($extract);
            if ($deficit > 0) {
                $extract .= (" " x $deficit);
            }

        }

        # add the extracted text to the list of lines, and increment
        $lines[$count] = $extract;
        ++$count;

    }

}

sub removePunctuation {
    my $string = $_[0];
    $string = lc($string); # Convert to lowercase
    $string =~ s/[^-a-z ]//g; # Remove non-alphabetic characters
    $string =~ s/--+/ /g;     # Replace runs of 2+ hyphens with a space
    $string =~ s/-//g;        # Remove remaining hyphens
    $string =~ s/\s+/ /g;     # Collapse runs of whitespace
    return($string);

}

sub onLeft {
    #USAGE: $word = onLeft($string, $radius, $ordinal);
    my $left = substr($_[0], 0, $_[1]);
    $left = removePunctuation($left);
    my @word = split(/\s+/, $left);
    return($word[-$_[2]]);
}

sub byLeftWords {
    my $left_a = onLeft($a, $radius, $ordinal);
    my $left_b = onLeft($b, $radius, $ordinal);
    lc($left_a) cmp lc($left_b);
}


# process each line in the list of lines

print "Content-type: text/plain\n\n";
my $line_number = 0;

foreach my $x (sort byLeftWords @lines){
    ++$line_number;
    printf "%5d",$line_number;
    print " $x\n\n";
}

# done
exit;

1 Answer:

Answer 0 (score: 1)

The glob() function returns a list of the files that match a pattern.

my @text_files = glob('*.txt');

Of course, you probably don't need the intermediate @text_files variable.

while (my $file = glob('*.txt')) {
  open my $fh, '<', $file or die "$file: $!";
  # do something with the filehandle
}
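To connect the loop above to the concordance, here is a minimal sketch of how the per-file work might be wrapped in a subroutine and called once per file. The names matching_paragraphs and search_directory, the directory '.', and the keyword 'whale' are placeholders of mine, not part of the original script. I have also assumed the keyword should be matched as literal text (hence \Q...\E), and the sketch only collects whole matching paragraphs rather than building the fixed-width extracts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every paragraph of $file that contains
# a word starting with $target (case-insensitive).
sub matching_paragraphs {
    my ($file, $target) = @_;
    my @found;

    local $/ = "";    # paragraph mode, undone when the sub returns
    open my $fh, '<', $file or die "$file: $!";
    while (my $para = <$fh>) {
        chomp $para;
        $para =~ s/\n/ /g;    # join wrapped lines with spaces
        # \Q...\E treats the keyword as literal text (an assumption)
        push @found, $para if $para =~ /\b\Q$target\E\w*/i;
    }
    close $fh;
    return @found;
}

# Hypothetical driver: collect matches from every .txt file in $dir.
sub search_directory {
    my ($dir, $target) = @_;
    my @lines;
    for my $file (sort glob("$dir/*.txt")) {
        push @lines, matching_paragraphs($file, $target);
    }
    return @lines;
}

# Example call; the directory and keyword are placeholders.
print "$_\n" for search_directory('.', 'whale');
```

One caveat: glob() splits its pattern on whitespace, so a directory name containing spaces would need quoting inside the pattern (or opendir/readdir instead).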

A few other suggestions about your code.

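  • -w was largely superseded by use warnings when Perl 5.6 was released in 2000.
  • new CGI is better written as CGI->new.
  • Changes to special variables (such as $/) should always be localized.
  • Please use lexical filehandles and the three-argument form of open() (as in the example above).
  • If you're using CGI.pm, why not use its header() method?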
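To illustrate the points about localizing $/ and using lexical filehandles, here is a small sketch. The subroutine name read_paragraphs is a placeholder of mine. The idea is that local confines paragraph mode to the subroutine, so code that reads line by line elsewhere is not affected.

```perl
use strict;
use warnings;

# Hypothetical helper: read $file in paragraph mode and return its paragraphs.
sub read_paragraphs {
    my ($file) = @_;

    local $/ = "";                               # undone automatically on return
    open my $fh, '<', $file or die "$file: $!";  # lexical handle, 3-arg open
    my @paragraphs = <$fh>;
    close $fh;

    chomp @paragraphs;   # in paragraph mode this strips all trailing newlines
    return @paragraphs;
}
```

After read_paragraphs() returns, $/ is back to whatever it was before the call (normally "\n"). With a plain assignment such as $/ = ""; the change would stay in effect for the rest of the program.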

Most importantly, though, please reconsider your use of CGI. Read CGI::Alternatives for better suggestions (by which I mean simpler and more powerful ones).
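For a rough idea of what a non-CGI version can look like, here is a minimal sketch of a raw PSGI application, the interface that the Plack toolkit and several modern Perl web frameworks build on. I believe PSGI/Plack is among the approaches CGI::Alternatives points to, but check the document itself. The query-string handling here is deliberately crude and hypothetical; a real application would more likely use a request object such as Plack::Request.

```perl
use strict;
use warnings;

# A PSGI application is a code reference that receives the request
# environment and returns [ status, [ headers ], [ body chunks ] ].
my $app = sub {
    my ($env) = @_;

    # Crude lookup of the "keyword" parameter, for illustration only
    # (no URL decoding, no handling of repeated parameters).
    my ($keyword) = ($env->{QUERY_STRING} // '') =~ /(?:^|&)keyword=([^&]*)/;

    my $body = defined $keyword && length $keyword
        ? "Searching for: $keyword\n"
        : "Usage: ?keyword=<target>\n";

    return [ 200, [ 'Content-Type' => 'text/plain' ], [ $body ] ];
};

$app;   # a .psgi file returns the application as its last value
```

Saved as, say, app.psgi, this could be served with plackup app.psgi, assuming Plack is installed. The concordance logic would go where $body is built.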