Pattern matching and processing between two files

Time: 2015-07-31 17:31:50

Tags: regex perl

I'm trying to find the different ways to do pattern matching and processing across two files. I'm new to Perl and could use some help.

I have two files. Colors.txt looks like this:

Joe likes the color green.
Sam likes the color blue.

Pencils.txt looks like this:

Pencil one is blue.
Pencil two is green.

I need to parse both files and print the following:

Sam's pencil is number one because he likes the color blue.
Joe's pencil is number two because he likes the color green.

Can someone show me how to handle this efficiently?

2 answers:

Answer 0: (score: 0)

Build a lookup table from one of the files (either one), then look up the value you need while processing the other file.

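use strict;
use warnings;

# File names are taken from the question; adjust the paths as needed.
open(my $pencils_fh, '<', 'Pencils.txt') or die("Can't open Pencils.txt: $!");
open(my $colors_fh,  '<', 'Colors.txt')  or die("Can't open Colors.txt: $!");

# Lookup table: color => pencil number word (e.g. blue => "one").
my %pencil_by_color;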

while (<$pencils_fh>) {
   my ($name, $color) = /^Pencil (\S+) is (\S+)\.$/
      or die("Syntax");

   !$pencil_by_color{$color}
      or die("Duplicate");

   $pencil_by_color{$color} = $name;
}

while (<$colors_fh>) {
   my ($name, $color) = /^(\S+) likes the color (\S+)\.$/
      or die("Syntax");

   my $pencil = $pencil_by_color{$color}
      or die("Missing");

   print("${name}'s pencil is number ${pencil} because he likes the color ${color}.\n");
}
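
The lookup table can just as well be built from Colors.txt and consulted while reading Pencils.txt. Below is a minimal, self-contained sketch of that direction. It assumes each color is liked by at most one person, and it reads the sample data through in-memory filehandles so it runs without any files; the names %person_by_color and @sentences are illustrative and not part of the answer above. Because Pencils.txt is processed last, the lines come out in pencil order, which is the order shown in the question.

```perl
use strict;
use warnings;

# Sample data copied from the question; in-memory filehandles stand in for the real files.
my $colors_txt  = "Joe likes the color green.\nSam likes the color blue.\n";
my $pencils_txt = "Pencil one is blue.\nPencil two is green.\n";

open(my $colors_fh,  '<', \$colors_txt)  or die("Can't open colors data: $!");
open(my $pencils_fh, '<', \$pencils_txt) or die("Can't open pencils data: $!");

# Lookup table: color => name of the person who likes it.
# Assumption: at most one person per color (a later line would overwrite an earlier one).
my %person_by_color;
while (my $line = <$colors_fh>) {
   my ($name, $color) = $line =~ /^(\S+) likes the color (\S+)\.$/
      or die("Unrecognized line in colors data: $line");
   $person_by_color{$color} = $name;
}

# Walk the pencils and look up who likes each pencil's color.
my @sentences;
while (my $line = <$pencils_fh>) {
   my ($pencil, $color) = $line =~ /^Pencil (\S+) is (\S+)\.$/
      or die("Unrecognized line in pencils data: $line");
   my $name = $person_by_color{$color}
      or die("Nobody likes the color $color");
   push @sentences, "${name}'s pencil is number ${pencil} because he likes the color ${color}.";
}

print "$_\n" for @sentences;
```

To run it against the real files, replace the two in-memory opens with opens on Colors.txt and Pencils.txt.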

Answer 1: (score: 0)

#!/usr/bin/perl

use strict;
use warnings;

my $filecol = $ARGV[0]; # path to colors.txt
my $filepen = $ARGV[1]; # path to pencils.txt

die("USAGE: ./colpen.pl colors.txt pencils.txt\n")
    unless defined($filecol) && defined($filepen) && -f $filecol && -f $filepen;

my $colors; # hashref mapping each color string to its pencil number string (e.g. $colors->{green} = "two")
my $l = ""; # line-by-line reading buffer

open(my $pen_fh, '<', $filepen) or die("Cannot open \"$filepen\" : $!");
while(defined($l=<$pen_fh>)){
    if( $l =~ /(\w+)\s+is\s+(\w+)/i ){ # matches every line containing "... Foo is Bar", capturing "Foo" in $1 and "Bar" in $2

        $colors->{$2} = $1;

    }
}
close($pen_fh);

open(my $col_fh, '<', $filecol) or die("Cannot open \"$filecol\" : $!");
while(defined($l=<$col_fh>)){
    if( $l =~ /^\s*(\w+)\s.*\scolor\s+(\w+)/i ){ # flexibly match the first word and the word after "color", capturing them in $1 and $2
        my ($name, $color) = ($1, $2); # copy the captures: sname() runs its own regex, and @_ would alias $1

        if( defined $colors->{$color} ){
            print( sname($name)." pencil is number $colors->{$color} because he likes the color $color.\n");
        }else{
            print("I have no idea which pencil is ".sname($name).".\n");
        }

    }
}
close($col_fh);

exit;

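sub sname {
    # Possessive form of a name: names ending in "s" take only an apostrophe (an extra :) ).
    my ($name) = @_; # copy the argument so the regex below cannot clobber an aliased $1

    return("${name}'") if $name =~ /s$/i;
    return("${name}'s");

}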