Question

I have two text files.

The first one contains a list of words, like this:

File1.txt

Laura
Samuel
Gerry
Peter
Maggie

The second one contains a paragraph. For example:

File2.txt

Laura
is
about
to
meet
Gerry
and
is
planning
to
take
Peter
along

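What I want the program to do is find the words the two files have in common and print "MATCH" next to each matching word in File2.txt, or write the result to a third output file.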

So the desired output should look like this:

Laura | MATCH
is
about
to
meet
Gerry | MATCH
and
is
planning
to
take
Peter | MATCH
along

I tried the following code, but I don't get the desired output:

use warnings;
use strict;

use Data::Dumper;

my $result = { };

my $first_file  = shift || 'File1.txt';
my $second_file = shift || 'File2.txt';
my $output      = 'output2.txt';

open my $a_fh, '<', $first_file  or die "$first_file: $!";
open my $b_fh, '<', $second_file or die "$second_file: $!";

open( OUTPUT,  '>' . $output ) or die "Cannot create $output.\n";

while ( <$a_fh> ) {
    chomp;
    next if /^$/;
    $result->{$_}++;
}

while ( <$b_fh> ) {

    chomp;

    next if /^$/;

    if ( $result->{$_} ) {
        delete $result->{$_};
        $result->{ join " |" => $_, "MATCH" }++;
    }
    else {
        $result->{$_}++;
    }
}

{
    $Data::Dumper::Sortkeys = 0;
    print OUTPUT Dumper $result;
}

But the output I get looks like this:

Laura  | MATCH
Samuel | MATCH
take
Maggie | MATCH
Laura
about
to
Gerry
meet
Gerry | MATCH
and
is
Maggie |MATCH
planning
to
Peter |MATCH
take
Peter |MATCH

The output is not in paragraph order, and MATCH is not printed correctly for all of the matches.

Please advise.

Answer 1

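Here's an example that lets you process multiple files. I populate the array @files with the files I want to check, read the wordlist file and put all of its words into a hash, and then iterate over each paragraph file one line at a time. I split each line into words and print them, but only after checking whether each word is in the wordlist. If it is, I append " | MATCH" to it.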

Paragraph file 1 (file.txt):

Laura is about to meet Gerry, and is planning to take Peter along.

But Peter and Sarah have other plans.

Paragraph file 2 (file2.txt):

Blah Peter has lost it.

Code:

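use warnings;
use strict;

# paragraph files to check
my @files = ('file.txt', 'file2.txt');

# read the word list into a hash so each lookup is a cheap exists() check
open my $word_fh, '<', 'wordlist.txt' or die "wordlist.txt: $!";

my %words_to_match = map {chomp $_; $_ => 0} <$word_fh>;

close $word_fh;

check($_) for @files;

sub check {
    my $file = shift;

    open my $fh, '<', $file or die "$file: $!";

    while (<$fh>){
        chomp;
        my @words_in_line = split;    # split the line on whitespace

        for my $word (@words_in_line){
            $word =~ s/[.,;:!]//g;    # strip punctuation such as "Gerry," or "along."
            $word .= ' | MATCH' if exists $words_to_match{$word};
            print "$word\n";
        }
        print "\n";    # empty line after each input line
    }
}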

Output:

Laura | MATCH
is
about
to
meet
Gerry | MATCH
and
is
planning
to
take
Peter | MATCH
along
But
Peter | MATCH
and
Sarah
have
other
plans

Blah
Peter | MATCH
has
lost
it

If you want to print the result to a file instead, open a write file handle and change the print statements in the while loop to print $wfh ...
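A minimal sketch of that change, assuming a hypothetical variant of check() that receives the word hash, an input handle and the write handle $wfh as arguments instead of opening a file by name. To keep the sketch runnable without any files, the demo opens in-memory handles on Perl scalars; the comment shows how you would open a real output file (output.txt is just an example name).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical variant of check(): reads from $in_fh and writes every
# word (marked or not) to the write handle $wfh instead of STDOUT.
sub check {
    my ( $words_to_match, $in_fh, $wfh ) = @_;

    while ( my $line = <$in_fh> ) {
        chomp $line;
        my @words_in_line = split ' ', $line;

        for my $word (@words_in_line) {
            $word =~ s/[.,;:!]//g;    # strip punctuation
            $word .= ' | MATCH' if exists $words_to_match->{$word};
            print $wfh "$word\n";     # print to the write handle, not STDOUT
        }
        print $wfh "\n";              # empty line after each input line
    }
}

# Word list from File1.txt in the question.
my %words_to_match;
$words_to_match{$_} = 0 for qw(Laura Samuel Gerry Peter Maggie);

# In-memory handles keep the demo self-contained. In a real script:
#   open my $wfh, '>', 'output.txt' or die "output.txt: $!";
my $text = "Laura is about to meet Gerry, and is planning to take Peter along.\n";
open my $in_fh, '<', \$text      or die "cannot read from string: $!";
open my $wfh,   '>', \my $result or die "cannot write to string: $!";

check( \%words_to_match, $in_fh, $wfh );
close $wfh;

print $result;
```

Because the handle is passed in, the same routine can still print to the terminal by passing \*STDOUT as the third argument.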

Answer 2

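I think you are not getting the desired output because you put everything into the hash $result and then print it with Data::Dumper. A Perl hash has no defined key order, and with $Data::Dumper::Sortkeys = 0 (which is also the default) Data::Dumper prints the keys in whatever order the hash returns them, not in the order of the lines in File2.txt. A hash also stores each word only once, so repeated words such as "is" and "to" cannot appear twice in the output.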
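A small sketch of the difference, using the first ten lines of File2.txt from the question: an array keeps every line in file order, while a hash (as in the original code) merges repeated words and has no file order. Setting $Data::Dumper::Sortkeys = 1 only sorts the keys as strings, which is still not the order of File2.txt.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

# The first ten lines of File2.txt, in file order.
my @lines = qw(Laura is about to meet Gerry and is planning to);

# Storing them as hash keys, like the original code does.
my %as_hash;
$as_hash{$_}++ for @lines;

printf "array keeps %d lines\n", scalar @lines;         # prints: array keeps 10 lines
printf "hash keeps %d keys\n",   scalar keys %as_hash;  # prints: hash keeps 8 keys

# Sorted keys are alphabetical, not the order of the file.
$Data::Dumper::Sortkeys = 1;
print Dumper \%as_hash;
```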

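I changed your code slightly so that each line is written out as soon as it is read from File2.txt, with " | MATCH" appended whenever that line appears in File1.txt.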

#!/usr/bin/env perl

use strict;
use warnings;

my $result      = {};
my $first_file  = shift || 'File1.txt';
my $second_file = shift || 'File2.txt';
my $output      = 'output2.txt';
open my $a_fh, '<', $first_file  or die "$first_file: $!";
open my $b_fh, '<', $second_file or die "$second_file: $!";
open( my $out, '>', $output ) or die "Cannot create $output.\n";

# remember words from File1.txt in the hash $result:
while ( my $line = <$a_fh> ) {
    $line =~ s/^\s*//;    # strip leading whitespace
    $line =~ s/\s*$//;    # strip trailing ws
    next if $line =~ /^$/;    # skip now empty lines
    $result->{$line} = 1;
}

# now $result consists of all "words" in File1.txt, like
# $result = {
#   'Gerry'  => 1,
#   'Laura'  => 1,
#   'Maggie' => 1,
#   'Peter'  => 1,
#   'Samuel' => 1
# };

# now iterate over File2.txt, print all lines and append
# 'MATCH' for those in File1.txt:
while ( my $line = <$b_fh> ) {
    $line =~ s/^\s*//;        # strip leading whitespace
    $line =~ s/\s*$//;        # strip trailing ws
    next if $line =~ /^$/;    # skip now empty lines

    # print the line from File2.txt (without \n):
    print $out $line;

    # if this line (word) was found
    # in File1.txt, then append " | MATCH"
    if ( $result->{$line} ) {
        print $out ' | MATCH';
    }

    # print final \n
    print $out "\n";
}

Output:

Laura | MATCH
is
about
to
meet
Gerry | MATCH
and
is
planning
to
take
Peter | MATCH
along
