I have two text files.
The first contains a list of words, like this:
Laura
Samuel
Gerry
Peter
Maggie
The second contains a paragraph. For example:
Laura
is
about
to
meet
Gerry
and
is
planning
to
take
Peter
along
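What I want the program to do is find the words the two files have in common and print MATCH beside each matching word, either in File2.txt or in a third output file.
So the desired output should look like this: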
Laura | MATCH
is
about
to
meet
Gerry | MATCH
and
is
planning
to
take
Peter | MATCH
along
I tried the following code, but I am not getting the desired output.
use warnings;
use strict;
use Data::Dumper;
my $result = { };
my $first_file = shift || 'File1.txt';
my $second_file = shift || 'File2.txt';
my $output = 'output2.txt';
open my $a_fh, '<', $first_file or die "$first_file: $!";
open my $b_fh, '<', $second_file or die "$second_file: $!";
open( OUTPUT, '>' . $output ) or die "Cannot create $output.\n";
while ( <$a_fh> ) {
    chomp;
    next if /^$/;
    $result->{$_}++;
}
while ( <$b_fh> ) {
    chomp;
    next if /^$/;
    if ( $result->{$_} ) {
        delete $result->{$_};
        $result->{ join " |" => $_, "MATCH" }++;
    }
    else {
        $result->{$_}++;
    }
}
{
    $Data::Dumper::Sortkeys = 0;
    print OUTPUT Dumper $result;
}
But the output I get looks like this:
Laura | MATCH
Samuel | MATCH
take
Maggie | MATCH
Laura
about
to
Gerry
meet
Gerry | MATCH
and
is
Maggie |MATCH
planning
to
Peter |MATCH
take
Peter |MATCH
The output is not in paragraph order, and MATCH is not printed for every match.
Please advise.
Answer 0 (score: 1)
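Here is an example that can process several paragraph files. I fill the array @files with the files I want to compare, read the wordlist file into a hash, and then iterate over each paragraph file one line at a time. I split each line into words and print them one per line, but only after checking whether the word is in the word list. If it is, I append " | MATCH" to it.
Paragraph file 1: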
Laura is about to meet Gerry, and is planning to take Peter along.
But Peter and Sarah have other plans.
Paragraph file 2:
Blah Peter has lost it.
Code:
use warnings;
use strict;

# Paragraph files to check against the word list
my @files = ('file.txt', 'file2.txt');

# Load the word list into a hash for fast lookups
open my $word_fh, '<', 'wordlist.txt' or die "wordlist.txt: $!";
my %words_to_match = map {chomp $_; $_ => 0} <$word_fh>;
close $word_fh;

check($_) for @files;

sub check {
    my $file = shift;
    open my $fh, '<', $file or die "$file: $!";
    while (<$fh>) {
        chomp;
        my @words_in_line = split;
        for my $word (@words_in_line) {
            $word =~ s/[.,;:!]//g;    # strip punctuation so "Gerry," matches "Gerry"
            $word .= ' | MATCH' if exists $words_to_match{$word};
            print " $word\n";
        }
        print "\n";
    }
    close $fh;
}
Output:
Laura | MATCH
is
about
to
meet
Gerry | MATCH
and
is
planning
to
take
Peter | MATCH
along
But
Peter | MATCH
and
Sarah
have
other
plans
Blah
Peter | MATCH
has
lost
it
If you want to print this to a file instead, open a write filehandle and change the print statements in the while loop to print $wfh ...
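A minimal sketch of that change, assuming an output file named output.txt and a helper called print_words (both names are made up for illustration and are not part of the code above). The word list is hard-coded here so that the snippet runs on its own:

```perl
use strict;
use warnings;

# Hypothetical helper: writes each word of $line to the filehandle $wfh,
# one word per line, appending " | MATCH" to words found in %$words.
sub print_words {
    my ($wfh, $line, $words) = @_;
    my @words_in_line = split ' ', $line;
    for my $word (@words_in_line) {
        $word =~ s/[.,;:!]//g;    # same punctuation stripping as above
        next if $word eq '';
        $word .= ' | MATCH' if exists $words->{$word};
        print {$wfh} "$word\n";
    }
}

# Word list from the question, hard-coded for the example.
my %words_to_match = map { $_ => 0 } qw(Laura Samuel Gerry Peter Maggie);

# 'output.txt' is an assumed file name.
open my $wfh, '>', 'output.txt' or die "output.txt: $!";
print_words(
    $wfh,
    'Laura is about to meet Gerry, and is planning to take Peter along.',
    \%words_to_match,
);
close $wfh or die "output.txt: $!";
```

Passing the filehandle as an argument keeps the matching logic independent of where the output goes, so the same sub can also be called with \*STDOUT.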
Answer 1 (score: 0)
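I think you are not getting the desired output because you put everything into the hash $result and then print it with Data::Dumper.
Data::Dumper prints hash keys in Perl's internal hash order, which is effectively arbitrary, when $Data::Dumper::Sortkeys is 0 (the default). A hash also stores each key only once, so repeated words such as "is" and "to" cannot appear twice.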
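To make the difference concrete, here is a small sketch (not taken from your code) that stores the words of File2.txt once as hash keys and once in an array. The word list is hard-coded so that the snippet runs without any input files:

```perl
use strict;
use warnings;

# Sample data taken from the question's File2.txt.
my @lines = qw(Laura is about to meet Gerry and is planning to take Peter along);

# Storing the lines as hash keys discards both order and duplicates.
my %seen;
$seen{$_}++ for @lines;

# An array keeps every line in the order it was read.
my @ordered = @lines;

printf "hash keys:      %d\n", scalar keys %seen;    # 11: "is" and "to" collapse
printf "array elements: %d\n", scalar @ordered;      # 13
print join(' ', keys %seen), "\n";                   # order can vary between runs
print join(' ', @ordered), "\n";                     # always file order
```

This is why writing each line out as it is read, or keeping the lines in an array, preserves the paragraph, while a hash cannot.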
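I changed your code slightly so that each line is written to the output as soon as it is read from File2.txt, with " | MATCH" appended when there is a match.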
#!/usr/bin/env perl
use strict;
use warnings;
my $result = {};
my $first_file = shift || 'File1.txt';
my $second_file = shift || 'File2.txt';
my $output = 'output2.txt';
open my $a_fh, '<', $first_file or die "$first_file: $!";
open my $b_fh, '<', $second_file or die "$second_file: $!";
open( my $out, '>', $output ) or die "Cannot create $output.\n";
# remember words from File1.txt in the hash $result:
while ( my $line = <$a_fh> ) {
    $line =~ s/^\s*//;        # strip leading whitespace
    $line =~ s/\s*$//;        # strip trailing ws
    next if $line =~ /^$/;    # skip now empty lines
    $result->{$line} = 1;
}
# now $result consists of all "words" in File1.txt, like
# $result = {
#     'Gerry'  => 1,
#     'Laura'  => 1,
#     'Maggie' => 1,
#     'Peter'  => 1,
#     'Samuel' => 1
# };
# now iterate over File2.txt, print all lines and append
# 'MATCH' for those in File1.txt:
while ( my $line = <$b_fh> ) {
    $line =~ s/^\s*//;        # strip leading whitespace
    $line =~ s/\s*$//;        # strip trailing ws
    next if $line =~ /^$/;    # skip now empty lines
    # print the line from File2.txt (without \n):
    print $out $line;
    # if this line (word) was found
    # in File1.txt, then append " | MATCH"
    if ( $result->{$line} ) {
        print $out ' | MATCH';
    }
    # print final \n
    print $out "\n";
}
Output:
Laura | MATCH
is
about
to
meet
Gerry | MATCH
and
is
planning
to
take
Peter | MATCH
along