Question

I wrote a Perl script that takes two input files:

The first file contains one phrase per line, followed by a value in parentheses. Here is an example:
```
hello all (0.5)
hi all (0.63)
good bye all (0.09)
```

The second file contains a list of rules. For example:

hello all -> salut (0.5)
hello all -> salut à tous (0.5)
hi all -> salut (0.63)
good bye all -> au revoir (0.09)
good bye -> au revoir  (0.09)

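The script must read the second file and, for each line, extract the phrase before the arrow (for the first line, that is hello all) and check whether that phrase is present in the first file (in our example it is).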

If it is, the whole line hello all -> salut (0.5) is written to the output. So in this example the output file should be:

hello all -> salut (0.5)
hello all -> salut à tous (0.5)
hi all -> salut (0.63)
good bye all -> au revoir (0.09)

My idea is to load the entire contents of the first file into a hash table. Here is my script for that:

#!/usr/bin/perl

use warnings;

my $vocabFile = "file1.txt";
my %hashFR =();
open my $fh_infile, '<', $InFile or die "Can't open $InFile\n";

while ( my $Ligne = <$fh_infile> ) {
  if ( $Ligne =~ /\(/ ) {
    my ($cle, $valeur) = split /\(/, $Ligne;
    say $cle; 
    $h{$cle}  = $valeur;
  }     
}

My question now is: how do I extract the phrase before the arrow and look it up in the hash table?

Thanks for your help.

Answer 1

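You need use strict. It makes your program fail when it encounters an undeclared variable such as $InFile (I assume you meant to use $vocabFile). I'll ignore that kind of problem in the code you posted, since you can fix it yourself once strict is turned on.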
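
As a rough illustration of what strict reports, the snippet below compiles a cut-down version of the question's code inside a string eval and captures the error. The eval wrapper and the $ok and $error variables are assumptions made only for this demonstration; in a real script you would simply put use strict at the top.

```perl
use strict;
use warnings;

# Compile a fragment resembling the question's code under strict.
# $InFile is never declared, so compilation fails and eval returns undef.
my $ok = eval q{
    use strict;
    my $vocabFile = "file1.txt";
    open my $fh_infile, '<', $InFile or die "Can't open $InFile\n";
    1;
};
my $error = $@;

print "compiled ok\n" if $ok;
print "strict reported: $error" unless $ok;
```

The message names the undeclared variable, which is usually enough to spot a mismatch like $InFile versus $vocabFile.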

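First, your existing code has some logic problems. You don't seem to use the numbers in parentheses that you store as hash values, but if you do want to use them, you should get rid of the trailing ):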

    my ($cle, $valeur) = split /[()]/, $Ligne;
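
To make the difference concrete, here is a small sketch that splits one sample line both ways. The variable names $cle_old and $valeur_old are introduced only for the comparison.

```perl
use strict;
use warnings;

my $Ligne = "hello all (0.5)\n";

# Splitting on "(" alone leaves the closing parenthesis and the newline on the value.
my ($cle_old, $valeur_old) = split /\(/, $Ligne;

# Splitting on either parenthesis isolates the number.
my ($cle, $valeur) = split /[()]/, $Ligne;

print "old value: [$valeur_old]\n";   # still contains ")" and the newline
print "new value: [$valeur]\n";       # just the number
print "key:       [$cle]\n";          # note the trailing space before "]"
```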

Next, strip leading and trailing whitespace before using a string as a hash key. You may think of "foo" and "foo " as the same word, but Perl doesn't.

$cle =~ s/^\s+//;
$cle =~ s/\s+$//;
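
The two substitutions can be wrapped in a small helper. The sketch below uses a hypothetical sub named trim and a made-up %seen hash to show why the trailing space matters for lookups.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the string without
# leading or trailing whitespace.
sub trim {
    my ($text) = @_;
    $text =~ s/^\s+//;
    $text =~ s/\s+$//;
    return $text;
}

my %seen = ( "foo" => 1 );

# Without trimming, the key with a trailing space is a different key.
print "untrimmed key found\n" if exists $seen{"foo "};
print "trimmed key found\n"   if exists $seen{ trim("foo ") };
```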

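Now you're most of the way there. You obviously already know how to read a file, how to use split, and how to use hashes. You just need to put it all together. Read the second file: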

open my $fh2, "<", "file2" or die "Can't open file2: $!";

while (<$fh2>) {
    chomp;

...get the part before the ->

    my ($left, $right) = split /->/;

...strip leading and trailing whitespace from the key

    $left =~ s/^\s+//;
    $left =~ s/\s+$//;

...print the whole line if the key exists in the hash

    print $_, "\n" if exists $hash{$left};
}

...and don't forget to close the filehandle when you're done

close $fh2;

(Although, as amon pointed out, this isn't strictly necessary, especially since we are reading rather than writing. There is a good PerlMonks thread on the topic.)
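
Putting the steps above together, a minimal sketch might look like the following. The sub names trim, load_phrases and matching_rules are hypothetical. As a further assumption, a few of the question's sample lines are read from in-memory filehandles so the example runs without file1.txt and file2 on disk.

```perl
use strict;
use warnings;

# Strip leading and trailing whitespace from a string.
sub trim {
    my ($text) = @_;
    $text =~ s/^\s+//;
    $text =~ s/\s+$//;
    return $text;
}

# Read "phrase (value)" lines and return a hash of phrase => value.
sub load_phrases {
    my ($fh) = @_;
    my %hash;
    while ( my $line = <$fh> ) {
        next unless $line =~ /\(/;
        my ( $cle, $valeur ) = split /[()]/, $line;
        $hash{ trim($cle) } = $valeur;
    }
    return %hash;
}

# Return the rule lines whose left-hand side is a key of the hash.
sub matching_rules {
    my ( $fh, $hash ) = @_;
    my @kept;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless $line =~ /->/;
        my ($left) = split /->/, $line;
        push @kept, $line if exists $hash->{ trim($left) };
    }
    return @kept;
}

# Sample data, held in strings instead of real files (assumption).
my $file1 = "hello all (0.5)\nhi all (0.63)\ngood bye all (0.09)\n";
my $file2 = "hello all -> salut (0.5)\nhi all -> salut (0.63)\n"
          . "good bye all -> au revoir (0.09)\ngood bye -> au revoir (0.09)\n";

open my $fh1, '<', \$file1 or die "Can't open in-memory file 1: $!";
my %hash = load_phrases($fh1);

open my $fh2, '<', \$file2 or die "Can't open in-memory file 2: $!";
my @kept = matching_rules( $fh2, \%hash );

# Lexical filehandles are closed automatically when they go out of scope.
print "$_\n" for @kept;
```

With real files, the two open calls would take file names instead of scalar references; the rest stays the same.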

Answer 2

#!/usr/bin/perl

use strict; use warnings;
use Data::Dumper;

open my $FILE_1, '<', shift @ARGV or die "Can't open first file: $!";
open my $FILE_2, '<', shift @ARGV or die "Can't open second file: $!";

my @file1 = <$FILE_1>;
my @file2= <$FILE_2>;

close $FILE_1;
close $FILE_2;
# Store "segments" from the first file in hash:
my %first_file_hash = map { chomp $_; my ($segment) = $_ =~ /^(.*?)\s*\(/; $segment => 1 } @file1;

my @result;
# Process file2 content:
foreach my $line (@file2) {
    chomp $line;
    # Retrieve "segment" from the line:
    my ($string) = $line =~ /^(.*?)\s+->/;
    # If it is present in file1, store it for future usage:
    if ($string and $first_file_hash{ $string }) {
        push @result, $line;
    }
}

open my $F, '>', 'output.txt' or die "Can't write output.txt: $!";
print $F join("\n", @result);
close $F;

print "\nDone!\n";

Run it like this:

perl script.pl file1.txt file2.txt

Cheers!
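
The segment extraction in this answer relies on a non-greedy capture anchored at the start of the line. Here is a small sketch of just that step, using a hypothetical helper named segment_before_arrow. As a deliberate variation, it tests the result with defined rather than plain truth, so that a segment such as "0" would not be skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text before " ->",
# or undef when the line contains no arrow.
sub segment_before_arrow {
    my ($line) = @_;
    my ($segment) = $line =~ /^(.*?)\s+->/;
    return $segment;
}

my %first_file_hash = ( 'hello all' => 1, 'good bye all' => 1 );

for my $line ( 'hello all -> salut (0.5)', 'good bye -> au revoir (0.09)', 'no arrow here' ) {
    my $segment = segment_before_arrow($line);
    # defined() rather than a plain truth test, so a segment like "0" is not dropped
    if ( defined $segment and $first_file_hash{$segment} ) {
        print "keep: $line\n";
    }
}
```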

Answer 3

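This can be done by building a hash directly from the contents of the first file, then reading each line of the second file and checking the hash to see whether the line should be printed.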

use strict;
use warnings;
use autodie;

my %permitted = do {
  open my $fh, '<', 'f1.txt';
  map { /(.+?)\s+\(/, 1 } <$fh>;
};

open my $fh, '<', 'f2.txt';
while (<$fh>) {
  my ($phrase) = /(.+?)\s+->/;
  print if $permitted{$phrase};
}

Output

hello all -> salut (0.5)
hello all -> salut à tous (0.5)
hi all -> salut (0.63)
good bye all -> au revoir (0.09)
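
The map in this answer works because a successful match in list context returns its captures, so each line contributes a phrase followed by 1. The sketch below is a slightly more defensive variant (an assumption, not the answer's exact code): a line that does not match contributes nothing, whereas in the form /(.+?)\s+\(/, 1 it would contribute a lone 1 and shift the key/value pairing. The sample lines are invented for illustration.

```perl
use strict;
use warnings;

my @lines = (
    "hello all (0.5)\n",
    "hi all (0.63)\n",
    "this line has no parenthesised value\n",
    "good bye all (0.09)\n",
);

# Each matching line contributes the pair (phrase => 1); a non-matching line
# contributes an empty list, so the key/value list always has an even length.
my %permitted = map { /(.+?)\s+\(/ ? ( $1 => 1 ) : () } @lines;

print "$_\n" for sort keys %permitted;
```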
