Question

I want to build a "complex" data structure from two text files.

Each line of the first file contains an English sentence, a French sentence, and a score. For example:

The cat eats the mouse # Le chat mange la souris # 2.8

I had no trouble populating a hash from this file with the following code. The key is the English sentence.

my $hash = { };

while ( my $text = <TEXT> ) {
  chomp $text;
  # Capture the three '#'-separated fields, ignoring surrounding whitespace
  if ( $text =~ /^\s*(.+?)\s*#\s*(.+?)\s*#\s*(.+?)\s*$/ ) {

    my $english_sentence = $1;
    my $french_sentence  = $2;
    my $score            = $3;

    $hash->{$english_sentence}->{translation} = $french_sentence;
    $hash->{$english_sentence}->{score}       = $score;
    my @words_en = split(' ', $english_sentence);
    $hash->{$english_sentence}->{words_en} = \@words_en;

  }
  else {
    print "No sentence found on this input line\n";
  }
}

The resulting hash looks like this:

$VAR1 = {
  'The cat eats the mouse' => {
    'words_en'    => ['The', 'cat', 'eats', 'the', 'mouse'],
    'score'       => '2.8',
    'translation' => 'Le chat mange la souris'
  }
};

The second file contains translations of English words:

   cat ||| chat ||| 0.600000 
   cat ||| félin ||| 0.500000
   eats ||| mange ||| 0.600000
   eats ||| manger ||| 0.500000
   mouse ||| souris ||| 0.600000 
   mouse ||| rat ||| 0.500000

The resulting hash looks like this:

$VAR1 = {
  'eats' => {
      'manger' => '0.500000',
      'mange' => '0.600000'
   },
  'cat' => {
      "félin" => '0.500000',
      'chat' => '0.600000 '
  },
  'mouse' => {
      'souris' => '0.600000 ',
      'rat' => '0.500000'
  }
};
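
The post does not show how the second hash is built. One way it could be done is sketched below, assuming every line has exactly three fields separated by `|||`. The helper name `parse_lexicon` is hypothetical, not from the original code. The fields are trimmed so that scores such as '0.600000 ' do not keep their trailing space.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Parse "english ||| french ||| score" lines into a hash of hashes.
# parse_lexicon is a hypothetical helper name, not from the original post.
sub parse_lexicon {
    my @lines = @_;
    my %lexicon;
    for my $line (@lines) {
        # Split on the literal "|||" separator, swallowing adjacent whitespace.
        my @fields = split /\s*\|\|\|\s*/, $line;
        next unless @fields == 3;    # skip malformed lines
        # Trim what is left at the start of the first and end of the last field.
        s/^\s+|\s+$//g for @fields;
        my ($english, $french, $score) = @fields;
        $lexicon{$english}{$french} = $score;
    }
    return \%lexicon;
}

my $lexicon = parse_lexicon(
    '   cat ||| chat ||| 0.600000 ',
    '   cat ||| félin ||| 0.500000',
    '   mouse ||| rat ||| 0.500000',
);
print join(',', sort keys %$lexicon), "\n";
```

With real input, the lines would come from a filehandle (for example `parse_lexicon(<$fh>)`) instead of a literal list.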

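Now I need to compare each value of the array stored in the first hash:

'words_en'    => ['The', 'cat', 'eats', 'the', 'mouse']

against the keys of the second hash.

Finally, I want to print something like this:

The cat[chat;félin] eats[mange;manger] the mouse[souris;rat] # Le chat mange la souris # 2.8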

Answer 1

You can use map to do part of what you need quickly; List::MoreUtils also provides other functions that help when working with this kind of data structure.

#!/usr/bin/env perl
use strict;
use warnings;

my @words_en = ('The', 'cat', 'eats', 'the', 'mouse');
my $conversion  = {
  'eats' => {
      'manger' => '0.500000',
      'mange' => '0.600000'
   },
  'cat' => {
      "félin" => '0.500000',
      'chat' => '0.600000 '
  },
  'mouse' => {
      'souris' => '0.600000 ',
      'rat' => '0.500000'
  }
};

# Build, for each word in @words_en, the list of its known translations
# (an empty list when the word has no entry; key order is unspecified)
my @translations = map { [ keys %{ $conversion->{$_} || {} } ] } @words_en;

# Print in the suggested format
for (my $i = 0; $i < scalar @words_en; $i++) {
  print $words_en[$i];
  # Append the bracketed list only when at least one translation exists
  print '[' . join(';', @{ $translations[$i] } ) . ']' if @{ $translations[$i] };

  # Terminate this word
  print( ($i < scalar(@words_en) - 1) ? " " : ".\n" );
}
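
The same idea can be applied to the nested structure from the question so that the whole line, including the French sentence and the score, is printed. The following is only a sketch under a few assumptions: the scores have already been trimmed of whitespace, translations are ordered by descending score (which matches the order in the desired output), and `annotate_sentence` is a hypothetical helper name.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sample data copied from the question, with whitespace already trimmed.
my $sentences = {
    'The cat eats the mouse' => {
        words_en    => [qw(The cat eats the mouse)],
        score       => '2.8',
        translation => 'Le chat mange la souris',
    },
};
my $conversion = {
    eats  => { manger  => '0.500000', mange => '0.600000' },
    cat   => { 'félin' => '0.500000', chat  => '0.600000' },
    mouse => { souris  => '0.600000', rat   => '0.500000' },
};

# annotate_sentence is a hypothetical helper name, not from the original post.
sub annotate_sentence {
    my ($entry, $lexicon) = @_;
    my @annotated = map {
        my $candidates = $lexicon->{$_} || {};
        # Highest score first; ties broken alphabetically for a stable order.
        my @french = sort {
            $candidates->{$b} <=> $candidates->{$a} or $a cmp $b
        } keys %$candidates;
        # Words without an entry in the lexicon are printed unchanged.
        @french ? $_ . '[' . join(';', @french) . ']' : $_;
    } @{ $entry->{words_en} };
    return join ' # ', join(' ', @annotated), $entry->{translation}, $entry->{score};
}

for my $english (sort keys %$sentences) {
    print annotate_sentence($sentences->{$english}, $conversion), "\n";
}
```

The lookup is case-sensitive, so 'The' and 'the' stay unannotated unless the lexicon has entries with exactly those spellings; lowercasing the word with `lc` before the lookup would be one way to relax that.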
