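I want to build a "complex" data structure from two text files.
Each line of the first file holds an English sentence, a French sentence, and a score. For example: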
The cat eats the mouse # Le chat mange la souris # 2.8
I have no trouble populating a hash from this file with the following code. The key is the English sentence.
my $hash = {};
while ( my $text = <TEXT> ) {
    chomp $text;
    # Three '#'-separated fields; the whitespace around each field is not captured
    if ( $text =~ /^\s*(.+?)\s*#\s*(.+?)\s*#\s*(.+?)\s*$/ ) {
        my ( $english_sentence, $french_sentence, $score ) = ( $1, $2, $3 );
        $hash->{$english_sentence}->{translation} = $french_sentence;
        $hash->{$english_sentence}->{score}       = $score;
        # Split on runs of whitespace to get the individual English words
        my @words_en = split ' ', $english_sentence;
        $hash->{$english_sentence}->{words_en} = \@words_en;
    }
    else {
        print "No sentence found on this line of the input document\n";
    }
}
The resulting hash looks like this:
$VAR1 = {
    'The cat eats the mouse' => {
        'words_en' => ['The', 'cat', 'eats', 'the', 'mouse'],
        'score' => '2.8',
        'translation' => 'Le chat mange la souris'
    }
};
The second file contains translations of individual English words:
cat ||| chat ||| 0.600000
cat ||| félin ||| 0.500000
eats ||| mange ||| 0.500000
eats ||| manger ||| 0.500000
mouse ||| souris ||| 0.600000
mouse ||| rat ||| 0.500000
The resulting hash looks like this:
$VAR1 = {
    'eats' => {
        'manger' => '0.500000',
        'mange' => '0.600000'
    },
    'cat' => {
        "félin" => '0.500000',
        'chat' => '0.600000 '
    },
    'mouse' => {
        'souris' => '0.600000 ',
        'rat' => '0.500000'
    }
};
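For reference, here is a minimal sketch of how a hash of this shape could be built from the second file. The helper name `parse_lexicon` is hypothetical, and the sketch assumes that every line has exactly three fields separated by `|||`. It also trims surrounding whitespace, so the scores come out without the trailing spaces visible in the dump above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "english ||| french ||| score" lines into a
# hash of hashes keyed by English word, then by French translation.
sub parse_lexicon {
    my (@lines) = @_;
    my %lexicon;
    for my $line (@lines) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
        # Split on the "|||" separator, dropping the whitespace around it
        my ( $en, $fr, $score ) = split /\s*\|\|\|\s*/, $line;
        next unless defined $score;    # skip lines with fewer than three fields
        $lexicon{$en}{$fr} = $score;
    }
    return \%lexicon;
}

my $lexicon = parse_lexicon(
    'cat ||| chat ||| 0.600000',
    'cat ||| félin ||| 0.500000',
    'mouse ||| souris ||| 0.600000 ',
);
```

With real input the lines would come from the file handle instead, for example `parse_lexicon(<$fh>)`.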
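Now I need to compare each value of the array stored in the first hash:
'words_en' => ['The', 'cat', 'eats', 'the', 'mouse']
against the keys of the second hash.
In the end, I would like to print something like this:
The cat[chat;félin] eats[mange;manger] the mouse[souris;rat] # Le chat mange la souris # 2.8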
Answer 0 (score: 0)
You can use map to do part of what you need quickly. List::MoreUtils also provides other functions that can help you work with this kind of data structure.
#!/usr/bin/env perl
use strict;
use warnings;

my @words_en = ('The', 'cat', 'eats', 'the', 'mouse');
my $conversion = {
    'eats' => {
        'manger' => '0.500000',
        'mange' => '0.600000'
    },
    'cat' => {
        "félin" => '0.500000',
        'chat' => '0.600000 '
    },
    'mouse' => {
        'souris' => '0.600000 ',
        'rat' => '0.500000'
    }
};

# For each entry of @words_en, collect its translations (an empty list when
# the word has no entry in $conversion). The order returned by keys is unspecified.
my @translations = map { [ keys %{ $conversion->{$_} || {} } ] } @words_en;

# Print in the suggested format
for my $i ( 0 .. $#words_en ) {
    print $words_en[$i];
    # Add the bracketed list only when the word has at least one translation
    print '[' . join(';', @{ $translations[$i] }) . ']' if @{ $translations[$i] };
    # Separate words with a space and end the sentence with a period
    print( $i < $#words_en ? ' ' : ".\n" );
}
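To get the whole line the question asks for, including the French sentence and the score, the same idea can be applied to both hashes together. The following is only a sketch under some assumptions. The helper name `annotate_sentence` is hypothetical. The scores are assumed to be stored without trailing whitespace so that they compare cleanly as numbers. Translations are ordered by descending score, with ties broken alphabetically, which is one way to reproduce the order shown in the question. Lookups are case-sensitive, so "The" and "the" are treated as different words.

```perl
use strict;
use warnings;

# Hypothetical helper: append "[t1;t2;...]" to every word that has an entry
# in the lexicon, listing translations by descending score.
sub annotate_sentence {
    my ( $words, $lexicon ) = @_;
    my @annotated;
    for my $word (@$words) {
        my $entry = $lexicon->{$word};
        if ( $entry && %$entry ) {
            my @fr = sort { $entry->{$b} <=> $entry->{$a} or $a cmp $b }
                keys %$entry;
            push @annotated, $word . '[' . join( ';', @fr ) . ']';
        }
        else {
            push @annotated, $word;    # no known translation: keep the word as-is
        }
    }
    return join ' ', @annotated;
}

# Sample data shaped like the two hashes in the question (scores trimmed)
my $sentences = {
    'The cat eats the mouse' => {
        words_en    => [qw(The cat eats the mouse)],
        translation => 'Le chat mange la souris',
        score       => '2.8',
    },
};
my $lexicon = {
    eats  => { manger  => '0.500000', mange => '0.600000' },
    cat   => { 'félin' => '0.500000', chat  => '0.600000' },
    mouse => { souris  => '0.600000', rat   => '0.500000' },
};

for my $english ( sort keys %$sentences ) {
    my $info = $sentences->{$english};
    print join( ' # ',
        annotate_sentence( $info->{words_en}, $lexicon ),
        $info->{translation}, $info->{score} ), "\n";
}
```

For the sample data this prints `The cat[chat;félin] eats[mange;manger] the mouse[souris;rat] # Le chat mange la souris # 2.8`.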