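I'm new to Perl. I'm reading text from a file and want to translate some of the words into French. I managed to do it word by word, but not for multi-word expressions/strings, and I'm stuck on the code.
Word-by-word code: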
my $filename = 'assign3.txt';
my @lexicon_en = ("Winter","Date", "Due Date", "Problem", "Summer","Mark","Fall","Assignment","November");
my @lexicon_fr = ("Hiver", "Date", "Date de Remise","Problème","Été", "Point", "Automne", "Devoir", "Novembre");
my $i = 1;
open(my $fh, '<:encoding(UTF-8)', $filename)
    or die "Could not open file $filename !";
while (<$fh>) {
    for my $word (split) {
        print " $i. $word \n";
        $i++;
        for (my $j = 0; $j < 9; $j++) {
            if ($word eq $lexicon_en[$j]) {
                print "Found one! - j value is $j\n";
            }
        }
    }
}
print "\ndone here!!\n";
This is the regular expression I'm trying to use:
/\w+\s\w+/
And this is my code for strings:
while (<>) {
    print ("this is text: $_ \n");
    if ((split (/Due\sDate/),$_) eq "Due Date"){
        print "yes!!\n";
    }
}
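Answer 0 (score: 2)
I think I understand the challenge you're facing. Because "Due Date" is two words, it has to be matched before "Date" is matched, otherwise you'll get incorrect translations. One way to handle this is to order your matches from the greatest number of words to the least, so that "Due Date" is processed before "Date".
If you convert your arrays to a hash (dictionary), you can sort the keys by the number of spaces they contain and then iterate over them to do the actual substitution: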
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                               # the source contains non-ASCII literals
binmode(STDOUT, ':encoding(UTF-8)');    # print them as UTF-8

#my @lexicon_en = ("Winter","Date", "Due Date", "Problem", "Summer","Mark","Fall","Assignment","November");
#my @lexicon_fr = ("Hiver", "Date", "Date de Remise","Problème","Été", "Point", "Automne", "Devoir", "Novembre");

# convert your arrays to a hash
my %lexicon = (
    'Winter'     => 'Hiver',
    'Date'       => 'Date',
    'Due Date'   => 'Date de Remise',
    'Problem'    => 'Problème',
    'Summer'     => 'Été',
    'Mark'       => 'Point',
    'Fall'       => 'Automne',
    'Assignment' => 'Devoir',
    'November'   => 'Novembre',
);

# sort the keys so that those with the most spaces (most words) come first;
# tr/ // counts the spaces, and <=> gives sort the -1/0/1 result it expects
my @ordered_keys = sort { ($b =~ tr/ //) <=> ($a =~ tr/ //) } keys %lexicon;

my $sample = 'The due date of the assignment is a date in the fall.';
print "sample before: $sample\n";
foreach my $key (@ordered_keys) {
    # \Q...\E quotes any regex metacharacters in the key
    $sample =~ s/\Q$key\E/$lexicon{$key}/ig;
}
print "sample after : $sample\n";
Output:
sample before: The due date of the assignment is a date in the fall.
sample after : The Date de Remise of the Devoir is a Date in the Automne.
The next challenge is making sure the case of the replacement matches the case of the text being replaced.
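One possible way to approach the case-matching problem is sketched below. It is only an illustration under some assumptions: `match_case` is a hypothetical helper (not part of the answer above), the lexicon keys are stored in lowercase, and only three casing styles are distinguished (all upper, all lower, anything else).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy the overall casing of $from onto $to.
sub match_case {
    my ($from, $to) = @_;
    return uc($to) if $from eq uc($from);   # DUE DATE -> DATE DE REMISE
    return lc($to) if $from eq lc($from);   # due date -> date de remise
    return $to;                             # mixed case: keep the lexicon's casing
}

# Assumption: keys are stored lowercase and looked up with lc().
my %lexicon = ('due date' => 'Date de Remise', 'fall' => 'Automne');

# Longest keys first, so multi-word entries win over shorter ones.
my $alternation = join '|', map { quotemeta($_) }
                  sort { length($b) <=> length($a) } keys %lexicon;

my $sample = 'The DUE DATE is in the fall. Fall is near.';
# /e evaluates the replacement as Perl code, /i matches any casing.
$sample =~ s/\b($alternation)\b/match_case($1, $lexicon{ lc($1) })/gie;
print "$sample\n";
```

With this sample the script prints "The DATE DE REMISE is in the automne. Automne is near." Real text would likely need finer rules (for example, capitalising only the first letter), so treat the three-way split as a starting point.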
Answer 1 (score: 1)
Use \b to match word boundaries instead of relying on \w and \s to find where words begin and end.
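As a small illustration of why the boundary matters, the sketch below replaces "fall" with and without \b. The sample sentence is made up for this example.

```perl
use strict;
use warnings;

my $text = 'The waterfall freezes in the fall.';

# Without boundaries, "fall" also matches inside "waterfall".
(my $loose = $text) =~ s/fall/Automne/ig;

# With \b on both sides, only the standalone word is replaced.
(my $strict = $text) =~ s/\bfall\b/Automne/ig;

print "$loose\n";
print "$strict\n";
```

The first line printed is "The waterAutomne freezes in the Automne." and the second is "The waterfall freezes in the Automne."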
Combining Steven Klassen's solution with the approach from "How to replace a set of search/replace pairs?":
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                               # needed so lc() handles "É" as a character
binmode(STDOUT, ':encoding(UTF-8)');    # print non-ASCII text as UTF-8

my %lexicon = (
    'Winter'     => 'Hiver',
    'Date'       => 'Date',
    'Due Date'   => 'Date de Remise',
    'Problem'    => 'Problème',
    'Summer'     => 'Été',
    'Mark'       => 'Point',
    'Fall'       => 'Automne',
    'Assignment' => 'Devoir',
    'November'   => 'Novembre',
);

# add lowercase
for (keys %lexicon) {
    $lexicon{lc($_)} = lc($lexicon{$_});
    print $_ . " " . $lexicon{lc($_)} . "\n";
}

# Combine to one big regexp, longest keys first so multi-word entries win.
# https://stackoverflow.com/questions/17596917/how-to-replace-a-set-of-search-replace-pairs?answertab=votes#tab-top
my $regexp = join '|', map { quotemeta($_) }
             sort { length($b) <=> length($a) } keys %lexicon;

my $sample = 'The due date of the assignment is a date in the fall.';
print "sample before: $sample\n";
# \b on both sides keeps the match on whole words only
$sample =~ s/\b($regexp)\b/$lexicon{$1}/g;
print "sample after : $sample\n";
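To tie this back to the question, the same combined regexp can be applied line by line while reading a file. The sketch below is a minimal version under a few assumptions: it uses a small ASCII-only subset of the lexicon, and an in-memory filehandle with made-up content stands in for assign3.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Small ASCII-only subset of the lexicon, to keep the sketch self-contained.
my %lexicon = (
    'Due Date'   => 'Date de Remise',
    'Date'       => 'Date',
    'Assignment' => 'Devoir',
    'November'   => 'Novembre',
);

# Longest keys first so "Due Date" is tried before "Date".
my $alternation = join '|', map { quotemeta($_) }
                  sort { length($b) <=> length($a) } keys %lexicon;
my $regexp = qr/\b($alternation)\b/;

# Stand-in for assign3.txt: an in-memory filehandle (assumed content).
my $data = "Assignment 3\nDue Date: November 12\n";
open(my $fh, '<', \$data) or die "Could not open in-memory file: $!";

my @translated;
while (my $line = <$fh>) {
    # Translate every lexicon entry found on this line.
    $line =~ s/$regexp/$lexicon{$1}/g;
    push @translated, $line;
}
close($fh);

print @translated;
```

For a real file, the open call would instead take the filename and an encoding layer, as in the question's `open(my $fh, '<:encoding(UTF-8)', $filename)`.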