I am matching multiple patterns in a string to populate an array. The input file looks like this:
I love cat [chats;chaton;chatterie] and rabbit [lapins;lapereau] # J'aime les chats et les lapins # 2.8
My father [père;parent;papa] lives in New-York # Mon père vit à New-York # 1.8
I am using this code:
use strict;
use warnings;
use Data::Dump;

open(TEXT, "<", "$ARGV[0]")
    or die "cannot open < $ARGV[0]: $!";

while (my $text = <TEXT>)
{
    my @lines = split /\n/, $text;
    foreach my $line (@lines) {
        if ($line =~ /(^(.+)\t(.+)\t(.+)$)/) {
            my $english_sentence = $2;
            my $french_sentence  = $3;
            my $score            = $4;
            print $english_sentence."#".$french_sentence."";
            my @data = map [ split /;/ ], $line =~ / \[ ( [^\[\]]+ ) \] /xg;
            dd \@data;
        }
        print "\n";
    }
}
close TEXT;
This is the output:
I love cat [chats;chaton;chatterie] and rabbit [lapins;lapereau] # J'aime les chats et les lapins
Array==>[["chats", "chaton", "chatterie"], ["lapins", "lapereau"]]
My father [père;parent;papa] lives in New-York # Mon père vit à New-York
Array==>[["père", "parent", "papa"]]
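I need to remove from the array every string that does not match part of the French sentence, so that only the words that actually appear in it remain. In the end, I would like to get this result: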
I love cat [chats;chaton;chatterie] and rabbit [lapins;lapereau] # J'aime les chats et les lapins
[["chats"], ["lapins"]]
My father [père;parent;papa] lives in New-York # Mon père vit à New-York
[["père"]]
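Answer 0 (score: 1)
This does what you ask. It simply uses grep with a regular expression to reduce each list to only those words that appear in the French sentence.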
use utf8;
use strict;
use warnings;
use 5.010;
use autodie;
use open qw/ :std :encoding(UTF-8) /;

use Data::Dump;

open my $fh, '<', 'sentences.txt';

while (<$fh>) {
    my @sentences = split /\s*#\s*/;
    next unless @sentences == 3;
    print join(' # ', @sentences[0,1]), "\n";
    my @data = map [ split /;/ ], $sentences[0] =~ / \[ ( [^\[\]]+ ) \] /xg;
    $_ = [ grep { $sentences[1] =~ /\b\Q$_\E\b/ } @$_ ] for @data;
    dd \@data;
    print "\n";
}
Output
I love cat [chats;chaton;chatterie] and rabbit [lapins;lapereau] # J'aime les chats et les lapins
[["chats"], ["lapins"]]
My father [père;parent;papa] lives in New-York # Mon père vit à New-York
[["p\xE8re"]]
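The filtering step can be looked at in isolation. The following is a minimal sketch, not part of the original answer: `words_in_sentence` is a hypothetical helper name, and the candidate list is made up for illustration. It shows why the pattern uses `\b` (so that "chat" does not match inside "chats") and `\Q...\E` (so that any regex metacharacters in a candidate are treated literally).

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the candidates that occur as whole
# words in $sentence. Inside grep, $_ is the current candidate.
sub words_in_sentence {
    my ($sentence, @candidates) = @_;
    return grep { $sentence =~ /\b\Q$_\E\b/ } @candidates;
}

my $french = "J'aime les chats et les lapins";
my @kept   = words_in_sentence($french, qw/ chats chaton chat lapin /);
print "@kept\n";    # prints "chats"
```

Here "chat" and "lapin" are rejected because the following "s" means there is no word boundary after them.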
Update
As requested, this code modifies the word lists in place so that they contain only the words that appear in the translation.
use utf8;
use strict;
use warnings;
use 5.010;
use autodie;
use open qw/ :std :utf8 /;

open my $fh, '<', 'sentences.txt';

while (<$fh>) {
    my @sentences = split /\s*#\s*/;
    next unless @sentences == 3;
    print join(' # ', @sentences[0,1]), "\n";
    $sentences[0] =~ s{ \[ ( [^\[\]]+ ) \] }{
        my @words = split /;/, $1;
        @words = grep { $sentences[1] =~ /\b\Q$_\E\b/ } @words;
        sprintf "[%s]", join ';', @words;
    }exg;
    print join(' # ', @sentences[0,1]), "\n\n";
}
Output
I love cat [chats;chaton;chatterie] and rabbit [lapins;lapereau] # J'aime les chats et les lapins
I love cat [chats] and rabbit [lapins] # J'aime les chats et les lapins
My father [père;parent;papa] lives in New-York # Mon père vit à New-York
My father [père] lives in New-York # Mon père vit à New-York
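One case the sample data does not exercise is a group in which none of the words appears in the translation. As a rough sketch (the helper name `prune_groups` and the second test sentence are assumptions made for illustration), the same `s{...}{...}exg` substitution can be wrapped in a function to see what happens: the group is left as empty brackets rather than removed.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $english in which every
# [a;b;c] group keeps only the words found in $french.
sub prune_groups {
    my ($english, $french) = @_;
    $english =~ s{ \[ ( [^\[\]]+ ) \] }{
        # /e evaluates this block as code; its value is the replacement.
        my @words = grep { $french =~ /\b\Q$_\E\b/ } split /;/, $1;
        '[' . join(';', @words) . ']';
    }exg;
    return $english;
}

print prune_groups('rabbit [lapins;lapereau]', 'les lapins'), "\n";  # rabbit [lapins]
print prune_groups('father [parent;papa]',     'Mon oncle'),  "\n";  # father []
```

If empty brackets are not wanted, the replacement block would need to return an empty string when `@words` is empty; whether that is desirable depends on how the output is consumed.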
Answer 1 (score: 0)
You can also do this by building a hash of the words in the French sentence. This may be faster, since it avoids the third regex.
use strict;
use warnings;

while (<DATA>) {
    my ($English, $French, $repl, %FrWords);
    if ( ($English, $French) = m/^([^#]*)\#([^#]*)\#/ ) {
        @FrWords{ split /\h+/, $French } = undef;
        $English =~ s{ \[ ([^\[\]]*) \] }{
            $repl = join( ';', grep { exists $FrWords{$_} } split /;/, $1 );
            '['. (length($repl) ? $repl : '') .']';
        }xeg;
        print $English, '#', $French, "\n";
    }
}
__DATA__
I love cat [chats;chaton;chatterie] and rabbit [lapins;lapereau] # J'aime les chats et les lapins # 2.8
My father [père;parent;papa] lives in New-York # Mon père vit à New-York # 1.8
Output
I love cat [chats] and rabbit [lapins] # J'aime les chats et les lapins
My father [père] lives in New-York # Mon père vit à New-York
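One difference from the regex approach is worth keeping in mind: a hash lookup matches whole whitespace-separated tokens exactly, so a word with punctuation attached is a different key. The sketch below illustrates this; `word_set` is a hypothetical helper, and the sentence with a comma is invented for the example (the sample data contains no such punctuation).

```perl
use strict;
use warnings;

# Hypothetical helper: build a lookup set from a sentence.
sub word_set {
    my ($sentence) = @_;
    my %set;
    # Hash slice: every whitespace-separated token becomes a key.
    @set{ split ' ', $sentence } = ();
    return \%set;
}

my $set = word_set("J'aime les chats, et les lapins");
for my $word (qw/ chats lapins /) {
    print "$word: ", ( exists $set->{$word} ? 'found' : 'missing' ), "\n";
}
# chats: missing   (the stored token is "chats,")
# lapins: found
```

For input like the sample file this makes no difference, but for free-form sentences the tokens may need punctuation stripped before they are used as keys.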