Searching for a string in a file using a Perl regular expression

Asked: 2014-11-16 02:09:03

Tags: regex perl

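I'm new to Perl. I'm reading text from a file and want to translate some of the words into French. I've managed to do it word by word, but not for multi-word expressions/strings, and I'm stuck on the code.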

The word-by-word code:

my $filename = 'assign3.txt';
my @lexicon_en = ("Winter","Date", "Due Date", "Problem", "Summer","Mark","Fall","Assignment","November");   
my @lexicon_fr = ("Hiver", "Date", "Date de Remise","Problème","Été", "Point", "Automne", "Devoir", "Novembre");
my $i=1;
open(my $fh, '<:encoding(UTF-8)', $filename)
    or die "Could not open file $filename !";
while (<$fh>) {
    for my $word (split)
    {
        print " $i. $word \n"; 
        $i++;
        for (my $j=0; $j < 9;$j++){
            if ($word eq $lexicon_en[$j]){
            print "Found one! - j value is $j\n";
            }
        }
     }
}
print "\ndone here!!\n";

This is the regular expression I'm trying to use:

    /\w+\s\w+/

This is my code for strings:

while (<>) {
        print ("this is text: $_ \n");

        if ((split (/Due\sDate/),$_) eq "Due Date"){
            print "yes!!\n";
        }
}

2 answers:

Answer 0: (score: 2)

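I think I understand the challenge you're facing. Because "Due Date" is two words, it needs to be matched before "Date" is, otherwise you'd get several incorrect translations. One way to handle that is to order your matches from the most words to the fewest, so that "Due Date" is processed before "Date".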

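If you convert your arrays to a hash (dictionary), you can sort the keys by the number of spaces they contain and then iterate over them to do the actual replacement: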

#!/usr/bin/perl
use strict;
use warnings;

#my @lexicon_en = ("Winter","Date", "Due Date", "Problem", "Summer","Mark","Fall","Assignment","November");
#my @lexicon_fr = ("Hiver", "Date", "Date de Remise","Problème","Été", "Point", "Automne", "Devoir", "Novembre");

# convert your arrays to a hash
my %lexicon = (
    'Winter' => 'Hiver',
    'Date' => 'Date',
    'Due Date' => 'Date de Remise',
    'Problem' => 'Problème',
    'Summer' => 'Été',
    'Mark' => 'Point',
    'Fall' => 'Automne',
    'Assignment' => 'Devoir',
    'November' => 'Novembre',
);

# sort the keys so that those with the most spaces (i.e. the most words)
# come first; tr/ // counts the spaces without modifying the key
my @ordered_keys = sort { ($b =~ tr/ //) <=> ($a =~ tr/ //) } keys %lexicon;

my $sample = 'The due date of the assignment is a date in the fall.';

print "sample before: $sample\n";

foreach my $key (@ordered_keys) {
    # \Q...\E quotes any regex metacharacters in the key
    $sample =~ s/\Q$key\E/$lexicon{$key}/ig;
}

print "sample after : $sample\n";

Output:

sample before: The due date of the assignment is a date in the fall.
sample after : The Date de Remise of the Devoir is a Date in the Automne.

The next challenge is making sure the case of the replacement matches the case of what is being replaced.
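One possible way to approach that, as a rough sketch only: capture the matched text and reshape the replacement to follow its capitalisation. The helper names `match_case` and `translate` are made up for this illustration, and the lexicon is an ASCII-only subset of the one above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: give $replacement the same capitalisation
# pattern as the text that was actually matched.
sub match_case {
    my ($matched, $replacement) = @_;
    return lc($replacement)          if $matched eq lc($matched);          # all lower case
    return uc($replacement)          if $matched eq uc($matched);          # ALL UPPER CASE
    return ucfirst(lc($replacement)) if $matched eq ucfirst(lc($matched)); # Capitalised
    return $replacement;                                                   # mixed case: leave as is
}

# ASCII-only subset of the lexicon above, to keep the sketch short.
my %lexicon = (
    'Due Date'   => 'Date de Remise',
    'Date'       => 'Date',
    'Assignment' => 'Devoir',
    'Fall'       => 'Automne',
);

# Keys with the most spaces first, so 'Due Date' is handled before 'Date'.
my @ordered_keys = sort { ($b =~ tr/ //) <=> ($a =~ tr/ //) } keys %lexicon;

sub translate {
    my ($text) = @_;
    for my $key (@ordered_keys) {
        # /e evaluates the replacement as code, so match_case() sees $1.
        $text =~ s/(\Q$key\E)/match_case($1, $lexicon{$key})/ige;
    }
    return $text;
}

print translate('DUE DATE: the assignment is due in the Fall.'), "\n";
# prints: DATE DE REMISE: the devoir is due in the Automne.
```

Note that a capitalised match turns "Date de Remise" into "Date de remise", since only the first letter is upper-cased; whether that is what you want depends on your text.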

Answer 1: (score: 1)

Use \b to detect word boundaries instead of using \s to match whitespace.
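To illustrate the difference \b makes, here is a small sketch. The sample sentence is invented for the demonstration; "Mark" and "Point" come from the lexicon in the question.

```perl
use strict;
use warnings;

my $text = 'Marking is due in the Fall; each Mark counts.';

# Without \b, "Mark" also matches inside the longer word "Marking".
(my $loose = $text) =~ s/Mark/Point/g;

# With \b on both sides, only the standalone word "Mark" is replaced.
(my $bounded = $text) =~ s/\bMark\b/Point/g;

print "$loose\n";    # Pointing is due in the Fall; each Point counts.
print "$bounded\n";  # Marking is due in the Fall; each Point counts.
```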

Combining Steven Klassen's solution with the approach from How to replace a set of search/replace pairs?:

#!/usr/bin/perl
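use strict;
use warnings;
use utf8;                              # the source contains literal accented characters
binmode(STDOUT, ':encoding(UTF-8)');   # so lc() handles "É" and output is encoded properly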

my %lexicon = (
    'Winter' => 'Hiver',
    'Date' => 'Date',
    'Due Date' => 'Date de Remise',
    'Problem' => 'Problème',
    'Summer' => 'Été',
    'Mark' => 'Point',
    'Fall' => 'Automne',
    'Assignment' => 'Devoir',
    'November' => 'Novembre',
);

# add lowercase
for (keys %lexicon) {
    $lexicon{lc($_)} = lc($lexicon{$_});
    print $_ . " " . $lexicon{lc($_)} . "\n";
}

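# Combine to one big regexp, longest keys first so that a multi-word key
# is tried before any shorter key it starts with. \Q...\E quotes regex
# metacharacters in the keys.
# https://stackoverflow.com/questions/17596917/how-to-replace-a-set-of-search-replace-pairs?answertab=votes#tab-top
my $regexp = join '|', map { "\\b\Q$_\E\\b" } sort { length($b) <=> length($a) } keys %lexicon;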

my $sample = 'The due date of the assignment is a date in the fall.';
print "sample before: $sample\n";
$sample =~ s/($regexp)/$lexicon{$1}/g;
print "sample after : $sample\n";
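One thing to keep in mind with a single combined pattern: Perl tries the alternatives from left to right and takes the first one that matches at a given position, not the longest. With the lexicon above that is harmless, because no key is a prefix of another, but it matters as soon as one is. The sketch below shows the effect. The 'Fall Term' entry and its translation are hypothetical, added only for this illustration, and `build_regexp` is a made-up helper name.

```perl
use strict;
use warnings;

# 'Fall Term' is a hypothetical entry, not part of the original lexicon.
my %lexicon = (
    'Fall'      => 'Automne',
    'Fall Term' => "Session d'automne",
);

# Build one alternation from the keys, in the order they are given.
sub build_regexp {
    my @keys = @_;
    my $alternation = join '|', map { quotemeta } @keys;
    return qr/\b(?:$alternation)\b/;
}

my $sample = 'Fall Term starts soon.';

# Shorter key first: "Fall" wins and "Term" is left untranslated.
my $short_first = build_regexp(sort { length($a) <=> length($b) } keys %lexicon);
(my $wrong = $sample) =~ s/($short_first)/$lexicon{$1}/g;

# Longer key first: the whole phrase is matched and translated.
my $long_first = build_regexp(sort { length($b) <=> length($a) } keys %lexicon);
(my $right = $sample) =~ s/($long_first)/$lexicon{$1}/g;

print "$wrong\n";   # Automne Term starts soon.
print "$right\n";   # Session d'automne starts soon.
```

Sorting the keys by descending length before joining them is a cheap way to avoid the problem.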