Regex: replace a word unless it is preceded by another word

Time: 2013-10-25 11:56:15

Tags: regex perl

I have the following text:

    some text and some text too bad,
    some too&nbsp; bad again some bad
    and other words bad, it is too       bad 

I am trying to replace every occurrence of the word "bad" with "good", with one exception:

if the word "too" comes before "bad", that "bad" should not become "good". There can be one or more spaces between "too" and "bad", or even the HTML space "&nbsp;".

So after the regex substitution the text should be:

    some text and some text too bad,
    some too&nbsp; bad again some good
    and other words good, it is too       bad 

I tried something like this, but it does not work:

    $text =~ s/(too(\s+|\s*&nbsp;\s*))bad/good/ig;

Please help.
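The attempt above matches "too" together with "bad", so it rewrites exactly the occurrences that should be kept, and it replaces the whole match, "too" included, with "good". A minimal regex-only sketch, assuming that only whole words (delimited by `\b`) should count: capture an optional "too" plus its separators in front of each "bad", and use the `/e` modifier to put the whole match back whenever that prefix was captured. The helper name `replace_bad` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: turn every standalone "bad" into "good" unless the
# word before it is "too" (any case), separated by whitespace and/or &nbsp;.
sub replace_bad {
    my ($text) = @_;
    $text =~ s{
        \b
        ( too (?: \s | &nbsp; )+ )?    # optional "too" plus separators, captured as group 1
        bad \b
    }{ defined $1 ? $& : 'good' }gixe;  # keep the whole match when "too" was captured, else replace
    return $text;
}

my $text = "some text and some text too bad,\n"
         . "some too&nbsp; bad again some bad\n"
         . "and other words bad, it is too       bad \n";

print replace_bad($text);
```

Because the "too" prefix is consumed as part of the same match, a "bad" that follows "too" is never matched on its own. This avoids a variable-length lookbehind such as `(?<!too\s+)`, which older Perl versions reject.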

2 answers:

Answer 0 (score: 1)

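I don't believe this can be done conveniently with a regular expression. It is made more complicated because the concept of a "word" is not clear: for instance, do you want "bad," (with its trailing comma) to count as the word "bad"?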

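This program works by splitting the string into tokens that alternate between words and separators, and then changing every occurrence of "bad" to "good" unless the word before it is "too" (ignoring case). Besides whitespace and &nbsp;, I have included commas, semicolons, full stops and colons in the list of possible separators. You may need to adjust this to get the results you expect.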

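    use strict;
    use warnings;

    my $text = <<END;
    some text and some text too bad,
    some too&nbsp; bad again some bad
    and other words bad, it is too       bad 
    END

    # Split into alternating word / separator tokens. The capturing group makes
    # split keep the separators (whitespace , ; . : or &nbsp;) in the list.
    my @tokens = split /((?:[\s,;.:]|&nbsp;)+)/, $text;

    # Words and separators alternate, so the word before token $i is $tokens[$i-2].
    # A "bad" with fewer than two tokens before it has no preceding word.
    for my $i (grep { lc $tokens[$_] eq 'bad' } 0 .. $#tokens) {
      $tokens[$i] = 'good' unless $i >= 2 and lc $tokens[$i-2] eq 'too';
    }

    print join '', @tokens;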

Output:

    some text and some text too bad,
    some too&nbsp; bad again some good
    and other words good, it is too       bad 
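The program depends on one property of Perl's `split`: when the separator pattern contains a capturing group, the captured separators are returned as list elements too, so words and separators alternate and the word before token `$i` sits at `$i-2`. A small illustration using the same separator pattern on a shorter, made-up input string:

```perl
use strict;
use warnings;

# The capturing group makes split return the separators as well as the
# words, so the list alternates: word, separator, word, separator, ...
my @tokens = split /((?:[\s,;.:]|&nbsp;)+)/, 'it is too&nbsp; bad, bad';

for my $i (0 .. $#tokens) {
    printf "%d: '%s'\n", $i, $tokens[$i];
}
# The "bad" at index 6 has "too" at index 4, and the "bad" at index 8
# has "bad" at index 6, so only the last one would become "good".
```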

Answer 1 (score: -1)

You could try decoding the HTML space entity and then applying a regex that checks whether the preceding word is "too":

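    #!/usr/bin/env perl

    use strict;
    use warnings;
    use feature 'unicode_strings';  # let \s match the decoded non-breaking space (U+00A0)
    use HTML::Entities;

    while ( <DATA> ) {
        # Decode only &nbsp; into a real non-breaking space character
        _decode_entities($_, { nbsp => "\xA0" });
        # Keep "<word><spaces>bad" unchanged when <word> is "too", otherwise turn "bad" into "good"
        s/(\w+)(\s+)bad\b/$1 eq 'too' ? $& : "$1$2good"/eg;
        # Turn the non-breaking space back into &nbsp;
        encode_entities($_);
        print $_;
    }

    __DATA__
    some text and some text too bad,
    some too&nbsp; bad again some bad
    and other words bad, it is too       bad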

Run it like this:

    perl script.pl

Output:

    some text and some text too bad,
    some too&nbsp; bad again some good
    and other words good, it is too       bad
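Answer 1 relies on `\s+` matching the non-breaking space (U+00A0) that `_decode_entities` produces. Whether `\s` matches that character depends on the character-set rules the regex uses: for a plain byte string, Perl's default rules may not treat it as whitespace, while the `/u` modifier (or `use feature 'unicode_strings'`) applies Unicode rules, under which it is whitespace. A small check, assuming Perl 5.14 or later for the `/u` and `/a` modifiers:

```perl
use strict;
use warnings;

my $nbsp = "\xA0";    # U+00A0 NO-BREAK SPACE, as produced by decoding &nbsp;

# /u applies Unicode rules, under which NO-BREAK SPACE counts as whitespace.
my $unicode_rules = $nbsp =~ /\s/u ? 1 : 0;

# /a restricts \s to ASCII whitespace, so it never matches U+00A0.
my $ascii_rules = $nbsp =~ /\s/a ? 1 : 0;

print "Unicode rules: $unicode_rules, ASCII rules: $ascii_rules\n";
```

For the sample input the answer prints the same result either way: if `\s` does not match U+00A0, the "too&nbsp; bad" occurrence simply never matches `(\w+)(\s+)bad`, so it is left alone rather than being kept by the `$1 eq 'too'` check.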