some text and some text too bad,
some too bad again some bad
and other words bad, it is too bad
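I am trying to replace every occurrence of the word "bad" with "good", with one exception:
if the word "too" comes before "bad", that "bad" should not be changed to "good". There can be one or more spaces between "too" and "bad", or even the HTML space entity "&nbsp;".
So after the regex operation the text should be: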
some text and some text too bad,
some too bad again some good
and other words good, it is too bad
I tried something like this, but it doesn't work properly:
$text =~ s/(too(\s+|\s*&nbsp;\s*))bad/good/ig;
Please help.
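For comparison, a sketch of a single-substitution fix for the attempt above. The attempt matches "too bad" itself and replaces that, which is the opposite of what is wanted. This version assumes Perl 5.10 or later for the backtracking control verbs (*SKIP)(*FAIL), and replace_bad is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: replace each whole-word "bad" with "good", except when
# it follows "too" and one or more whitespace characters or &nbsp; entities.
sub replace_bad {
    my ($text) = @_;
    # Alternative 1 matches "too <separators> bad"; (*SKIP)(*FAIL) rejects
    # that match and resumes scanning after it, so that "bad" is never
    # replaced. Alternative 2 replaces any other whole-word "bad".
    $text =~ s/\btoo(?:\s|&nbsp;)+bad\b(*SKIP)(*FAIL)|\bbad\b/good/gi;
    return $text;
}

my $text = <<'END';
some text and some text too bad,
some too bad again some bad
and other words bad, it is too&nbsp;&nbsp;bad
END

print replace_bad($text);
```

Because the separators are spelled out as (?:\s|&nbsp;), punctuation is not treated as a separator: in "too, bad" the "bad" is still replaced. Add characters to that group if that matters.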
Answer 0 (score: 1)
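I don't believe this can be done conveniently with a regular expression. It is made more complicated because the concept of a word isn't clear: for example, you want "bad," to be treated as the word "bad".
This program works by splitting the string into words and separators, then changing every occurrence of "bad" to "good" unless it is preceded by "too" (ignoring case). I have included comma, colon, semicolon and full stop in the list of possible separators; you may need to adjust this to get the results you expect.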
use strict;
use warnings;
my $text = <<END;
some text and some text too bad,
some too bad again some bad
and other words bad, it is too bad
END
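# Split into words and separators; the capturing group keeps the separators,
# so words are at even indices and the previous word is two tokens back
my @tokens = split /((?:[\s,;.:]|&nbsp;)+)/, $text;
# Start at index 0 so a leading "bad" is also considered; the $i >= 2 guard
# stops a negative index from wrapping around to the end of the array
for my $i (grep { lc $tokens[$_] eq 'bad' } 0 .. $#tokens) {
$tokens[$i] = 'good' unless $i >= 2 && lc $tokens[$i-2] eq 'too';
}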
print join '', @tokens;
Output
some text and some text too bad,
some too bad again some good
and other words good, it is too bad
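The loop above relies on how split behaves when its pattern contains a capturing group: the separators are returned as tokens too, so words and separators alternate and the word before $tokens[$i] is $tokens[$i-2]. (If the text starts with a separator, index 0 holds an empty string, which keeps words at even indices.) A small sketch to inspect that layout, using a made-up sample line:

```perl
use strict;
use warnings;

my $line   = "some too&nbsp; bad again, some bad";
# The capturing group makes split return the separators as well as the words.
my @tokens = split /((?:[\s,;.:]|&nbsp;)+)/, $line;

# Words sit at even indices and separators at odd ones.
for my $i (0 .. $#tokens) {
    printf "%2d %-9s [%s]\n", $i, ($i % 2 ? 'separator' : 'word'), $tokens[$i];
}
```

Since nothing is dropped, join '', @tokens gives back the original line, which is why the answer can simply print the joined tokens after editing them.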
Answer 1 (score: -1)
You could try decoding the HTML &nbsp; spaces and then applying a regex that checks whether the preceding word is "too":
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::Entities;
while ( <DATA> ) {
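# Decode only &nbsp; (to U+00A0); other named entities are left alone
_decode_entities($_, { nbsp => "\xA0" });
# /u applies Unicode rules so \s also matches U+00A0; keep the match as-is
# when the preceding word is "too", otherwise swap "bad" for "good"
s/(\w+)(\s+)bad\b/lc $1 eq 'too' ? $& : "$1$2good"/egiu;
# Re-encode U+00A0 (and other unsafe characters) as HTML entities
encode_entities($_);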
print $_;
}
__DATA__
some text and some text too bad,
some too bad again some bad
and other words bad, it is too bad
Run it like this:
perl script.pl
Output:
some text and some text too bad,
some too bad again some good
and other words good, it is too bad
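The regex in this answer needs a word before "bad" for (\w+) to match, so a "bad" at the very start of a line is never replaced. A sketch of a variant that keeps the decode-then-test idea but makes "too" an optional prefix of the match instead. It assumes &nbsp; is the only entity that matters, uses plain s/// in place of HTML::Entities so it runs with core Perl alone, needs Perl 5.14+ for the /u flag, and replace_bad_nbsp is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper, not from the answer: decode &nbsp;, replace each
# whole-word "bad" unless "too" plus whitespace precedes it, then re-encode.
sub replace_bad_nbsp {
    my ($line) = @_;
    $line =~ s/&nbsp;/\xA0/g;    # decode only the &nbsp; entity
    # The optional group captures "too" plus whitespace (/u lets \s match
    # U+00A0); if it matched, keep the whole match, otherwise write "good".
    $line =~ s/(\btoo\s+)?\bbad\b/defined $1 ? $& : 'good'/egiu;
    $line =~ s/\xA0/&nbsp;/g;    # re-encode U+00A0 as &nbsp;
    return $line;
}

my @lines = (
    "some too&nbsp;bad again some bad\n",
    "bad start, and other words bad, it is too bad\n",
);
print replace_bad_nbsp($_) for @lines;
```

Because the "too" prefix is consumed as part of the same match, each "bad" is judged on its own, so runs such as "bad bad" are also handled.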