How to replace the content of <div class="definition"></div>, but not if it contains {}?

Date: 2014-10-03 09:56:23

Tags: regex perl

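I need to replace the content of every <div class="definition">text here</div> with "...", but not when any {} is found inside it. I tried this with Perl, but it seems to remove too much: sometimes the match runs from the first <div> to the last </div> on the line.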

perl -pe 's/<div class="definition">[^{].*[^<]<\/div>/<div class="definition">...<\/div>/g'
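To see why that command overmatches, here is a minimal Perl sketch run on a made-up input line where a plain definition comes before a braced one. The greedy .* runs to the last </div> on the line, so the braced div is swallowed too. The second substitution uses the negated character class from the first answer below, which cannot cross a '<' and therefore stays inside a single div.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up input line: a plain definition followed by a braced one.
my $line = 'A <div class="definition">text here</div>. '
         . 'B <div class="definition">{text here}</div>.';

# The question's pattern: the greedy .* backtracks only as far as the
# LAST </div> on the line, so both divs end up inside one match.
(my $greedy = $line) =~
    s/<div class="definition">[^{].*[^<]<\/div>/<div class="definition">...<\/div>/g;

# A negated class cannot match '<', so each match ends at the first </div>;
# it cannot match '{' or '}' either, so braced content is left alone.
(my $fixed = $line) =~
    s/(<div class="definition">)[^{}<]+(<\/div>)/$1...$2/g;

print "$greedy\n";
print "$fixed\n";
```

The first line printed is the whole input collapsed into a single A <div class="definition">...</div>. because the braced div was eaten, while the second keeps {text here} intact.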

For example, given this input:

This is a file <div class="definition">text here</div>.
This is a file <div class="definition">{text here}</div>. This is a file <div class="definition">text here</div>.
This is a file <div class="definition">text here</div>.

The desired output is:

This is a file <div class="definition">...</div>.
This is a file <div class="definition">{text here}</div>. This is a file <div class="definition">...</div>.
This is a file <div class="definition">...</div>.

How can I replace whatever is in there, but leave it alone when {} is found inside?

2 answers:

Answer 0 (score: 1)

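You could try the perl command below. The negated character class [^{}<]+ matches neither a brace nor the '<' that starts the closing tag, so each match stays inside a single div, and any div whose content contains { or } is simply not matched.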

$ perl -pe 's/(<div class="definition">)[^{}<]+(<\/div>)/$1...$2/g' file
This is a file <div class="definition">...</div>.
This is a file <div class="definition">{text here}</div>. This is a file <div class="definition">...</div>.
This is a file <div class="definition">...</div>.
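Because perl -pe works one line at a time, a definition whose text is split across two lines will not match. Below is a minimal sketch of the same substitution applied to the whole input at once, which is what perl -0777 -pe does when it slurps the file. The helper name mask_definitions and the sample input are hypothetical, chosen for illustration. Since [^{}<] also matches "\n", the multi-line div is handled; an empty div is left untouched because + needs at least one character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace the text of every <div class="definition">
# that contains no '{', '}' or nested tag. Operating on the whole string
# lets the match span newlines, since [^{}<] matches "\n" as well.
sub mask_definitions {
    my ($html) = @_;
    $html =~ s/(<div class="definition">)[^{}<]+(<\/div>)/$1...$2/g;
    return $html;
}

# Hypothetical input: a two-line definition, a braced one, an empty one.
my $input = <<'HTML';
This is a file <div class="definition">text
here</div>.
This is a file <div class="definition">{text here}</div>.
This is a file <div class="definition"></div>.
HTML

print mask_definitions($input);
```

On the command line the equivalent would be perl -0777 -pe 's/(<div class="definition">)[^{}<]+(<\/div>)/$1...$2/g' file. Like the one-liner, it still leaves alone any div that contains nested tags.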

Answer 1 (score: 1)

It isn't a one-liner, but a bit of Mojo::DOM magic makes this task easy. Here is the code:

#!/usr/bin/perl
use warnings;
use strict;
use feature ':5.10';

use Mojo::DOM;

my $html = 'This is a file <div class="definition">text here</div>.
This is a file <div class="definition">{text here}</div>. This is a file <div class="definition">text here</div>.
This is a file <div class="definition">text here</div>.';

my $dom = Mojo::DOM->new( $html );

$dom->find( 'div.definition' )
    ->grep(sub { $_->text =~ m#^[^\{]# })
    ->each(sub { $_->replace('<div class="definition">...</div>') });

say $dom;

Output:

This is a file <div class="definition">...</div>.
This is a file <div class="definition">{text here}</div>. This is a file <div class="definition">...</div>.
This is a file <div class="definition">...</div>.

What happens, step by step:

# find all div nodes with class "definition"
$dom->find( 'div.definition' )

# keep only the nodes whose text does not start with '{'
->grep(sub { $_->text =~ m#^[^\{]# })

# replace each remaining node with '<div class="definition">...</div>'
# (each() calls replace() on every node; Mojo::Collection itself has no replace() method)
->each(sub { $_->replace('<div class="definition">...</div>') });
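Note that the grep callback only tests the first character of the div's text, while the question asks to skip a div when a brace appears anywhere inside it. The plain-Perl sketch below (no Mojo::DOM needed) compares the answer's predicate with a stricter one on a few made-up strings. The helper names starts_without_brace and has_no_braces are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The answer's predicate: true when the text does not START with '{'.
sub starts_without_brace { return $_[0] =~ m#^[^\{]# ? 1 : 0 }

# A stricter predicate: true only when there is no '{' or '}' anywhere.
sub has_no_braces { return $_[0] !~ /[{}]/ ? 1 : 0 }

# Made-up sample texts, including one with a brace in the middle.
for my $text ('text here', '{text here}', 'text {here}', '') {
    printf "%-14s starts_without_brace=%d has_no_braces=%d\n",
        "'$text'", starts_without_brace($text), has_no_braces($text);
}
```

The two predicates disagree on 'text {here}', which the original filter would replace, and on the empty string, which only the stricter one accepts. Inside the Mojo::DOM chain the stricter test would read ->grep(sub { $_->text !~ /[{}]/ }).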