How do I put tags around everything that is not inside <div> tags?

Time: 2014-08-09 06:49:50

Tags: bash perl sed

I have a text file like this:

This is a <div class="animal">fish</div>. He likes to <div class="verb">swim</div>.
This is a <div class="animal">bear</div>.
The <div class="animal">bear</div> likes <br> to eat fish.

I need to put tags around all text (including punctuation) that is not inside any <div> tag. For example:

<div class="other">This is a </div><div class="animal">fish</div><div class="other">. He likes to </div><div class="verb">swim</div><div class="other">.</div>
<div class="other">This is a </div><div class="animal">bear</div><div class="other">.</div>
<div class="other">The </div><div class="animal">bear</div><div class="other"> likes <br> to eat fish.</div>
  • Multiple nested <div>s may appear on the same line.
  • A <div> never spans multiple lines.

How can I wrap all the untagged parts of the file in <div class="other"></div>?

3 answers:

Answer 0: (score: 1)

awk should do it:

awk '!/^</ {$0="<div class=\"other\">"$0"</div>"}1'
<div class="other">This is a <div class="animal">fish</div>. He likes to <div class="verb">swim</div>.</div>
<div class="other">This is a <div class="animal">bear</div>.</div>
<div class="other">The <div class="animal">bear</div> likes <br> to eat fish.</div>

It just wraps <div class="other">...</div> around every line that does not start with <.
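
Note that the one-liner above wraps whole lines, so the existing <div> elements end up nested inside the new one instead of sitting next to it as in the question's desired output. A minimal sketch of a per-segment variant in plain awk follows. It assumes <div> elements are not nested and never span lines, and the function name wrap_other_awk is made up for illustration:

```shell
# Hypothetical helper: reads lines on stdin, writes wrapped lines to stdout.
# Assumes <div> elements are not nested and never span lines.
wrap_other_awk() {
  awk '
  {
    out = ""
    rest = $0
    # Look for the next "<div" and the first "</div>" that follows it.
    while ((s = index(rest, "<div")) > 0) {
      tail = substr(rest, s)
      e = index(tail, "</div>")
      if (e == 0) break    # unclosed <div>: treat the remainder as plain text
      # Wrap any plain text that precedes the element.
      if (s > 1) out = out "<div class=\"other\">" substr(rest, 1, s - 1) "</div>"
      out = out substr(tail, 1, e + 5)    # copy the element itself ("</div>" is 6 characters)
      rest = substr(tail, e + 6)
    }
    # Wrap whatever plain text is left after the last element.
    if (rest != "") out = out "<div class=\"other\">" rest "</div>"
    print out
  }'
}

printf '%s\n' 'This is a <div class="animal">bear</div>.' | wrap_other_awk
```

Using index() and substr() instead of a regular expression sidesteps the lack of non-greedy matching in awk, at the cost of also treating any tag that merely begins with "<div" as a <div>.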

Answer 1: (score: 1)

Using Perl, you can split on the <div> elements and capture them at the same time. The result is this list

  • This is a
  • <div class="animal">fish</div>
  • . He likes to
  • <div class="verb">swim</div>
  • . This is a
  • <div class="animal">bear</div>
  • . The
  • <div class="animal">bear</div>
  • likes <br> to eat fish.

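Then all that is necessary is to wrap the elements that are not <div> elements in <div class="other"> tags and join everything back together.

This program demonstrates the idea, although it produces some nasty HTML!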

use strict;
use warnings;

my $text = <<'__END_TEXT__';
This is a <div class="animal">fish</div>. He likes to <div class="verb">swim</div>.
This is a <div class="animal">bear</div>.
The <div class="animal">bear</div> likes <br> to eat fish.
__END_TEXT__

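# Split on each <div>...</div> element; the capturing parentheses keep the elements in the list
my @parts = split m{(<div\b.+?</div>)}, $text;

# Show the individual pieces
print "- `$_`\n" for @parts;

# Wrap every piece that is not already a <div> element
for my $part (@parts) {
  $part = qq{<div class="other">$part</div>} unless $part =~ m{^<div\b};
}

# Rejoin the pieces into a single string
my $fixed_text = join '', @parts;
print $fixed_text, "\n";

Output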

<div class="other">This is a </div><div class="animal">fish</div><div class="other">. He likes to </div><div class="verb">swim</div><div class="other">.
This is a </div><div class="animal">bear</div><div class="other">.
The </div><div class="animal">bear</div><div class="other"> likes <br> to eat fish.
</div>

Answer 2: (score: 1)

This might work for you (GNU sed):

 sed '/\n/!{s/<div/\n&/g;s/\/div>/&\n/g};/^<div/!{s/^/<div class="other">/;s/\n\|$/<\/div>&/};P;D' file

This splits each line into a series of segments.
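
As far as the command can be traced by hand, it works in four steps: the first block inserts a newline before every <div and after every /div>; the second block wraps the leading segment in <div class="other"> unless it already starts with <div; P prints the first segment; and D deletes it and restarts the cycle on the remainder without reading new input. A small sketch for trying it out on standard input, assuming GNU sed (the function name split_and_wrap_sed is made up for illustration):

```shell
# Hypothetical helper wrapping the command above; assumes GNU sed
# (it relies on \n in the replacement text and on \| alternation).
split_and_wrap_sed() {
  sed '/\n/!{s/<div/\n&/g;s/\/div>/&\n/g};/^<div/!{s/^/<div class="other">/;s/\n\|$/<\/div>&/};P;D'
}

printf '%s\n' 'This is a <div class="animal">bear</div>.' | split_and_wrap_sed
# Each segment is printed on its own line:
# <div class="other">This is a </div>
# <div class="animal">bear</div>
# <div class="other">.</div>
```

Because P prints one segment per cycle, the segments of an input line come out on separate lines rather than rejoined on one line.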