I have a text file like this:
This is a <div class="animal">fish</div>. He likes to <div class="verb">swim</div>.
This is a <div class="animal">bear</div>.
The <div class="animal">bear</div> likes <br> to eat fish.
I need to put tags around all the text (including punctuation) that is not inside any <div> tag. E.g.:
<div class="other">This is a </div><div class="animal">fish</div><div class="other">. He likes to </div><div class="verb">swim</div><div class="other">.</div>
<div class="other">This is a </div><div class="animal">bear</div><div class="other">.</div>
<div class="other">The </div><div class="animal">bear</div><div class="other"> likes <br> to eat fish.</div>
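Several <div> elements may appear on the same line. A <div> never spans multiple lines. How can I wrap every untagged portion of text in the file in <div class="other"> and </div>?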
Answer 0 (score: 1)
This awk should do:
awk '!/^</ {$0="<div class=\"other\">"$0"</div>"}1'
<div class="other">This is a <div class="animal">fish</div>. He likes to <div class="verb">swim</div>.</div>
<div class="other">This is a <div class="animal">bear</div>.</div>
<div class="other">The <div class="animal">bear</div> likes <br> to eat fish.</div>
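It simply wraps <div class="other"> ... </div> around every line that does not start with <. Note that it wraps whole lines, so the existing <div> elements end up nested inside the new one rather than sitting next to it as in the requested output.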
Answer 1 (score: 1)
With Perl, you can split on the <div> elements and capture them at the same time. The result is this list:
This is a
<div class="animal">fish</div>
. He likes to
<div class="verb">swim</div>
.
This is a
<div class="animal">bear</div>
.
The
<div class="animal">bear</div>
likes <br> to eat fish.
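Then all that is necessary is to wrap in <div class="other"> those elements that are not already <div> elements, and join the elements back together.
This program demonstrates, although it produces some nasty HTML!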
use strict;
use warnings;
my $text = <<'__END_TEXT__';
This is a <div class="animal">fish</div>. He likes to <div class="verb">swim</div>.
This is a <div class="animal">bear</div>.
The <div class="animal">bear</div> likes <br> to eat fish.
__END_TEXT__
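# Split on complete <div> elements; the capturing parentheses keep them in the list
my @parts = split m{(<div\b.+?</div>)}, $text;
print "- `$_`\n" for @parts;
for my $part (@parts) {
    # Wrap only the pieces that are not already a <div> element
    $part = qq{<div class="other">$part</div>} unless $part =~ m{^<div\b};
}
my $fixed_text = join '', @parts;
print $fixed_text, "\n";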
Output
<div class="other">This is a </div><div class="animal">fish</div><div class="other">. He likes to </div><div class="verb">swim</div><div class="other">.
This is a </div><div class="animal">bear</div><div class="other">.
The </div><div class="animal">bear</div><div class="other"> likes <br> to eat fish.
</div>
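The stray line breaks in that output come from splitting the whole text at once, so each newline lands inside a <div class="other">. As a rough sketch only, here is the same split-and-capture idea applied one line at a time, written in Python rather than Perl. The function name wrap_untagged is made up for illustration, and the sketch assumes, as the question states, that a <div> never spans lines and that <div> elements are not nested.

```python
import re

# One complete <div ...>...</div> element. The match is non-greedy so that
# several elements on the same line are matched separately, and the capturing
# group makes re.split keep the matched elements in its result.
DIV_RE = re.compile(r'(<div\b.+?</div>)')


def wrap_untagged(line: str) -> str:
    """Wrap each run of text outside a <div> in <div class="other">...</div>."""
    parts = DIV_RE.split(line)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # Odd positions hold the captured <div> elements: keep unchanged.
            out.append(part)
        elif part:
            # Even positions hold the text between elements; skip empty
            # pieces (for example when a line starts or ends with a <div>).
            out.append(f'<div class="other">{part}</div>')
    return ''.join(out)


text = '''This is a <div class="animal">fish</div>. He likes to <div class="verb">swim</div>.
This is a <div class="animal">bear</div>.
The <div class="animal">bear</div> likes <br> to eat fish.'''

for line in text.splitlines():
    print(wrap_untagged(line))
```

Processing line by line keeps the newlines outside the generated elements, and skipping empty pieces avoids emitting an empty <div class="other"></div> when a line begins or ends with a <div>.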
Answer 2 (score: 1)
This might work for you (GNU sed):
sed '/\n/!{s/<div/\n&/g;s/\/div>/&\n/g};/^<div/!{s/^/<div class="other">/;s/\n\|$/<\/div>&/};P;D' file
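This splits each line into a series of newline-separated segments, one per <div> element or run of untagged text. The untagged segments are wrapped, and P;D prints the segments one at a time, so each segment ends up on its own output line.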