I have an AWK script like this that I run on a file:
cat input.txt | awk 'gsub(/[^ ]*(fish|shark|whale)[^ ]*/,"(&)")' >> output.txt
On every line that contains the words "fish", "shark", or "whale", this wraps those words in parentheses. For example:
The whale asked the shark to swim elsewhere.
The fish were unhappy.
After running it through the script, the file becomes:
The (whale) asked the (shark) to swim elsewhere.
The (fish) were unhappy.
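A note on the one-liner above: the gsub call is used as the awk pattern, and gsub returns the number of substitutions it made, so only lines with at least one match are printed. The following is a minimal sketch, assuming the unmatched lines should be kept as well. The function name wrap_words is hypothetical and not part of the original script.

```shell
# Hypothetical helper: wrap every space-delimited token containing
# fish, shark, or whale in parentheses, and print ALL lines.
wrap_words() {
  awk '{ gsub(/[^ ]*(fish|shark|whale)[^ ]*/, "(&)") } 1' "$@"
}

# Demo: the second line has no keyword but is still printed.
printf '%s\n' 'The whale asked the shark to swim elsewhere.' \
              'No sea life here.' | wrap_words
```

The trailing 1 is an always-true pattern, so every line is printed whether or not gsub changed it.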
The file is marked up with HTML, and I only need the replacement to happen between <b> and </b> tags.
The whale asked <b>the shark to swim</b> elsewhere.
<b>The fish were</b> unhappy.
This becomes:
The whale asked <b> the (shark) to swim </b> elsewhere.
<b> The (fish) were </b> unhappy.
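The <b> tag always appears on the same line as its closing </b> tag. How can I restrict awk so that it only searches and modifies the text between the <b> and </b> tags?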
Answer 0 (score: 1)
As long as the HTML markup is not malformed, and the <b> ... </b> span contains no other HTML tags, this is relatively easy in Perl:
$ cat data
The whale asked <b>the shark to swim</b> elsewhere.
<b>The fish were</b> unhappy.
The <b> dogfish and the sharkfin soup</b> were unscathed.
$ perl -pe 's/(<b>[^<]*)\b(fish|shark|whale)\b([^<]*<\/b>)/$1($2)$3/g' data | so
The whale asked <b>the (shark) to swim</b> elsewhere.
<b>The (fish) were</b> unhappy.
The <b> dogfish and the sharkfin soup</b> were unscathed.
$
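I tried adapting this to awk (and gawk) without success; the matching part works, but the replacement expression does not. From my reading of the manual, unlike Perl, the replacement text of sub and gsub cannot refer to the individual parenthesized subexpressions of a match; only &, the whole match, is available.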
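For comparison, here is a rough sketch of how the same idea might be expressed in plain awk without backreferences. It makes the same assumptions as the Perl version: each <b> is closed on the same line and the span contains no other tags. Instead of capturing groups, it uses match() with RSTART and RLENGTH to cut out each <b>...</b> span, then rewrites whole words inside it. The names wrap_in_bold and wrap are hypothetical, and a word is assumed to be a run of ASCII letters.

```shell
# Hypothetical helper: parenthesize the whole words fish, shark, or whale,
# but only inside <b>...</b> spans that open and close on the same line.
wrap_in_bold() {
  awk '
    # Rebuild s word by word, wrapping exact keyword matches.
    function wrap(s,    res, w) {
      res = ""
      while (match(s, /[A-Za-z]+/)) {
        w = substr(s, RSTART, RLENGTH)
        if (w ~ /^(fish|shark|whale)$/) w = "(" w ")"
        res = res substr(s, 1, RSTART - 1) w
        s = substr(s, RSTART + RLENGTH)
      }
      return res s
    }
    {
      out = ""; rest = $0
      # Peel off one <b>...</b> span at a time.
      while (match(rest, /<b>[^<]*<\/b>/)) {
        start = RSTART; len = RLENGTH   # save: wrap() calls match() again
        out = out substr(rest, 1, start - 1) wrap(substr(rest, start, len))
        rest = substr(rest, start + len)
      }
      print out rest
    }' "$@"
}

# Demo on a line with keywords inside and outside the span.
printf '%s\n' 'The whale asked <b>the shark to swim</b> elsewhere.' | wrap_in_bold
```

Because each word is tested on its own, dogfish and sharkfin are left alone. A span containing several keywords has all of them wrapped, whereas the single-pass Perl substitution above wraps only one keyword per span.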
Answer 1 (score: 1)
Here is one technique using awk:
awk '/<b>/{f=1}/<\/b>/{f=0}f{gsub(/fish|shark|whale/,"(&)")}1' RS=' ' ORS=' ' file
The whale asked <b>the (shark) to swim</b> elsewhere.
<b>The (fish) were</b> unhappy.
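One caveat with the flag technique above: the rule that clears f runs before the gsub rule, so a keyword that shares a record with the closing tag (for example shark</b>) is not wrapped. A possible variation, offered as a sketch and not a drop-in fix, applies gsub before clearing the flag. The function name flag_wrap is hypothetical, and -v is used here to set RS and ORS.

```shell
# Hypothetical helper: same word-per-record idea, but gsub runs
# before the closing tag clears the flag.
flag_wrap() {
  awk -v RS=' ' -v ORS=' ' '
    /<b>/   { f = 1 }                            # entering a bold span
    f       { gsub(/fish|shark|whale/, "(&)") }  # edit while inside it
    /<\/b>/ { f = 0 }                            # leaving the span
    1                                            # print every record
  ' "$@"
}

# Demo: the keyword sits directly before the closing tag.
printf '%s\n' 'The whale asked <b>the shark</b> elsewhere.' | flag_wrap
```

This still matches substrings (dogfish becomes dog(fish)), emits a trailing space after the last record, and can misfire when a record spans a line break, since records are split on spaces only.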