Sed substitution within a pattern on a line

Time: 2017-02-28 15:10:56

Tags: regex awk sed

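How can I replace characters only within a specific pattern, preferably in sed, but in awk or something else if that is easier? I want to replace the spaces in my HTML h3 ids with hyphens (-), but I don't want the hyphens applied to the whole line.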

For example, in my foo.html:

<p>This is a paragraph which likes its spaces.</p>

<h3 id="No spaces in this id please">Keep spaces in this title</h3>

<p>Here's another paragraph with spaces.</p>

<h3 id="Another id that should be spaceless">Spaces please!</h3>

<p>Yes I would like extra noodles in my soup.</p>

What I want is something like this:

<h3 id="Another-id-that-should-be-spaceless">Spaces please!</h3>

I have tried

sed -e "/^<h3 id=\"/,/\">/s/ /-/g;" <foo.html >bar.html

But this greedily adds hyphens to lines (the second <p>) and parts (the h3 content) that should not be hyphenated! The resulting bar.html:

<p>This is a paragraph which likes its spaces.</p>

<h3-id="No-spaces-in-this-id-please">Keep-spaces-in-this-title</h3>

<p>Here's-another-paragraph-with-spaces.</p>

<h3-id="Another-id-that-should-be-spaceless">Spaces-please!</h3>

<p>Yes I would like extra noodles in my soup.</p>

Note that I am using GNU sed. Thanks!

1 answer:

Answer 0: (score: 0)

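This sed command replaces one space at a time inside the id attribute value of an h3 tag. Whenever a substitution succeeds, the t command branches back to the :a label to look for any remaining spaces to replace: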

sed -e ':a;s/\(<h3[^>]*id="[^"> ]*\) \(.*\)/\1-\2/;ta;' < foo.html > bar.html
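To see the loop in action, here is a minimal sketch that wraps the same substitution in a shell function and runs it on two of the sample lines from the question. The function name fix_h3_ids and the variable result are made up for this illustration; splitting the script into separate -e expressions is just one way to write it and should behave the same as the one-liner above on GNU sed.

```shell
# Hypothetical helper (name is an assumption, not from the answer):
# replaces spaces with hyphens only inside the id="..." value of <h3> tags.
fix_h3_ids() {
  # :a   defines a label
  # s    replaces the first space found inside the id value with a hyphen
  # ta   jumps back to :a if the substitution changed anything
  sed -e ':a' -e 's/\(<h3[^>]*id="[^"> ]*\) \(.*\)/\1-\2/' -e 'ta'
}

# Feed a paragraph line and a heading line from the question through it.
result=$(printf '%s\n' \
  '<p>This is a paragraph which likes its spaces.</p>' \
  '<h3 id="No spaces in this id please">Keep spaces in this title</h3>' \
  | fix_h3_ids)

printf '%s\n' "$result"
```

The paragraph line passes through unchanged because it contains no h3 tag, and the heading text keeps its spaces because [^"> ]* cannot match past the closing quote of the id.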