Remove XML tags inside placeholders

Time: 2014-10-20 15:42:10

Tags: xml replace sed placeholder strip

I want to use sed (or another tool) to remove XML tags, but only in specific places marked by the '{{' and '}}' placeholders. For example:

<ok><ok2>{{TextShouldStay<not_ok>this_should_be_out</not_ok>
<sthelse/>ThisShouldBeAgain}}</ok2></ok>

Expected result:

<ok><ok2>{{TextShouldStayThisShouldBeAgain}}</ok2></ok>

Any ideas on how to achieve this?

1 answer:

Answer 0: (score: 1)

Command:

tr '\n' ' ' < file.xml | sed -r 's/(.*\{\{)([A-Za-z0-9]*)(<.*\/>)(.*)/\1\2\4\n/g'

Output:

sdlcb@Goofy-Gen:~/AMD$ cat file.xml
<ok><ok2>{{TextShouldStay<not_ok>this_should_be_out</not_ok>
<sthelse/>ThisShouldBeAgain}}</ok2></ok>
sdlcb@Goofy-Gen:~/AMD$ tr '\n' ' ' < file.xml | sed -r 's/(.*\{\{)([A-Za-z0-9]*)(<.*\/>)(.*)/\1\2\4\n/g'
<ok><ok2>{{TextShouldStayThisShouldBeAgain}}</ok2></ok>
sdlcb@Goofy-Gen:~/AMD$


Here we first replace the newlines with spaces using 'tr', so that sed sees the whole input as a single line, and then group the parts of the pattern using '(' and ')':
First group - from the beginning of the line up to and including '{{'
Second group - the letters and digits that follow '{{'
Third group - everything from the next '<' to the last '/>' on the line
Fourth group - the remaining characters

Once grouped, we print groups 1, 2 and 4, which drops the third group, and append a newline.
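
The sed command above fits the sample, where the text after '{{' is purely alphanumeric and the last '/>' on the line belongs to the placeholder. Since the question also allows other tools, here is a minimal Python sketch of a more general variant. It is a different technique from the answer's, not a translation of it. The function name strip_tags_in_placeholders and the three regexes are illustrative, and it assumes that placeholders are not nested, that elements of the same name are not nested inside a placeholder, and that newlines inside a placeholder should be dropped (as in the expected result).

```python
import re

# One {{ ... }} placeholder, matched non-greedily and across newlines.
PLACEHOLDER = re.compile(r"\{\{.*?\}\}", re.DOTALL)
# A paired element with its content, e.g. <not_ok>...</not_ok>.
# (?<!/) keeps self-closing tags such as <sthelse/> from being
# treated as an opening tag.
PAIRED = re.compile(r"<([A-Za-z_][\w.-]*)\b[^>]*(?<!/)>.*?</\1\s*>", re.DOTALL)
# Any tag left over after paired elements are gone (e.g. <sthelse/>).
ANY_TAG = re.compile(r"<[^>]+>")


def _clean_placeholder(match):
    """Drop elements, stray tags and newlines inside one placeholder."""
    body = match.group(0)
    body = PAIRED.sub("", body)   # remove <x>content</x> including content
    body = ANY_TAG.sub("", body)  # remove remaining tags
    return body.replace("\n", "")


def strip_tags_in_placeholders(text):
    """Clean every {{...}} placeholder; text outside them is untouched."""
    return PLACEHOLDER.sub(_clean_placeholder, text)


if __name__ == "__main__":
    sample = (
        "<ok><ok2>{{TextShouldStay<not_ok>this_should_be_out</not_ok>\n"
        "<sthelse/>ThisShouldBeAgain}}</ok2></ok>\n"
    )
    print(strip_tags_in_placeholders(sample), end="")
```

For the sample from the question this prints the expected single line. Because it relies on regular expressions rather than an XML parser, it is only suitable for simple, well-behaved input like the example.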