I mixed some XML files together, and now I have something like this:
<Schema>
stuff
</Schema><Schema>
stuff
</Schema><Schema>
..
I need to split them all, so that each output file contains one block from <Schema> to </Schema>.
Answer 0 (score: 3)
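One way using awk. It sets the record separator to the closing tag, so each record ends just before a </Schema>. If a record contains any non-whitespace character, it is printed to its own file, followed by the closing tag. A multi-character RS is not guaranteed by POSIX, so this needs an awk that supports it, such as gawk or mawk: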
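awk '
BEGIN { RS = "</Schema>" }          # split records on the closing tag
$0 ~ /[^[:blank:]\n]/ {             # skip records that are only whitespace
    out = FILENAME "_" ++i ".xml"   # infile_1.xml, infile_2.xml, ...
    printf "%s\n", $0 RS > out      # re-append the tag that RS consumed
    close(out)                      # avoid running out of open files
}
' infile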
Assuming infile contains:
<Schema>
stuff
</Schema><Schema>
more stuff
</Schema><Schema>
and more stuff
</Schema>
It produces:
==> infile_1.xml <==
<Schema>
stuff
</Schema>
==> infile_2.xml <==
<Schema>
more stuff
</Schema>
==> infile_3.xml <==
<Schema>
and more stuff
</Schema>
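To check the approach end to end, the example can be reproduced in a scratch directory. This is only a sketch: it assumes an awk that accepts a multi-character RS (for example gawk in its default mode, or mawk), and the scratch directory and the tag-counting loop are my own additions for illustration, not part of the original answer.

```shell
#!/bin/sh
# Sketch: rebuild the sample input, split it, and count tags per output file.
# Assumption: awk supports a multi-character RS (gawk or mawk).
set -eu

workdir=$(mktemp -d)   # hypothetical scratch location
cd "$workdir"

# Recreate the sample infile from the answer, one argument per line.
printf '%s\n' \
    '<Schema>' 'stuff' \
    '</Schema><Schema>' 'more stuff' \
    '</Schema><Schema>' 'and more stuff' \
    '</Schema>' > infile

# Split on the closing tag and write each non-blank record to its own file.
awk '
BEGIN { RS = "</Schema>" }
$0 ~ /[^[:blank:]\n]/ {
    out = FILENAME "_" ++i ".xml"
    printf "%s\n", $0 RS > out
    close(out)
}
' infile

# Report how many lines in each output file carry an opening or closing tag.
for f in infile_*.xml; do
    opens=$(grep -c '<Schema>' "$f")
    closes=$(grep -c '</Schema>' "$f")
    echo "$f: $opens open, $closes close"
done
```

If the split worked, every file should report one opening and one closing tag. Running `head infile_*.xml` afterwards prints the files with `==> name <==` headers like the listing above.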