I need to remove the whitespace between two patterns. The input file is XML, and I need to preserve the XML formatting. I have input like this:
<?xml version="1.0" encoding="UTF-8"?>
<dvm name="Filename" xml="http://www.google.it">
<description>AL</description>
<columns>
<column>abc d e</column>
<column> fg </column>
</columns>
<rows>
<row>
<cell id="1">08 4 </cell>
<cell id="2">AG</cell>
</row>
<row>
<cell id="1">006</cell>
<cell id="2"> AL</cell>
</row>
<row>
<cell id="1">042 </cell>
<cell id="2">AN </cell>
</row>
</rows>
I would like to use a sed command to produce the following output:
<?xml version="1.0" encoding="UTF-8"?>
<dvm name="Filename" xml="http://www.google.it">
<description>AL</description>
<columns>
<column>abcde</column>
<column>fg</column>
</columns>
<rows>
<row>
<cell id="1">084</cell>
<cell id="2">AG</cell>
</row>
<row>
<cell id="1">006</cell>
<cell id="2">AL</cell>
</row>
<row>
<cell id="1">042</cell>
<cell id="2">AN</cell>
</row>
</rows>
Can anyone help me?
Answer 0: (Score: 1)
sed is for simple substitutions on individual lines; for anything else you should be using awk.
Assuming your XML is well-formed and each element's text sits on a single line, as in your sample:
$ awk 'match($0,/(.*)(>[^<]+)(.*)/,a) { $0 = a[1] gensub(/ /,"","g",a[2]) a[3] } 1' file
<?xml version="1.0" encoding="UTF-8"?>
<dvm name="Filename" xml="http://www.google.it">
<description>AL</description>
<columns>
<column>abcde</column>
<column>fg</column>
</columns>
<rows>
<row>
<cell id="1">084</cell>
<cell id="2">AG</cell>
</row>
<row>
<cell id="1">006</cell>
<cell id="2">AL</cell>
</row>
<row>
<cell id="1">042</cell>
<cell id="2">AN</cell>
</row>
</rows>
The above uses GNU awk for the 3rd arg to match() and for gensub(). With other awks you would use substr(), a temp variable, and gsub():
$ awk '
match($0,/>[^<]+/) {
t = substr($0,RSTART,RLENGTH)
gsub(/ /,"",t)
$0 = substr($0,1,RSTART-1) t substr($0,RSTART+RLENGTH)
}
1' file
<?xml version="1.0" encoding="UTF-8"?>
<dvm name="Filename" xml="http://www.google.it">
<description>AL</description>
<columns>
<column>abcde</column>
<column>fg</column>
</columns>
<rows>
<row>
<cell id="1">084</cell>
<cell id="2">AG</cell>
</row>
<row>
<cell id="1">006</cell>
<cell id="2">AL</cell>
</row>
<row>
<cell id="1">042</cell>
<cell id="2">AN</cell>
</row>
</rows>
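Both commands above only rewrite the first text run on each line, which is all the sample needs. If a line could ever hold more than one element, a loop over every `>text` run might look like the sketch below. The function name `strip_text_spaces` is made up for illustration, and it still assumes that no attribute value contains a `>` and that no text run spans lines.

```shell
#!/bin/sh
# strip_text_spaces: hypothetical helper name, not part of the answer above.
# Removes every space from each text run between '>' and '<',
# including lines that hold more than one element. POSIX awk only.
strip_text_spaces() {
    awk '
    {
        out  = ""
        rest = $0
        # Consume the line one ">text" run at a time.
        while (match(rest, />[^<]+/)) {
            t = substr(rest, RSTART, RLENGTH)
            gsub(/ /, "", t)
            out  = out substr(rest, 1, RSTART - 1) t
            rest = substr(rest, RSTART + RLENGTH)
        }
        # Whatever is left has no further text runs; print it unchanged.
        print out rest
    }' "$@"
}
```

It reads from the named files, or from standard input when none are given, e.g. `strip_text_spaces file`.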
Based on your follow-up question, to trim only the leading/trailing whitespace:
$ awk '
match($0,/>[^<]+/) {
t = substr($0,RSTART+1,RLENGTH-1)
gsub(/^ +| +$/,"",t)
$0 = substr($0,1,RSTART) t substr($0,RSTART+RLENGTH)
}
1' file
<?xml version="1.0" encoding="UTF-8"?>
<dvm name="Filename" xml="http://www.google.it">
<description>AL</description>
<columns>
<column>abcde</column>
<column>fg</column>
</columns>
<rows>
<row>
<cell id="1">08 4</cell>
<cell id="2">AG</cell>
</row>
<row>
<cell id="1">00 6</cell>
<cell id="2">AL</cell>
</row>
<row>
<cell id="1">0 42</cell>
<cell id="2">AN</cell>
</row>
</rows>
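Since trimming is a simple per-line substitution, this particular case could arguably be done in sed, as the question originally asked. The sketch below is an alternative to the awk above, not a replacement for it. The function name `trim_text` is made up for illustration. It uses `[[:space:]]` so that tabs are trimmed as well as spaces, and it leaves whitespace-only text runs untouched.

```shell
#!/bin/sh
# trim_text: hypothetical helper name, not part of the answer above.
# Trims leading/trailing whitespace from each text run between '>' and '<'
# while keeping interior whitespace. Uses POSIX basic regular expressions.
trim_text() {
    # The captured group must end in a non-space, non-'<' character,
    # so trailing whitespace always falls outside the group.
    sed 's/>[[:space:]]*\([^<]*[^<[:space:]]\)[[:space:]]*</>\1</g' "$@"
}
```

Like the awk versions, this assumes each text run starts and ends on the same line, e.g. `trim_text file`.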
Answer 1: (Score: 0)
Thanks for the reply, Ed. The command above works like a charm!
awk '
match($0,/>[^<]+/) {
t = substr($0,RSTART,RLENGTH)
gsub(/ /,"",t)
$0 = substr($0,1,RSTART-1) t substr($0,RSTART+RLENGTH)
}
1' file
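I have one more question. How can I remove the whitespace only between the '>' char and the first occurrence of any other char, and between the last occurrence of any other char and the '<' char?
If my input is now: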
<?xml version="1.0" encoding="UTF-8"?>
<dvm name="Filename" xml="http://www.google.it">
<description>AL</description>
<columns>
<column>abcde</column>
<column>fg</column>
</columns>
<rows>
<row>
<cell id="1"> 08 4 </cell>
<cell id="2">AG</cell>
</row>
<row>
<cell id="1"> 00 6 </cell>
<cell id="2"> AL </cell>
</row>
<row>
<cell id="1">0 42 </cell>
<cell id="2">AN </cell>
</row>
</rows>
how can I get the following result:
<?xml version="1.0" encoding="UTF-8"?>
<dvm name="Filename" xml="http://www.google.it">
<description>AL</description>
<columns>
<column>abcde</column>
<column>fg</column>
</columns>
<rows>
<row>
<cell id="1">08 4</cell>
<cell id="2">AG</cell>
</row>
<row>
<cell id="1">00 6</cell>
<cell id="2">AL</cell>
</row>
<row>
<cell id="1">0 42</cell>
<cell id="2">AN</cell>
</row>
</rows>