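How can I use sed to trim strings between XML tags?

Asked: 2016-03-25 20:46:27

Tags: xml awk sed trim substitution

I need to remove the spaces between two patterns. The input file is XML, and I need to preserve the XML formatting. I have input like this: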

<?xml version="1.0" encoding="UTF-8"?>
<dvm name="Filename" xml="http://www.google.it">
  <description>AL</description>
  <columns>
    <column>abc d e</column>
    <column> fg </column>
  </columns>
  <rows>
    <row>
      <cell id="1">08 4 </cell>
      <cell id="2">AG</cell>
    </row>
    <row>
      <cell id="1">006</cell>
      <cell id="2"> AL</cell>
    </row>
    <row>
      <cell id="1">042 </cell>
      <cell id="2">AN  </cell>
    </row>
   </rows>

I would like to use a sed command to produce the following output:

<?xml version="1.0" encoding="UTF-8"?>
<dvm name="Filename" xml="http://www.google.it">
  <description>AL</description>
  <columns>
    <column>abcde</column>
    <column>fg</column>
  </columns>
  <rows>
    <row>
      <cell id="1">084</cell>
      <cell id="2">AG</cell>
    </row>
    <row>
      <cell id="1">006</cell>
      <cell id="2">AL</cell>
    </row>
    <row>
      <cell id="1">042</cell>
      <cell id="2">AN</cell>
    </row>
   </rows>

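Can anyone help me?

2 answers:

Answer 0 (score: 1)

sed is for simple substitutions on individual lines; for anything else you should be using awk.

If your XML is always formatted like that: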

$ awk 'match($0,/(.*)(>[^<]+)(.*)/,a) { $0 = a[1] gensub(/ /,"","g",a[2]) a[3] } 1' file
<?xml version="1.0" encoding="UTF-8"?>
<dvm name="Filename" xml="http://www.google.it">
  <description>AL</description>
  <columns>
    <column>abcde</column>
    <column>fg</column>
  </columns>
  <rows>
    <row>
      <cell id="1">084</cell>
      <cell id="2">AG</cell>
    </row>
    <row>
      <cell id="1">006</cell>
      <cell id="2">AL</cell>
    </row>
    <row>
      <cell id="1">042</cell>
      <cell id="2">AN</cell>
    </row>
   </rows>

The above uses GNU awk for the 3rd arg to match() and for gensub(). With other awks you'd use substr(), a temporary variable, and gsub():

$ awk '
match($0,/>[^<]+/) {
    t = substr($0,RSTART,RLENGTH)
    gsub(/ /,"",t)
    $0 = substr($0,1,RSTART-1) t substr($0,RSTART+RLENGTH)
}
1' file
<?xml version="1.0" encoding="UTF-8"?>
<dvm name="Filename" xml="http://www.google.it">
  <description>AL</description>
  <columns>
    <column>abcde</column>
    <column>fg</column>
  </columns>
  <rows>
    <row>
      <cell id="1">084</cell>
      <cell id="2">AG</cell>
    </row>
    <row>
      <cell id="1">006</cell>
      <cell id="2">AL</cell>
    </row>
    <row>
      <cell id="1">042</cell>
      <cell id="2">AN</cell>
    </row>
   </rows>
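
Since the question asks for sed specifically, here is a minimal sketch of the same "delete every space between > and <" idea using a sed loop. It is not part of the original answer; the function name strip_spaces_sed is made up for illustration, and it assumes, like the awk scripts above, that the text of an element never spans more than one line and that no '>' or '<' appears inside attribute values.

```shell
#!/bin/sh
# strip_spaces_sed (hypothetical helper name): delete every space that sits
# between a ">" and the next "<" on the same line.
strip_spaces_sed() {
    # :a       label marking the start of the loop
    # s/.../   remove ONE space that follows ">" and precedes the next "<"
    # ta       jump back to :a if the substitution changed anything
    sed -e ':a' -e 's/\(>[^<]*\) \([^<]*<\)/\1\2/' -e 'ta' "$@"
}

# Two lines taken from the sample input in the question.
printf '%s\n' '      <cell id="1">08 4 </cell>' '    <column> fg </column>' |
    strip_spaces_sed
```

Each pass removes a single space, so the loop runs once per space. That is fine for short cell values like these, but the awk versions do the same work in one gsub() call.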

Per your follow-up question, to trim only leading/trailing whitespace (run here against the input from that follow-up):

$ awk '
match($0,/>[^<]+/) {
    t = substr($0,RSTART+1,RLENGTH-1)
    gsub(/^ +| +$/,"",t)
    $0 = substr($0,1,RSTART) t substr($0,RSTART+RLENGTH)
}
1' file
<?xml version="1.0" encoding="UTF-8"?>
<dvm name="Filename" xml="http://www.google.it">
  <description>AL</description>
  <columns>
    <column>abcde</column>
    <column>fg</column>
  </columns>
  <rows>
    <row>
      <cell id="1">08 4</cell>
      <cell id="2">AG</cell>
    </row>
    <row>
      <cell id="1">00 6</cell>
      <cell id="2">AL</cell>
    </row>
    <row>
      <cell id="1">0 42</cell>
      <cell id="2">AN</cell>
    </row>
   </rows>
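
One limitation of the scripts above is that match() finds only the first text node on each line. If a line could hold several elements, the same trimming step can be applied in a loop. The following is a sketch under that assumption, not something from the original answer. The wrapper name trim_text_nodes is hypothetical, and it also treats tabs as blanks, which the scripts above do not.

```shell
#!/bin/sh
# trim_text_nodes (hypothetical helper name): trim leading/trailing blanks in
# every ">text<" span on a line, not only the first one.
trim_text_nodes() {
    awk '
    {
        out = ""; rest = $0
        # Repeatedly find ">text<" in the part of the line not yet processed.
        while (match(rest, />[^<]+</)) {
            t = substr(rest, RSTART + 1, RLENGTH - 2)   # text without ">" and "<"
            if (t ~ /[^ \t]/)                           # leave whitespace-only gaps alone
                gsub(/^[ \t]+|[ \t]+$/, "", t)
            out  = out substr(rest, 1, RSTART) t        # prefix up to ">" plus trimmed text
            rest = substr(rest, RSTART + RLENGTH - 1)   # resume at the "<" that ended the match
        }
        print out rest
    }' "$@"
}

# A made-up single-line row with two cells.
printf '%s\n' '<row><cell id="1"> 08 4  </cell> <cell id="2">  AL </cell></row>' |
    trim_text_nodes
```

Whitespace that is the only thing between two tags is left untouched, so indentation and spacing between elements survive.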

Answer 1 (score: 0)

Thanks for your reply, Ed. The command above works like a charm!

awk '
match($0,/>[^<]+/) {
    t = substr($0,RSTART,RLENGTH)
    gsub(/ /,"",t)
    $0 = substr($0,1,RSTART-1) t substr($0,RSTART+RLENGTH)
}
1' file

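I have one more question. How can I remove whitespace only between '>' and the first occurrence of any other character, and between the last occurrence of a character and '<'?

If my input is now: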

<?xml version="1.0" encoding="UTF-8"?>
<dvm name="Filename" xml="http://www.google.it">
  <description>AL</description>
  <columns>
    <column>abcde</column>
    <column>fg</column>
  </columns>
  <rows>
    <row>
      <cell id="1"> 08 4      </cell>
      <cell id="2">AG</cell>
    </row>
    <row>
      <cell id="1">    00 6        </cell>
      <cell id="2">   AL   </cell>
    </row>
    <row>
      <cell id="1">0 42 </cell>
      <cell id="2">AN  </cell>
    </row>
   </rows>

how can I get the following result:

<?xml version="1.0" encoding="UTF-8"?>
<dvm name="Filename" xml="http://www.google.it">
  <description>AL</description>
  <columns>
    <column>abcde</column>
    <column>fg</column>
  </columns>
  <rows>
    <row>
      <cell id="1">08 4</cell>
      <cell id="2">AG</cell>
    </row>
    <row>
      <cell id="1">00 6</cell>
      <cell id="2">AL</cell>
    </row>
    <row>
      <cell id="1">0 42</cell>
      <cell id="2">AN</cell>
    </row>
   </rows>
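
The leading/trailing-only awk script in answer 0 gives this result. For anyone who still wants sed, as the title asks, one possible sketch is a single substitution that captures the text without its surrounding whitespace. The function name trim_text_sed is an assumption made for illustration, and the same caveats apply: text must not span lines, and '>' or '<' must not occur inside attribute values.

```shell
#!/bin/sh
# trim_text_sed (hypothetical helper name): strip whitespace right after ">"
# and right before "<", keeping spaces inside the text itself.
trim_text_sed() {
    # >[[:space:]]*              the ">" and any leading whitespace
    # \([^<]*[^<[:space:]]\)     the text, forced to end on a non-blank character
    # [[:space:]]*<              any trailing whitespace and the "<"
    sed 's/>[[:space:]]*\([^<]*[^<[:space:]]\)[[:space:]]*</>\1</g' "$@"
}

# Two lines taken from the follow-up input above.
printf '%s\n' '      <cell id="1">    00 6        </cell>' '      <cell id="2">   AL   </cell>' |
    trim_text_sed
```

Because the captured group must contain at least one non-blank character, a gap holding only whitespace between two tags is left as it is.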