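Formatting XML as comma-separated using sed or awk

Time: 2017-03-21 15:32:42

Tags: xml awk sed grep

I need help formatting this XML file as comma-separated records so that it can be imported into a table. I have played with sed and awk, but it has been a struggle.

Example: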

<requestID>224</requestID>,
     <ErrorMessage>The following is required: PersonName </ErrorMessage>,
     <?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService>
<requestID>615</requestID>,
    <ErrorMessage>The following is required: PersonName </ErrorMessage>,
     <?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService>

Desired result:

 <requestID>224</requestID>,<ErrorMessage>The following is required: PersonName </ErrorMessage>,<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService>
 <requestID>615</requestID>,<ErrorMessage>The following is required: PersonName </ErrorMessage>,<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService>

I have been able to add commas where I want them with:

 sed 's/ErrorMessage>$/ErrorMessage>,/; s/requestID>$/requestID>,/'

I also thought it would help to remove the tabs, but this removes all the spaces as well:

  tr -d ' \t' <grep.xml  > test.xml

I am not sure how to move a line to the end of the previous line...

So this part works...

 awk '{if ($0 ~ /<ErrorMessage>,*/) { printf "%s", $0; getline var; printf "%s\n", var} else {print $0}}' test.xml


    <requestID>260</requestID>,
            <ErrorMessage>The following is required: PersonName</ErrorMessage>,<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>260</requestID></TCRMService>

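But now I am having trouble moving the ErrorMessage line to the end of the requestID line...

Note that after this step the ErrorMessage line also contains a requestID element. I think the key is to look for a pattern match on

         </requestID>,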

4 answers:

Answer 0: (score: 0)

In awk, very quick and dirty (assuming only spaces, no tabs):

$ awk '{gsub(/^ +| +$|, *$/,"");printf "%s%s", ($0~/^ *<requestID>/?ORS:","), $0}END{print ""}' file

<requestID>224</requestID>,<ErrorMessage>The following is required: PersonName </ErrorMessage>,<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService>
<requestID>224</requestID>,<ErrorMessage>The following is required: PersonName </ErrorMessage>,<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService>

Now it only needs the leading newline removed, but I have a bus to catch.
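One way to avoid the leading newline is to end the previous record only when a new `<requestID>` line arrives and it is not the first line. The following is a minimal sketch under that assumption, not the answerer's finished version; `sample.txt` and `joined.txt` are hypothetical file names, and the long XML payload is shortened to a stub for readability.

```shell
# Hypothetical sample in the same three-line layout as the question
# (the XML payload is shortened to a stub).
cat > sample.txt <<'EOF'
<requestID>224</requestID>,
     <ErrorMessage>The following is required: PersonName </ErrorMessage>,
     <?xml version="1.0"?><TCRMService><requestID>224</requestID></TCRMService>
<requestID>615</requestID>,
    <ErrorMessage>The following is required: PersonName </ErrorMessage>,
     <?xml version="1.0"?><TCRMService><requestID>615</requestID></TCRMService>
EOF

awk '
{
    gsub(/^[ \t]+|[ \t]+$/, "")     # trim leading and trailing blanks
    sub(/,$/, "")                   # drop the trailing comma, if any
    if ($0 ~ /^<requestID>/) {      # a line starting with <requestID> opens a record
        if (NR > 1) print ""        # close the previous record, except before the first
        printf "%s", $0
    } else {
        printf ",%s", $0            # continuation line: join it with a comma
    }
}
END { if (NR) print "" }            # terminate the last record
' sample.txt > joined.txt

cat joined.txt
```

Because the record boundary is detected from the line content rather than from a line count, this should also cope with records that have more or fewer than three lines, as long as each record starts with a `<requestID>` line.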

Answer 1: (score: 0)

So this part works...

 awk '{if ($0 ~ /<ErrorMessage>,*/) { printf "%s", $0; getline var; printf "%s\n", var} else {print $0}}' test.xml


    <requestID>260</requestID>,
            <ErrorMessage>The following is required: PersonName</ErrorMessage>,<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>260</requestID></TCRMService>

But now I am having trouble moving the ErrorMessage line to the end of the requestID line...

Please note that on the ErrorMessage line, requestID is also on the same line.

Answer 2: (score: 0)

Try this -

awk -v FS=""  '{gsub(/^[[:space:]]+/,"",$0);ORS=(NR%3==0?RS:FS)}1' f
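The one-liner above strips leading whitespace and emits a newline only after every third input line, so it relies on each record being exactly three lines with the commas already present. An empty `FS` is not defined by POSIX, so a variant that avoids it may be more portable. This is a sketch under the same three-lines-per-record assumption; `sample3.txt` and `joined3.txt` are hypothetical file names and the XML payload is a shortened stub.

```shell
# Hypothetical sample: two records of exactly three lines each,
# with the trailing commas already in place.
cat > sample3.txt <<'EOF'
<requestID>224</requestID>,
     <ErrorMessage>The following is required: PersonName </ErrorMessage>,
     <?xml version="1.0"?><TCRMService><requestID>224</requestID></TCRMService>
<requestID>615</requestID>,
    <ErrorMessage>The following is required: PersonName </ErrorMessage>,
     <?xml version="1.0"?><TCRMService><requestID>615</requestID></TCRMService>
EOF

awk '
{
    sub(/^[ \t]+/, "")                        # strip leading indentation
    printf "%s%s", $0, (NR % 3 ? "" : "\n")   # newline only after every third line
}
' sample3.txt > joined3.txt

cat joined3.txt
```

If a record ever spans a different number of lines, the grouping drifts for every record after it, so this form is only appropriate when the three-line layout is guaranteed.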

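Answer 3: (score: 0)

Why not a Perl snippet? With the one-liner below, runs of two or more whitespace characters are removed. No commas are added, since the input file you gave in the main question already has the commas in place.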

$ cat file3 |nl
     1  <requestID>224</requestID>,
     2       <ErrorMessage>The following is required: PersonName </ErrorMessage>,
     3       <?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService>
     4  <requestID>615</requestID>,
     5      <ErrorMessage>The following is required: PersonName </ErrorMessage>,
     6       <?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService>

$ perl -pe 's/\n//g; s/[[:space:]]{2,}//g; s/<\/TCRMService>/$&\n/g' file3 |nl
     1  <requestID>224</requestID>,<ErrorMessage>The following is required: PersonName </ErrorMessage>,<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService>
     2  <requestID>615</requestID>,<ErrorMessage>The following is required: PersonName </ErrorMessage>,<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService>