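I need help formatting this XML file as comma-separated rows so that it can be imported into a table. I have been experimenting with sed and awk, but it has been a struggle.
Example: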
<requestID>224</requestID>,
<ErrorMessage>The following is required: PersonName </ErrorMessage>,
<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService>
<requestID>615</requestID>,
<ErrorMessage>The following is required: PersonName </ErrorMessage>,
<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService>
Desired result:
<requestID>224</requestID>,<ErrorMessage>The following is required: PersonName </ErrorMessage>,<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService>
<requestID>615</requestID>,<ErrorMessage>The following is required: PersonName </ErrorMessage>,<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService>
I have been able to add the commas where I want them with:
sed 's/ErrorMessage>$/ErrorMessage>,/; s/requestID>$/requestID>,/'
I also thought it would be better to remove the tabs, but this removes all the spaces as well:
tr -d ' \t' <grep.xml > test.xml
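The tr command above deletes every space and tab, including the ones inside the error message text. A minimal sketch of an alternative, assuming only the whitespace at the start and end of each line is unwanted (trim_edges is a hypothetical helper name, not something from the post):

```shell
# Sketch: strip whitespace only at the start and end of each line,
# leaving spaces inside the text (e.g. in the ErrorMessage) untouched.
# trim_edges is a hypothetical helper name used for illustration.
trim_edges() {
  sed 's/^[[:space:]]*//; s/[[:space:]]*$//' "$@"
}
```

It could be used as trim_edges grep.xml > test.xml in place of the tr call.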
I am not sure how to move a line to the end of the previous line...
So this part works:
awk '{if ($0 ~ /<ErrorMessage>,*/) { printf "%s", $0; getline var; printf "%s\n", var} else {print $0}}' test.xml
<requestID>260</requestID>,
<ErrorMessage>The following is required: PersonName</ErrorMessage>,<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>260</requestID></TCRMService>
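But now I am having trouble moving the ErrorMessage line to the end of the requestID line.
Note that the ErrorMessage line now also contains a requestID element. I think the key is to pattern-match on: </requestID>,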
Answer 0 (score: 0)
In awk, very quick and dirty (assuming the whitespace is spaces only, no tabs):
$ awk '{gsub(/^ +| +$|, *$/,"");printf "%s%s", ($0~/^ *<requestID>/?ORS:","), $0}END{print ""}' file
<requestID>224</requestID>,<ErrorMessage>The following is required: PersonName </ErrorMessage>,<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService>
<requestID>224</requestID>,<ErrorMessage>The following is required: PersonName </ErrorMessage>,<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService>
All that is left is to remove the leading newline this prints before the first record, but I have a bus to catch.
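One way to avoid the leading newline is to print the line break only before the second and later records. The following is a sketch under the same assumptions as the one-liner above (each record starts with a line beginning with <requestID>); join_records and the started flag are names made up for illustration:

```shell
# Sketch: join each record onto one line without a leading blank line.
# join_records and "started" are hypothetical names for this example.
join_records() {
  awk '
    {
      gsub(/^[ \t]+|[ \t]+$/, "")    # trim leading/trailing spaces and tabs
      sub(/,$/, "")                   # drop an existing trailing comma
      if ($0 == "") next              # skip blank lines
      if ($0 ~ /^<requestID>/) {      # a new record starts here
        if (started) printf "\n"      # end the previous record first
        printf "%s", $0
        started = 1
      } else {
        printf ",%s", $0              # append the field to the current record
      }
    }
    END { if (started) print "" }     # terminate the last record
  ' "$@"
}
```

Called as join_records file, it reads the same input as the command above. Because the pattern is anchored with ^, the <requestID> element embedded in the XML line does not start a new record.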
Answer 1 (score: 0)
So this part works:
awk '{if ($0 ~ /<ErrorMessage>,*/) { printf "%s", $0; getline var; printf "%s\n", var} else {print $0}}' test.xml
<requestID>260</requestID>,
<ErrorMessage>The following is required: PersonName</ErrorMessage>,<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>260</requestID></TCRMService>
But now I am having trouble moving the ErrorMessage line to the end of the requestID line.
Note that the ErrorMessage line now also contains a requestID element.
Answer 2 (score: 0)
Try this:
awk -v FS="" '{gsub(/^[[:space:]]+/,"",$0);ORS=(NR%3==0?RS:FS)}1' f
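The command above relies on every record being exactly three lines long and on the commas already being present at the ends of the first two lines. Under those same assumptions, the join can also be sketched with sed and paste; join_by_three is a hypothetical helper name, and '\0' is the POSIX way to tell paste to use an empty delimiter:

```shell
# Sketch: strip leading whitespace, then glue every three lines together.
# Assumes each record is exactly three lines and already ends its first
# two lines with a comma. join_by_three is a hypothetical helper name.
join_by_three() {
  sed 's/^[[:space:]]*//' "$@" | paste -d '\0' - - -
}
```

Called as join_by_three f, it reads the same file as the awk command. If a record ever has more or fewer than three lines, every later output row will be misaligned, so matching on the leading <requestID> line is the safer choice for irregular input.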
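Answer 3 (score: 0)
Why not a Perl snippet? The one below removes the line breaks and any run of two or more whitespace characters, then adds a newline after each closing </TCRMService> tag. It adds no commas, because the input file you posted in the question already has them in place.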
$ cat file3 |nl
1 <requestID>224</requestID>,
2 <ErrorMessage>The following is required: PersonName </ErrorMessage>,
3 <?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService>
4 <requestID>615</requestID>,
5 <ErrorMessage>The following is required: PersonName </ErrorMessage>,
6 <?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService>
$ perl -pe 's/\n//g; s/[[:space:]]{2,}//g; s/<\/TCRMService>/$&\n/g' file3 |nl
1 <requestID>224</requestID>,<ErrorMessage>The following is required: PersonName </ErrorMessage>,<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService>
2 <requestID>615</requestID>,<ErrorMessage>The following is required: PersonName </ErrorMessage>,<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService>