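I want to use Unix tools to extract the values of a specific tag from a given XML file. The XML is unformatted (all the data is on a single line), and I need to search for the tag PolNumber, which occurs several times on that line.
Please find the XML below: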
<?xml version="1.0" encoding="UTF-8"?><TXLife><UserAuthRequest><UserLoginName>FirstPenn</UserLoginName><UserPswd><CryptType>None</CryptType><Pswd>None</Pswd></UserPswd><UserDate>2016-05-06</UserDate><UserTime>11:06</UserTime><VendorApp><VendorName VendorCode="FPPTB">FirstPenn</VendorName><AppName>ACORD XML Download</AppName><AppVer>1.0</AppVer></VendorApp></UserAuthRequest><TXLifeRequest><TransRefGUID>4B6BB6FB-6FA0-4678-A3A2-862E7AE7D884</TransRefGUID><TransType tc="1125"/><TransExeDate>2016-05-06</TransExeDate><TransExeTime>11:06</TransExeTime><TransMode tc="2"/><InquiryLevel tc="3"/><MaxRecords>0</MaxRecords><PendingResponseOK tc="0">False</PendingResponseOK><NoResponseOK tc="1">True</NoResponseOK><TestIndicator tc="0">False</TestIndicator><OLifE Version="2.7"><SourceInfo><CreationDate>2016-05-06</CreationDate><SourceInfoName>First Penn-Pacific</SourceInfoName><SourceInfoDescription>Pending Case Status</SourceInfoDescription><FileControlID>1223232304</FileControlID></SourceInfo></Holding><Holding id="HLD_4902160"><HoldingTypeCode tc="2"/><HoldingStatus tc="4"/><AsOfDate>2016-05-05</AsOfDate><Policy CarrierPartyID="LLCTB_4902160"><CarrierCode>LLCTB</CarrierCode><PolNumber>4902160</PolNumber><LineOfBusiness tc="1">Life</LineOfBusiness><ProductType tc="4"/><ProductCode>VLON14 </ProductCode><PlanName>VLON14 </PlanName><PolicyStatus tc="24">Approved, not issued</PolicyStatus><Jurisdiction tc="56"/><EffDate>2016-02-18</EffDate><PaymentMode tc="9">Single Payment</PaymentMode><PaymentAmt>62336.0000</PaymentAmt><Life><TargetPremAmt>5759.9700</TargetPremAmt><TotalRolloverAmt>0.0000</TotalRolloverAmt><FaceAmt>261579.0000</FaceAmt><Coverage id="COV_4902160_1"><IndicatorCode tc="1"/><LivesType tc="2147483647"/><LifeParticipant PartyID="INS_4902160_1"><LifeParticipantRoleCode tc="1"/><IssueAge>53</IssueAge><IssueGender tc="1"/><TobaccoPremiumBasis tc="1">Non Smoker</TobaccoPremiumBasis><PermTableRating tc="1"/><UnderwritingClass tc="2">Preferred 
risk</UnderwritingClass></LifeParticipant></Coverage></Life><Holding id="HLD_4902270"><HoldingTypeCode tc="2"/><HoldingStatus tc="4"/><AsOfDate>2016-05-06</AsOfDate><Policy CarrierPartyID="LLCTB_4902270"><CarrierCode>LLCTB</CarrierCode><PolNumber>4902270</PolNumber><LineOfBusiness tc="1">Life</LineOfBusiness><ProductType tc="4"/><ProductCode>VLON14 </ProductCode><PlanName>VLON14 </PlanName><PolicyStatus tc="8">Pending Issue</PolicyStatus><Jurisdiction tc="17"/><EffDate>2016-02-24</EffDate><PaymentMode tc="1">Annual</PaymentMode><PaymentAmt>2532.0000</PaymentAmt><Life><TargetPremAmt>7422.0000</TargetPremAmt><TotalRolloverAmt>0.0000</TotalRolloverAmt><FaceAmt>200000.0000</FaceAmt><Coverage id="COV_4902270_1"><IndicatorCode tc="1"/><LivesType tc="2147483647"/><LifeParticipant PartyID="INS_4902270_1"><LifeParticipantRoleCode tc="1"/><IssueAge>69</IssueAge><IssueGender tc="2"/><TobaccoPremiumBasis tc="1">Non Smoker</TobaccoPremiumBasis><PermTableRating tc="1"/><UnderwritingClass tc="1">Standard Risk</UnderwritingClass></LifeParticipant></Coverage></Life>
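I am using the following grep command:
grep -oP "<PolNumber>[0-9]*</PolNumber>" samp.xml | grep -oe '\([0-9]*\)'
It works as expected on an online Unix compiler site, but the same command does not work on my machine. It says "Grep invalid option --o". I am not sure whether this is a version problem or something else, but I need a fix that works with my current Unix. Can you help me with that?
Thanks in advance, Manivannan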
Answer 0: (score: 1)
Using simple utilities:
tr "<" "\n" < samp.xml | grep "^PolNumber" | cut -d">" -f2
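To see why this pipeline works, it can help to run it on a tiny input. The sketch below wraps the same three commands in a shell function; the function name `extract_polnumbers_tr` and the demo file are made up for illustration and are not part of the answer. As a small assumption-driven change, the grep pattern is tightened to `^PolNumber>` so that a tag whose name merely starts with PolNumber would not match.

```shell
# Hypothetical helper wrapping the tr | grep | cut pipeline from the answer.
extract_polnumbers_tr() {
    # 1. tr:   turn every "<" into a newline, so each tag starts its own line
    # 2. grep: keep only lines that begin with the opening tag name plus ">"
    #          (closing tags begin with "/", so they are dropped)
    # 3. cut:  print the field after the first ">", i.e. the tag's text
    tr '<' '\n' < "$1" | grep '^PolNumber>' | cut -d'>' -f2
}

# Demo on a made-up one-line document (not the file from the question)
demo=${TMPDIR:-/tmp}/polnumber_demo.xml
printf '%s\n' '<A><PolNumber>4902160</PolNumber><B>x</B><PolNumber>4902270</PolNumber></A>' > "$demo"
extract_polnumbers_tr "$demo"   # prints 4902160 and 4902270, one per line
rm -f "$demo"
```

This assumes the opening tag carries no attributes and that the value contains no ">" character, which holds for the PolNumber elements shown in the question.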
Answer 1: (score: 0)
Parsing XML with a non-XML parser (grep, sed, or anything similar) is usually a bad idea.
Anyway, here is a quick and dirty solution using sed:
sed 's#\(<PolNumber>[0-9]*\)</PolNumber>#\1\n#g' samp.xml | grep '<PolNumber>' | sed 's#.*<PolNumber>\([0-9]*\)$#\1#'
It only works if your XML is on a single line.
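If the single-line restriction, or the reliance on `\n` in the sed replacement text (which GNU sed turns into a newline but other sed implementations may not), is a concern, a similar quick and dirty extraction can be sketched with awk. This is only an illustration, not a real XML parser: the function name `extract_polnumbers_awk` is hypothetical, and it assumes the opening tag has no attributes and the value contains no ">" character.

```shell
# Hypothetical helper: print the text of every <PolNumber> element using awk.
extract_polnumbers_awk() {
    # RS = "<" makes every tag start a new record, wherever line breaks fall.
    # -F'>' splits each record into the tag name ($1) and the text after it ($2).
    awk -F'>' 'BEGIN { RS = "<" } $1 == "PolNumber" { print $2 }' "$1"
}

# Usage, with the file name from the question:
#   extract_polnumbers_awk samp.xml
```

Because records are split on "<" rather than on newlines, the result does not depend on how the document is broken into lines, as long as no PolNumber value itself spans a line break.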