我已经使用XPath解析了一个具有重复数据的特定XML文件。我这样做的方法是扫描整个文档并获取每个值元素的xpath,然后我使用Javax Xpath库从该值元素的xpath中检索数据。
这是我的代码:
./node_modules/.bin/sl-run
两张地图的输出都缺少重复的数据。它只是抓取与唯一xpath相关的数据。 mapUniques用于获取唯一xpath的值,但multiOccurance映射用于获取具有多个值的xpath的数据。我必须以某种方式编辑具有多个出现的X路径,例如/ PayrollFormInfo / FormInfo [1] / ExportDataVersion / 100和/ PayrollFormInfo / FormInfo [2] / ExportDataVersion / 100,注意它是如何相同的X路径但是表示出现了数据。
我该怎么做?
XML文档:
public void test() throws Exception {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = null;
File f = new File("xmlDir/INWKS941_AllSchedB_IFSP.xml");
builder = factory.newDocumentBuilder();
String xml = FileUtils.readFileToString(f);
Document xmlDocument = builder.parse(new ByteArrayInputStream(xml.getBytes()));
XPath xPath = XPathFactory.newInstance().newXPath();
InputStream stream = new ByteArrayInputStream(xml.getBytes());
List<String> xpaths = getXPaths(stream);
Map<String, String> mapUniques = new LinkedHashMap<String, String>();
Map<String, String> multiOccurance = new LinkedHashMap<String, String>();
Integer count = 1;
for(int i=0;i<xpaths.size();i++) {
String xPathExpr = xpaths.get(i);
count++;
NodeList nodeList = (NodeList) xPath.compile(xPathExpr).evaluate(xmlDocument, XPathConstants.NODESET);
Integer counter = 1;
for(int j=0;j<nodeList.getLength();j++) {
if(nodeList.getLength() <= 1) {
mapUniques.put(xPathExpr, nodeList.item(j).getFirstChild().getNodeValue());
}
else if(nodeList.getLength() > 1) {
multiOccurance.put(xPathExpr, nodeList.item(j).getFirstChild().getNodeValue());
}
counter++;
}
}
logger.info("Here: "+Arrays.toString(multiOccurance.entrySet().toArray()));
Iterator it = multiOccurance.entrySet().iterator();
while(it.hasNext()) {
Entry pair = (Entry) it.next();
logger.info(pair.getKey()+"="+pair.getValue());
it.remove();
}
Iterator iter = mapUniques.entrySet().iterator();
while(iter.hasNext()) {
Entry pair = (Entry) iter.next();
logger.info(pair.getKey()+"="+pair.getValue());
iter.remove();
}
}
修改
这是我为值节点获取XPath的代码:
<PayrollFormInfo xmlns="http://www.irs.com/PayrollFormInfo/2004">
<FormInfo>
<ExportDataVersion>100</ExportDataVersion>
<QBVersion>IFSP</QBVersion>
<FormSetID>FSDYF</FormSetID>
<FormID>INWKS941</FormID>
<FormDesc>Quarterly Federal Tax Return</FormDesc>
<FormFilingPeriod>Quarterly</FormFilingPeriod>
<TotalsBreakoutBy>Daily</TotalsBreakoutBy>
<BeginDate>01/01/2015</BeginDate>
<EndDate>03/31/2015</EndDate>
<EFYes>true</EFYes>
<IncludeInstructionSheet>false</IncludeInstructionSheet>
</FormInfo>
<CompanyInfo>
<CompanyName>PatsFedTaxSemiWeekly20144QRT_CT</CompanyName>
<LegalName>P #a(9t)-'&sFedTaxSemiWeekly2014_4QRTCT</LegalName>
<EmployerID>507754170</EmployerID>
<IntuitPayrollServiceID>1862684</IntuitPayrollServiceID>
<LegalAddressLine1>Def-ault/ .String</LegalAddressLine1>
<LegalCity>St. Louis</LegalCity>
<LegalState>CB</LegalState>
<LegalZip>06300-0987</LegalZip>
<MailingAddressLine1>Default String Addr</MailingAddressLine1>
<MailingCity>Greenwich</MailingCity>
<MailingState>CT</MailingState>
<MailingZip>06830</MailingZip>
<NumberOfEmployees>2</NumberOfEmployees>
<FWTDepositFrequency>SEMI_WEEKLY</FWTDepositFrequency>
<ExemptFlag>false</ExemptFlag>
<EmployerType>Contributing</EmployerType>
<FUTAExemptFlag>false</FUTAExemptFlag>
<FederalTotals>
<PeriodBeginDate>01/01/2015</PeriodBeginDate>
<PeriodEndDate>01/01/2015</PeriodEndDate>
<GrossPay>0.00</GrossPay>
<NetPay>0.00</NetPay>
<SocialSecurityWages>15384.62</SocialSecurityWages>
<SocialSecurityLiability>-10.10</SocialSecurityLiability>
<MedicareWages>15384.62</MedicareWages>
<MedicareLiability>-1.01</MedicareLiability>
<FederalWages>13076.92</FederalWages>
<FederalWithholding>-101.00</FederalWithholding>
<TaxableFUTAWages>0.00</TaxableFUTAWages>
<TotalFUTAWages>0.00</TotalFUTAWages>
<FUTALiability>0.00</FUTALiability>
<EIC>0.00</EIC>
<SocialSecurityTips>1.01</SocialSecurityTips>
</FederalTotals>
<FederalTotals>
<PeriodBeginDate>01/02/2015</PeriodBeginDate>
<PeriodEndDate>01/02/2015</PeriodEndDate>
<GrossPay>0.00</GrossPay>
<NetPay>0.00</NetPay>
<SocialSecurityWages>15384.62</SocialSecurityWages>
<SocialSecurityLiability>-10.20</SocialSecurityLiability>
<MedicareWages>15384.62</MedicareWages>
<MedicareLiability>-1.02</MedicareLiability>
<FederalWages>13076.92</FederalWages>
<FederalWithholding>-102.00</FederalWithholding>
<TaxableFUTAWages>0.00</TaxableFUTAWages>
<TotalFUTAWages>0.00</TotalFUTAWages>
<FUTALiability>0.00</FUTALiability>
<EIC>0.00</EIC>
<SocialSecurityTips>1.02</SocialSecurityTips>
</FederalTotals>
<TaxItemTotals>
<PeriodBeginDate>01/01/2015</PeriodBeginDate>
<PeriodEndDate>03/31/2015</PeriodEndDate>
<TaxTableID>200</TaxTableID>
<TotalWagesAndTips>0.00</TotalWagesAndTips>
<TaxableWagesAndTips>164615.44</TaxableWagesAndTips>
<TaxAmount>0.00</TaxAmount>
</TaxItemTotals>
<PaymentInfo>
<PeriodBeginDate>01/14/2015</PeriodBeginDate>
<PeriodEndDate>01/16/2015</PeriodEndDate>
<TaxTableID>62</TaxTableID>
<Amount>953.86</Amount>
<PaymentDate>04/03/2015</PaymentDate>
</PaymentInfo>
<EmpsWorked>
<TaxTableID>63</TaxTableID>
<TestDate>3/12/2015</TestDate>
<NumEmps>2</NumEmps>
</EmpsWorked>
<EmpSummary>
<EmployeeID>4514230</EmployeeID>
<FirstName>JillCT</FirstName>
<MiddleInitial>S</MiddleInitial>
<LastName>Taylor</LastName>
<SSN>150-86-6794</SSN>
<IsStatutory>false</IsStatutory>
<HasPensionPlan>false</HasPensionPlan>
<EmployeeType>REGULAR</EmployeeType>
</EmpSummary>
<EmpSummary>
<EmployeeID>4514229</EmployeeID>
<FirstName>JoseCT</FirstName>
<MiddleInitial>S</MiddleInitial>
<LastName>George</LastName>
<SSN>792-93-6215</SSN>
<IsStatutory>false</IsStatutory>
<HasPensionPlan>false</HasPensionPlan>
<EmployeeType>REGULAR</EmployeeType>
</EmpSummary>
<FederalReturnType>941</FederalReturnType>
</CompanyInfo>
<EmployeeInfo>
<EmployeeType>REGULAR</EmployeeType>
<EmployeeID>4514230</EmployeeID>
<FirstName>JillCT</FirstName>
<MiddleInitial>S</MiddleInitial>
<LastName>Taylor</LastName>
<SSN>150-86-6794</SSN>
<HireDate>10/09/2009</HireDate>
<IsStatutory>false</IsStatutory>
<HasPensionPlan>false</HasPensionPlan>
<NumberOfExemptions>0</NumberOfExemptions>
<StateLived>CT</StateLived>
<StateWorked>CT</StateWorked>
<Gender>F</Gender>
<AddressLine1>123 default St.</AddressLine1>
<City>Greenwich</City>
<State>CT</State>
<ZipCode>06830</ZipCode>
<FederalTotals>
<PeriodBeginDate>01/01/2015</PeriodBeginDate>
<PeriodEndDate>01/31/2015</PeriodEndDate>
<GrossPay>0.00</GrossPay>
<NetPay>0.00</NetPay>
<SocialSecurityWages>49269.21</SocialSecurityWages>
<SocialSecurityLiability>-3054.69</SocialSecurityLiability>
<MedicareWages>50769.24</MedicareWages>
<MedicareLiability>-736.16</MedicareLiability>
<FederalWages>46153.84</FederalWages>
<FederalWithholding>-14242.77</FederalWithholding>
<TaxableFUTAWages>0.00</TaxableFUTAWages>
<TotalFUTAWages>0.00</TotalFUTAWages>
<FUTALiability>0.00</FUTALiability>
<EIC>0.00</EIC>
<SocialSecurityTips>0.00</SocialSecurityTips>
</FederalTotals>
<FederalTotals>
<PeriodBeginDate>02/01/2015</PeriodBeginDate>
<PeriodEndDate>02/28/2015</PeriodEndDate>
<GrossPay>0.00</GrossPay>
<NetPay>0.00</NetPay>
<SocialSecurityWages>15384.62</SocialSecurityWages>
<SocialSecurityLiability>-953.85</SocialSecurityLiability>
<MedicareWages>35384.62</MedicareWages>
<MedicareLiability>-513.08</MedicareLiability>
<FederalWages>33076.92</FederalWages>
<FederalWithholding>-10637.87</FederalWithholding>
<TaxableFUTAWages>0.00</TaxableFUTAWages>
<TotalFUTAWages>0.00</TotalFUTAWages>
<FUTALiability>0.00</FUTALiability>
<EIC>0.00</EIC>
<SocialSecurityTips>0.00</SocialSecurityTips>
</FederalTotals>
<FederalTotals>
<PeriodBeginDate>03/01/2015</PeriodBeginDate>
<PeriodEndDate>03/31/2015</PeriodEndDate>
<GrossPay>0.00</GrossPay>
<NetPay>0.00</NetPay>
<SocialSecurityWages>46153.86</SocialSecurityWages>
<SocialSecurityLiability>-2861.54</SocialSecurityLiability>
<MedicareWages>196153.86</MedicareWages>
<MedicareLiability>-3654.22</MedicareLiability>
<FederalWages>189230.76</FederalWages>
<FederalWithholding>-68440.64</FederalWithholding>
<TaxableFUTAWages>0.00</TaxableFUTAWages>
<TotalFUTAWages>0.00</TotalFUTAWages>
<FUTALiability>0.00</FUTALiability>
<EIC>0.00</EIC>
<SocialSecurityTips>0.00</SocialSecurityTips>
</FederalTotals>
<WorkedInfo>
<TaxTableID>64</TaxTableID>
<TestDate>01/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
<WorkedInfo>
<TaxTableID>62</TaxTableID>
<TestDate>02/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
<WorkedInfo>
<TaxTableID>61</TaxTableID>
<TestDate>01/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
<WorkedInfo>
<TaxTableID>61</TaxTableID>
<TestDate>02/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
<WorkedInfo>
<TaxTableID>1</TaxTableID>
<TestDate>02/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
<WorkedInfo>
<TaxTableID>1</TaxTableID>
<TestDate>03/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
<WorkedInfo>
<TaxTableID>62</TaxTableID>
<TestDate>03/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
<WorkedInfo>
<TaxTableID>63</TaxTableID>
<TestDate>02/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
<WorkedInfo>
<TaxTableID>61</TaxTableID>
<TestDate>03/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
<WorkedInfo>
<TaxTableID>64</TaxTableID>
<TestDate>02/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
<WorkedInfo>
<TaxTableID>63</TaxTableID>
<TestDate>03/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
<WorkedInfo>
<TaxTableID>62</TaxTableID>
<TestDate>01/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
<WorkedInfo>
<TaxTableID>64</TaxTableID>
<TestDate>03/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
<WorkedInfo>
<TaxTableID>63</TaxTableID>
<TestDate>01/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
<WorkedInfo>
<TaxTableID>1</TaxTableID>
<TestDate>01/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
</EmployeeInfo>
<EmployeeInfo>
<EmployeeType>REGULAR</EmployeeType>
<EmployeeID>4514229</EmployeeID>
<FirstName>JoseCT</FirstName>
<MiddleInitial>S</MiddleInitial>
<LastName>George</LastName>
<SSN>792-93-6215</SSN>
<HireDate>10/09/2009</HireDate>
<IsStatutory>false</IsStatutory>
<HasPensionPlan>false</HasPensionPlan>
<NumberOfExemptions>0</NumberOfExemptions>
<StateLived>CT</StateLived>
<StateWorked>CT</StateWorked>
<Gender>M</Gender>
<AddressLine1>123 default St.</AddressLine1>
<City>Greenwich</City>
<State>CT</State>
<ZipCode>06830</ZipCode>
<FederalTotals>
<PeriodBeginDate>02/01/2015</PeriodBeginDate>
<PeriodEndDate>02/28/2015</PeriodEndDate>
<GrossPay>0.00</GrossPay>
<NetPay>0.00</NetPay>
<SocialSecurityWages>15384.62</SocialSecurityWages>
<SocialSecurityLiability>-953.85</SocialSecurityLiability>
<MedicareWages>35384.62</MedicareWages>
<MedicareLiability>-513.08</MedicareLiability>
<FederalWages>33076.92</FederalWages>
<FederalWithholding>-10637.87</FederalWithholding>
<TaxableFUTAWages>0.00</TaxableFUTAWages>
<TotalFUTAWages>0.00</TotalFUTAWages>
<FUTALiability>0.00</FUTALiability>
<EIC>0.00</EIC>
<SocialSecurityTips>0.00</SocialSecurityTips>
</FederalTotals>
<FederalTotals>
<PeriodBeginDate>01/01/2015</PeriodBeginDate>
<PeriodEndDate>01/31/2015</PeriodEndDate>
<GrossPay>0.00</GrossPay>
<NetPay>0.00</NetPay>
<SocialSecurityWages>49269.21</SocialSecurityWages>
<SocialSecurityLiability>-3054.69</SocialSecurityLiability>
<MedicareWages>50769.24</MedicareWages>
<MedicareLiability>-736.16</MedicareLiability>
<FederalWages>46153.84</FederalWages>
<FederalWithholding>-14242.77</FederalWithholding>
<TaxableFUTAWages>0.00</TaxableFUTAWages>
<TotalFUTAWages>0.00</TotalFUTAWages>
<FUTALiability>0.00</FUTALiability>
<EIC>0.00</EIC>
<SocialSecurityTips>0.00</SocialSecurityTips>
</FederalTotals>
<FederalTotals>
<PeriodBeginDate>03/01/2015</PeriodBeginDate>
<PeriodEndDate>03/31/2015</PeriodEndDate>
<GrossPay>0.00</GrossPay>
<NetPay>0.00</NetPay>
<SocialSecurityWages>46153.86</SocialSecurityWages>
<SocialSecurityLiability>-2861.54</SocialSecurityLiability>
<MedicareWages>196153.86</MedicareWages>
<MedicareLiability>-3654.22</MedicareLiability>
<FederalWages>189230.76</FederalWages>
<FederalWithholding>-68440.64</FederalWithholding>
<TaxableFUTAWages>0.00</TaxableFUTAWages>
<TotalFUTAWages>0.00</TotalFUTAWages>
<FUTALiability>0.00</FUTALiability>
<EIC>0.00</EIC>
<SocialSecurityTips>0.00</SocialSecurityTips>
</FederalTotals>
<WorkedInfo>
<TaxTableID>63</TaxTableID>
<TestDate>03/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
<WorkedInfo>
<TaxTableID>61</TaxTableID>
<TestDate>03/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
<WorkedInfo>
<TaxTableID>61</TaxTableID>
<TestDate>01/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
<WorkedInfo>
<TaxTableID>1</TaxTableID>
<TestDate>01/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
<WorkedInfo>
<TaxTableID>62</TaxTableID>
<TestDate>02/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
<WorkedInfo>
<TaxTableID>63</TaxTableID>
<TestDate>02/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
<WorkedInfo>
<TaxTableID>1</TaxTableID>
<TestDate>03/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
<WorkedInfo>
<TaxTableID>64</TaxTableID>
<TestDate>02/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
<WorkedInfo>
<TaxTableID>64</TaxTableID>
<TestDate>03/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
<WorkedInfo>
<TaxTableID>1</TaxTableID>
<TestDate>02/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
<WorkedInfo>
<TaxTableID>63</TaxTableID>
<TestDate>01/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
<WorkedInfo>
<TaxTableID>62</TaxTableID>
<TestDate>01/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
<WorkedInfo>
<TaxTableID>64</TaxTableID>
<TestDate>01/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
<WorkedInfo>
<TaxTableID>62</TaxTableID>
<TestDate>03/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
<WorkedInfo>
<TaxTableID>61</TaxTableID>
<TestDate>02/12/2015</TestDate>
<Worked>true</Worked>
</WorkedInfo>
</EmployeeInfo>
<PaidPreparerInfo>
<Signature>data:image</Signature>
<Title>Agent in Fact</Title>
<Date>03/15/2015</Date>
<PrintedName>Coreen Solano</PrintedName>
<PhoneNumber>888-927-7478</PhoneNumber>
<FaxNumber>775-562-2657</FaxNumber>
<FEIN>88-0146711</FEIN>
<Address1>6884 Sierra Center Pkwy</Address1>
<City>Reno</City>
<State>NV</State>
<Zip>89511</Zip>
<FirmName>Computing Resources Inc</FirmName>
<EmailAddress>tax_eservice@irs.com</EmailAddress>
</PaidPreparerInfo>
</PayrollFormInfo>
这是我的自定义XMLUtils类:
public List<String> getXPaths ( InputStream stream ) throws ParserException {
Document document = XMLUtils.getDocument( stream );
return getXPaths( document.getDocumentElement() );
}
public List<String> getXPaths ( Node node ) {
List<String> xpaths = iterate( node, "");
return xpaths;
}
public List<String> iterate ( Node node, String parentPath ) {
List<String> xpaths = new ArrayList<String>();
if ( node.getNodeType() == Node.ELEMENT_NODE ) {
Element element = ( Element ) node;
parentPath = parentPath + "/" + element.getTagName();
for ( int nIndex = 0; nIndex<node.getChildNodes().getLength(); nIndex++ ) {
xpaths.addAll( iterate(node.getChildNodes().item(nIndex) , parentPath ) ) ;
}
}
else if ( node.getNodeType() == Node.TEXT_NODE ) {
if ( node.getTextContent().trim().length() !=0 ) {
logger.debug("XPath found : " + parentPath );
xpaths.add( parentPath );
}
}
else {
logger.debug("Unknown node type for : " + node.getNodeName());
}
return xpaths;
}
答案 0 :(得分:2)
每次通过内部循环都会覆盖multiOccurrance映射中的值。我不确定你想如何收集旧值和新值,但这里有一个选项:
for (int j = 0; j < nodeList.getLength(); j++) {
if (nodeList.getLength() <= 1) {
mapUniques.put(xPathExpr, nodeList.item(j).getFirstChild().getNodeValue());
} else {
String old = multiOccurance.get(xPathExpr);
if (old == null) {
old = "";
}
multiOccurance.put(xPathExpr,
old + " "
+ nodeList.item(j).getFirstChild().getNodeValue());
}
counter++;
}
另一种方法是尝试使密钥唯一,如果密钥是唯一的,则您不必附加旧值:
for (int j = 0; j < nodeList.getLength(); j++) {
if (nodeList.getLength() <= 1) {
mapUniques.put(xPathExpr, nodeList.item(j).getFirstChild().getNodeValue());
} else {
String nodePath = "";
Node n = nodeList.item(j);
while(n != null) {
int nodenum = 0;
Node sib = n.getPreviousSibling();
while(sib != null) {
nodenum++;
sib = sib.getPreviousSibling();
}
nodePath = n.getNodeName()+"["+nodenum+"]/"+nodePath;
n = n.getParentNode();
}
multiOccurance.put(nodePath,
nodeList.item(j).getFirstChild().getNodeValue());
}
counter++;
}