Map <string,string>中缺少XML数据

时间:2015-08-23 20:54:14

标签: java xml xpath

我已经使用XPath解析了一个具有重复数据的特定XML文件。我这样做的方法是扫描整个文档并获取每个值元素的xpath,然后我使用Javax Xpath库从该值元素的xpath中检索数据。

这是我的代码:

./node_modules/.bin/sl-run

两张地图的输出都缺少重复的数据。它只是抓取与唯一xpath相关的数据。 mapUniques用于获取唯一xpath的值,但multiOccurance映射用于获取具有多个值的xpath的数据。我必须以某种方式编辑具有多个出现的X路径,例如/ PayrollFormInfo / FormInfo [1] / ExportDataVersion / 100和/ PayrollFormInfo / FormInfo [2] / ExportDataVersion / 100,注意它是如何相同的X路径但是表示出现了数据。

我该怎么做?

XML文档:

public void test() throws Exception {

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = null;
    File f = new File("xmlDir/INWKS941_AllSchedB_IFSP.xml");

    builder = factory.newDocumentBuilder();
    String xml = FileUtils.readFileToString(f);
    Document xmlDocument = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    XPath xPath = XPathFactory.newInstance().newXPath();

    InputStream stream = new ByteArrayInputStream(xml.getBytes());
    List<String> xpaths = getXPaths(stream);

    Map<String, String> mapUniques = new LinkedHashMap<String, String>();
    Map<String, String> multiOccurance = new LinkedHashMap<String, String>();

    Integer count = 1;
    for(int i=0;i<xpaths.size();i++) {
        String xPathExpr = xpaths.get(i);
        count++;

        NodeList nodeList = (NodeList) xPath.compile(xPathExpr).evaluate(xmlDocument, XPathConstants.NODESET);

        Integer counter = 1;
        for(int j=0;j<nodeList.getLength();j++) {

            if(nodeList.getLength() <= 1) {
                mapUniques.put(xPathExpr, nodeList.item(j).getFirstChild().getNodeValue());
            }

            else if(nodeList.getLength() > 1) {
                multiOccurance.put(xPathExpr, nodeList.item(j).getFirstChild().getNodeValue());
            }

            counter++;
        }
    }

    logger.info("Here: "+Arrays.toString(multiOccurance.entrySet().toArray()));


    Iterator it = multiOccurance.entrySet().iterator();
    while(it.hasNext()) {
        Entry pair = (Entry) it.next();
        logger.info(pair.getKey()+"="+pair.getValue());
        it.remove();
    }

    Iterator iter = mapUniques.entrySet().iterator();
    while(iter.hasNext()) {
        Entry pair = (Entry) iter.next();
        logger.info(pair.getKey()+"="+pair.getValue());
        iter.remove();
    }
}

修改

这是我为值节点获取XPath的代码:

<PayrollFormInfo xmlns="http://www.irs.com/PayrollFormInfo/2004">
    <FormInfo>
        <ExportDataVersion>100</ExportDataVersion>
        <QBVersion>IFSP</QBVersion>
        <FormSetID>FSDYF</FormSetID>
        <FormID>INWKS941</FormID>
        <FormDesc>Quarterly Federal Tax Return</FormDesc>
        <FormFilingPeriod>Quarterly</FormFilingPeriod>
        <TotalsBreakoutBy>Daily</TotalsBreakoutBy>
        <BeginDate>01/01/2015</BeginDate>
        <EndDate>03/31/2015</EndDate>
        <EFYes>true</EFYes>
        <IncludeInstructionSheet>false</IncludeInstructionSheet>
    </FormInfo>
    <CompanyInfo>
        <CompanyName>PatsFedTaxSemiWeekly20144QRT_CT</CompanyName>
        <LegalName>P #a(9t)-'&amp;sFedTaxSemiWeekly2014_4QRTCT</LegalName>
        <EmployerID>507754170</EmployerID>
        <IntuitPayrollServiceID>1862684</IntuitPayrollServiceID>
        <LegalAddressLine1>Def-ault/ .String</LegalAddressLine1>
        <LegalCity>St. Louis</LegalCity>
        <LegalState>CB</LegalState>
        <LegalZip>06300-0987</LegalZip>
        <MailingAddressLine1>Default String Addr</MailingAddressLine1>
        <MailingCity>Greenwich</MailingCity>
        <MailingState>CT</MailingState>
        <MailingZip>06830</MailingZip>
        <NumberOfEmployees>2</NumberOfEmployees>
        <FWTDepositFrequency>SEMI_WEEKLY</FWTDepositFrequency>
        <ExemptFlag>false</ExemptFlag>
        <EmployerType>Contributing</EmployerType>
        <FUTAExemptFlag>false</FUTAExemptFlag>
        <FederalTotals>
            <PeriodBeginDate>01/01/2015</PeriodBeginDate>
            <PeriodEndDate>01/01/2015</PeriodEndDate>
            <GrossPay>0.00</GrossPay>
            <NetPay>0.00</NetPay>
            <SocialSecurityWages>15384.62</SocialSecurityWages>
            <SocialSecurityLiability>-10.10</SocialSecurityLiability>
            <MedicareWages>15384.62</MedicareWages>
            <MedicareLiability>-1.01</MedicareLiability>
            <FederalWages>13076.92</FederalWages>
            <FederalWithholding>-101.00</FederalWithholding>
            <TaxableFUTAWages>0.00</TaxableFUTAWages>
            <TotalFUTAWages>0.00</TotalFUTAWages>
            <FUTALiability>0.00</FUTALiability>
            <EIC>0.00</EIC>
            <SocialSecurityTips>1.01</SocialSecurityTips>
        </FederalTotals>
        <FederalTotals>
            <PeriodBeginDate>01/02/2015</PeriodBeginDate>
            <PeriodEndDate>01/02/2015</PeriodEndDate>
            <GrossPay>0.00</GrossPay>
            <NetPay>0.00</NetPay>
            <SocialSecurityWages>15384.62</SocialSecurityWages>
            <SocialSecurityLiability>-10.20</SocialSecurityLiability>
            <MedicareWages>15384.62</MedicareWages>
            <MedicareLiability>-1.02</MedicareLiability>
            <FederalWages>13076.92</FederalWages>
            <FederalWithholding>-102.00</FederalWithholding>
            <TaxableFUTAWages>0.00</TaxableFUTAWages>
            <TotalFUTAWages>0.00</TotalFUTAWages>
            <FUTALiability>0.00</FUTALiability>
            <EIC>0.00</EIC>
            <SocialSecurityTips>1.02</SocialSecurityTips>
        </FederalTotals>
        <TaxItemTotals>
            <PeriodBeginDate>01/01/2015</PeriodBeginDate>
            <PeriodEndDate>03/31/2015</PeriodEndDate>
            <TaxTableID>200</TaxTableID>
            <TotalWagesAndTips>0.00</TotalWagesAndTips>
            <TaxableWagesAndTips>164615.44</TaxableWagesAndTips>
            <TaxAmount>0.00</TaxAmount>
        </TaxItemTotals>
        <PaymentInfo>
            <PeriodBeginDate>01/14/2015</PeriodBeginDate>
            <PeriodEndDate>01/16/2015</PeriodEndDate>
            <TaxTableID>62</TaxTableID>
            <Amount>953.86</Amount>
            <PaymentDate>04/03/2015</PaymentDate>
        </PaymentInfo>
        <EmpsWorked>
            <TaxTableID>63</TaxTableID>
            <TestDate>3/12/2015</TestDate>
            <NumEmps>2</NumEmps>
        </EmpsWorked>
        <EmpSummary>
            <EmployeeID>4514230</EmployeeID>
            <FirstName>JillCT</FirstName>
            <MiddleInitial>S</MiddleInitial>
            <LastName>Taylor</LastName>
            <SSN>150-86-6794</SSN>
            <IsStatutory>false</IsStatutory>
            <HasPensionPlan>false</HasPensionPlan>
            <EmployeeType>REGULAR</EmployeeType>
        </EmpSummary>
        <EmpSummary>
            <EmployeeID>4514229</EmployeeID>
            <FirstName>JoseCT</FirstName>
            <MiddleInitial>S</MiddleInitial>
            <LastName>George</LastName>
            <SSN>792-93-6215</SSN>
            <IsStatutory>false</IsStatutory>
            <HasPensionPlan>false</HasPensionPlan>
            <EmployeeType>REGULAR</EmployeeType>
        </EmpSummary>
        <FederalReturnType>941</FederalReturnType>
    </CompanyInfo>
    <EmployeeInfo>
        <EmployeeType>REGULAR</EmployeeType>
        <EmployeeID>4514230</EmployeeID>
        <FirstName>JillCT</FirstName>
        <MiddleInitial>S</MiddleInitial>
        <LastName>Taylor</LastName>
        <SSN>150-86-6794</SSN>
        <HireDate>10/09/2009</HireDate>
        <IsStatutory>false</IsStatutory>
        <HasPensionPlan>false</HasPensionPlan>
        <NumberOfExemptions>0</NumberOfExemptions>
        <StateLived>CT</StateLived>
        <StateWorked>CT</StateWorked>
        <Gender>F</Gender>
        <AddressLine1>123 default St.</AddressLine1>
        <City>Greenwich</City>
        <State>CT</State>
        <ZipCode>06830</ZipCode>
        <FederalTotals>
            <PeriodBeginDate>01/01/2015</PeriodBeginDate>
            <PeriodEndDate>01/31/2015</PeriodEndDate>
            <GrossPay>0.00</GrossPay>
            <NetPay>0.00</NetPay>
            <SocialSecurityWages>49269.21</SocialSecurityWages>
            <SocialSecurityLiability>-3054.69</SocialSecurityLiability>
            <MedicareWages>50769.24</MedicareWages>
            <MedicareLiability>-736.16</MedicareLiability>
            <FederalWages>46153.84</FederalWages>
            <FederalWithholding>-14242.77</FederalWithholding>
            <TaxableFUTAWages>0.00</TaxableFUTAWages>
            <TotalFUTAWages>0.00</TotalFUTAWages>
            <FUTALiability>0.00</FUTALiability>
            <EIC>0.00</EIC>
            <SocialSecurityTips>0.00</SocialSecurityTips>
        </FederalTotals>
        <FederalTotals>
            <PeriodBeginDate>02/01/2015</PeriodBeginDate>
            <PeriodEndDate>02/28/2015</PeriodEndDate>
            <GrossPay>0.00</GrossPay>
            <NetPay>0.00</NetPay>
            <SocialSecurityWages>15384.62</SocialSecurityWages>
            <SocialSecurityLiability>-953.85</SocialSecurityLiability>
            <MedicareWages>35384.62</MedicareWages>
            <MedicareLiability>-513.08</MedicareLiability>
            <FederalWages>33076.92</FederalWages>
            <FederalWithholding>-10637.87</FederalWithholding>
            <TaxableFUTAWages>0.00</TaxableFUTAWages>
            <TotalFUTAWages>0.00</TotalFUTAWages>
            <FUTALiability>0.00</FUTALiability>
            <EIC>0.00</EIC>
            <SocialSecurityTips>0.00</SocialSecurityTips>
        </FederalTotals>
        <FederalTotals>
            <PeriodBeginDate>03/01/2015</PeriodBeginDate>
            <PeriodEndDate>03/31/2015</PeriodEndDate>
            <GrossPay>0.00</GrossPay>
            <NetPay>0.00</NetPay>
            <SocialSecurityWages>46153.86</SocialSecurityWages>
            <SocialSecurityLiability>-2861.54</SocialSecurityLiability>
            <MedicareWages>196153.86</MedicareWages>
            <MedicareLiability>-3654.22</MedicareLiability>
            <FederalWages>189230.76</FederalWages>
            <FederalWithholding>-68440.64</FederalWithholding>
            <TaxableFUTAWages>0.00</TaxableFUTAWages>
            <TotalFUTAWages>0.00</TotalFUTAWages>
            <FUTALiability>0.00</FUTALiability>
            <EIC>0.00</EIC>
            <SocialSecurityTips>0.00</SocialSecurityTips>
        </FederalTotals>
        <WorkedInfo>
            <TaxTableID>64</TaxTableID>
            <TestDate>01/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
        <WorkedInfo>
            <TaxTableID>62</TaxTableID>
            <TestDate>02/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
        <WorkedInfo>
            <TaxTableID>61</TaxTableID>
            <TestDate>01/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
        <WorkedInfo>
            <TaxTableID>61</TaxTableID>
            <TestDate>02/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
        <WorkedInfo>
            <TaxTableID>1</TaxTableID>
            <TestDate>02/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
        <WorkedInfo>
            <TaxTableID>1</TaxTableID>
            <TestDate>03/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
        <WorkedInfo>
            <TaxTableID>62</TaxTableID>
            <TestDate>03/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
        <WorkedInfo>
            <TaxTableID>63</TaxTableID>
            <TestDate>02/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
        <WorkedInfo>
            <TaxTableID>61</TaxTableID>
            <TestDate>03/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
        <WorkedInfo>
            <TaxTableID>64</TaxTableID>
            <TestDate>02/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
        <WorkedInfo>
            <TaxTableID>63</TaxTableID>
            <TestDate>03/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
        <WorkedInfo>
            <TaxTableID>62</TaxTableID>
            <TestDate>01/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
        <WorkedInfo>
            <TaxTableID>64</TaxTableID>
            <TestDate>03/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
        <WorkedInfo>
            <TaxTableID>63</TaxTableID>
            <TestDate>01/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
        <WorkedInfo>
            <TaxTableID>1</TaxTableID>
            <TestDate>01/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
    </EmployeeInfo>
    <EmployeeInfo>
        <EmployeeType>REGULAR</EmployeeType>
        <EmployeeID>4514229</EmployeeID>
        <FirstName>JoseCT</FirstName>
        <MiddleInitial>S</MiddleInitial>
        <LastName>George</LastName>
        <SSN>792-93-6215</SSN>
        <HireDate>10/09/2009</HireDate>
        <IsStatutory>false</IsStatutory>
        <HasPensionPlan>false</HasPensionPlan>
        <NumberOfExemptions>0</NumberOfExemptions>
        <StateLived>CT</StateLived>
        <StateWorked>CT</StateWorked>
        <Gender>M</Gender>
        <AddressLine1>123 default St.</AddressLine1>
        <City>Greenwich</City>
        <State>CT</State>
        <ZipCode>06830</ZipCode>
        <FederalTotals>
            <PeriodBeginDate>02/01/2015</PeriodBeginDate>
            <PeriodEndDate>02/28/2015</PeriodEndDate>
            <GrossPay>0.00</GrossPay>
            <NetPay>0.00</NetPay>
            <SocialSecurityWages>15384.62</SocialSecurityWages>
            <SocialSecurityLiability>-953.85</SocialSecurityLiability>
            <MedicareWages>35384.62</MedicareWages>
            <MedicareLiability>-513.08</MedicareLiability>
            <FederalWages>33076.92</FederalWages>
            <FederalWithholding>-10637.87</FederalWithholding>
            <TaxableFUTAWages>0.00</TaxableFUTAWages>
            <TotalFUTAWages>0.00</TotalFUTAWages>
            <FUTALiability>0.00</FUTALiability>
            <EIC>0.00</EIC>
            <SocialSecurityTips>0.00</SocialSecurityTips>
        </FederalTotals>
        <FederalTotals>
            <PeriodBeginDate>01/01/2015</PeriodBeginDate>
            <PeriodEndDate>01/31/2015</PeriodEndDate>
            <GrossPay>0.00</GrossPay>
            <NetPay>0.00</NetPay>
            <SocialSecurityWages>49269.21</SocialSecurityWages>
            <SocialSecurityLiability>-3054.69</SocialSecurityLiability>
            <MedicareWages>50769.24</MedicareWages>
            <MedicareLiability>-736.16</MedicareLiability>
            <FederalWages>46153.84</FederalWages>
            <FederalWithholding>-14242.77</FederalWithholding>
            <TaxableFUTAWages>0.00</TaxableFUTAWages>
            <TotalFUTAWages>0.00</TotalFUTAWages>
            <FUTALiability>0.00</FUTALiability>
            <EIC>0.00</EIC>
            <SocialSecurityTips>0.00</SocialSecurityTips>
        </FederalTotals>
        <FederalTotals>
            <PeriodBeginDate>03/01/2015</PeriodBeginDate>
            <PeriodEndDate>03/31/2015</PeriodEndDate>
            <GrossPay>0.00</GrossPay>
            <NetPay>0.00</NetPay>
            <SocialSecurityWages>46153.86</SocialSecurityWages>
            <SocialSecurityLiability>-2861.54</SocialSecurityLiability>
            <MedicareWages>196153.86</MedicareWages>
            <MedicareLiability>-3654.22</MedicareLiability>
            <FederalWages>189230.76</FederalWages>
            <FederalWithholding>-68440.64</FederalWithholding>
            <TaxableFUTAWages>0.00</TaxableFUTAWages>
            <TotalFUTAWages>0.00</TotalFUTAWages>
            <FUTALiability>0.00</FUTALiability>
            <EIC>0.00</EIC>
            <SocialSecurityTips>0.00</SocialSecurityTips>
        </FederalTotals>
        <WorkedInfo>
            <TaxTableID>63</TaxTableID>
            <TestDate>03/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
        <WorkedInfo>
            <TaxTableID>61</TaxTableID>
            <TestDate>03/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
        <WorkedInfo>
            <TaxTableID>61</TaxTableID>
            <TestDate>01/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
        <WorkedInfo>
            <TaxTableID>1</TaxTableID>
            <TestDate>01/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
        <WorkedInfo>
            <TaxTableID>62</TaxTableID>
            <TestDate>02/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
        <WorkedInfo>
            <TaxTableID>63</TaxTableID>
            <TestDate>02/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
        <WorkedInfo>
            <TaxTableID>1</TaxTableID>
            <TestDate>03/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
        <WorkedInfo>
            <TaxTableID>64</TaxTableID>
            <TestDate>02/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
        <WorkedInfo>
            <TaxTableID>64</TaxTableID>
            <TestDate>03/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
        <WorkedInfo>
            <TaxTableID>1</TaxTableID>
            <TestDate>02/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
        <WorkedInfo>
            <TaxTableID>63</TaxTableID>
            <TestDate>01/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
        <WorkedInfo>
            <TaxTableID>62</TaxTableID>
            <TestDate>01/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
        <WorkedInfo>
            <TaxTableID>64</TaxTableID>
            <TestDate>01/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
        <WorkedInfo>
            <TaxTableID>62</TaxTableID>
            <TestDate>03/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
        <WorkedInfo>
            <TaxTableID>61</TaxTableID>
            <TestDate>02/12/2015</TestDate>
            <Worked>true</Worked>
        </WorkedInfo>
    </EmployeeInfo>
    <PaidPreparerInfo>
        <Signature>data:image</Signature>
        <Title>Agent in Fact</Title>
        <Date>03/15/2015</Date>
        <PrintedName>Coreen Solano</PrintedName>
        <PhoneNumber>888-927-7478</PhoneNumber>
        <FaxNumber>775-562-2657</FaxNumber>
        <FEIN>88-0146711</FEIN>
        <Address1>6884 Sierra Center Pkwy</Address1>
        <City>Reno</City>
        <State>NV</State>
        <Zip>89511</Zip>
        <FirmName>Computing Resources Inc</FirmName>
        <EmailAddress>tax_eservice@irs.com</EmailAddress>
    </PaidPreparerInfo>
</PayrollFormInfo>

这是我的自定义XMLUtils类:

public List<String> getXPaths ( InputStream stream ) throws ParserException {

    Document document = XMLUtils.getDocument(  stream );

    return getXPaths( document.getDocumentElement() );
}



public List<String> getXPaths ( Node node ) {

    List<String> xpaths = iterate( node, "");

    return xpaths;
}


public List<String> iterate ( Node node, String parentPath )  {

    List<String> xpaths = new ArrayList<String>();

    if ( node.getNodeType() == Node.ELEMENT_NODE ) {

        Element element = ( Element ) node;
        parentPath = parentPath + "/" +  element.getTagName();

        for ( int nIndex = 0; nIndex<node.getChildNodes().getLength(); nIndex++ ) {
            xpaths.addAll( iterate(node.getChildNodes().item(nIndex) , parentPath ) ) ;
        }
    }
    else if ( node.getNodeType() == Node.TEXT_NODE  ) {
        if (  node.getTextContent().trim().length() !=0 ) {
            logger.debug("XPath found : " + parentPath );
            xpaths.add( parentPath );
        }
    }
    else {
        logger.debug("Unknown node type for : " + node.getNodeName());
    }
    return xpaths;
}

1 个答案:

答案 0 :(得分:2)

每次通过内部循环都会覆盖multiOccurrance映射中的值。我不确定你想如何收集旧值和新值,但这里有一个选项:

for (int j = 0; j < nodeList.getLength(); j++) {
    if (nodeList.getLength() <= 1) {
        mapUniques.put(xPathExpr, nodeList.item(j).getFirstChild().getNodeValue());
    } else {
        String old = multiOccurance.get(xPathExpr);
        if (old == null) {
            old = "";
        }
        multiOccurance.put(xPathExpr,
                old + " "
                + nodeList.item(j).getFirstChild().getNodeValue());
    }
    counter++;
}

另一种方法是尝试使密钥唯一,如果密钥是唯一的,则您不必附加旧值:

for (int j = 0; j < nodeList.getLength(); j++) {
    if (nodeList.getLength() <= 1) {
        mapUniques.put(xPathExpr, nodeList.item(j).getFirstChild().getNodeValue());
    } else {
        String nodePath = "";
        Node n = nodeList.item(j);
        while(n  != null) {
            int nodenum = 0;
            Node sib = n.getPreviousSibling();
            while(sib != null) {
                nodenum++;
                sib = sib.getPreviousSibling();
            }
            nodePath = n.getNodeName()+"["+nodenum+"]/"+nodePath;
            n = n.getParentNode();
        }
        multiOccurance.put(nodePath,
                 nodeList.item(j).getFirstChild().getNodeValue());
    }
    counter++;
}