How to get text from XML while ignoring the text of certain child elements

Time: 2015-10-21 17:08:59

Tags: java xml xpath

I have XML like the following:

Example 1:

<explanation>
<NodeExplanations>
    <IDAfterSkipProcessing>/Return/ReturnData/PPStudentLoanInterestWks/StudentLoanInterestDeductionAmtPP</IDAfterSkipProcessing>
    <NodeExplanation>
        <ID>/Return/ReturnData/PPStudentLoanInterestWks/StudentLoanInterestDeductionAmtPP</ID>
        <SkippedToIDForExplanationData>/Return/ReturnData/PPStudentLoanInterestWks/StudentLoanInterestDeductionAmtPP</SkippedToIDForExplanationData>
        <Value>2000</Value>
        <Gist>Difference</Gist>
        <Scenario>DIFFERENCE</Scenario>
        <Title>StudentLoanInterestDeductionAmtPP</Title>
        <Phrase>
            <Text>StudentLoanInterestDeductionAmtPP</Text>
        </Phrase>
        <Question>
            <Text>StudentLoanInterestDeductionAmtPP</Text>
        </Question>
        <ExplanationText>
            <NodeName>
                <Text>StudentLoanInterestDeductionAmtPP</Text>
            </NodeName>
            <Text> comes from subtracting </Text>
            <InputName>
                <Text>MultiplyLine2byLine6AmtPP</Text>
            </InputName>
            <Text> from </Text>
            <InputName>
                <Text>SmallerOfLine1OrLimitAmtPP</Text>
            </InputName>
            <Text>.</Text>
            <BulletedList>
                <ListEntry>
                    <InputLink>
                        <Ref>/Return/ReturnData/PPStudentLoanInterestWks/SmallerOfLine1OrLimitAmtPP</Ref>
                        <LinkText>
                            <Text>SmallerOfLine1OrLimitAmtPP</Text>
                        </LinkText>
                    </InputLink>
                </ListEntry>
                <ListEntry>
                    <InputLink>
                        <Ref>/Return/ReturnData/PPStudentLoanInterestWks/MultiplyLine2byLine6AmtPP</Ref>
                        <LinkText>
                            <Text>MultiplyLine2byLine6AmtPP</Text>
                        </LinkText>
                    </InputLink>
                </ListEntry>
            </BulletedList>
        </ExplanationText>
        <InputNodes>
            <InputNodeEntry>
                <ID>/Return/ReturnData/PPStudentLoanInterestWks/QualifiedStudentLoanPP</ID>
                <Role>BlankIfFalse</Role>
                <Value>true</Value>
                <Type>CALCULATED_NODE</Type>
                <HasSubExplanations>false</HasSubExplanations>
            </InputNodeEntry>
            <InputNodeEntry>
                <ID>/Return/ReturnData/PPStudentLoanInterestWks/MultiplyLine2byLine6AmtPP</ID>
                <Role>Right</Role>
                <Value>0</Value>
                <Type>CALCULATED_NODE</Type>
                <HasSubExplanations>false</HasSubExplanations>
            </InputNodeEntry>
            <InputNodeEntry>
                <ID>/Return/ReturnData/PPStudentLoanInterestWks/SmallerOfLine1OrLimitAmtPP</ID>
                <Role>Left</Role>
                <Value>2000</Value>
                <Type>CALCULATED_NODE</Type>
                <HasSubExplanations>false</HasSubExplanations>
            </InputNodeEntry>
        </InputNodes>
        <Children>
            <ID>/Return/ReturnData/PPStudentLoanInterestWks/QualifiedStudentLoanPP</ID>
            <ID>/Return/ReturnData/PPStudentLoanInterestWks/MultiplyLine2byLine6AmtPP</ID>
            <ID>/Return/ReturnData/PPStudentLoanInterestWks/SmallerOfLine1OrLimitAmtPP</ID>
        </Children>
    </NodeExplanation>
</NodeExplanations>
</explanation>

Example 2:

<ExplanationText>
    <Text>We can't get any more details on </Text>
    <NodeName>
        <Text>QualifiedStudentLoansInterestAmtPP</Text>
    </NodeName>
    <Text> right now.</Text>
</ExplanationText>

Example 3:

<ExplanationText>
    <Text>We can't get any more details on </Text>
    <NodeValue>
        <Value>123</Value>
    </NodeValue>
    <Text> right now.</Text>
</ExplanationText>

Example 4:

<ExplanationText>
    <NodeName>
        <Text>Your Earned Income Credit of </Text>
        <NodeValue>
            <Currency>156</Currency>
        </NodeValue>
    </NodeName>
    <Text> comes from </Text>
    <InputValue>
        <Currency>156</Currency>
    </InputValue>
    <Text>.</Text>
</ExplanationText>

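I want to get all the text from every Text and Value tag in these XML documents, but ignore everything under the BulletedList tag. I want everything else. How can I do this in Java?

Here is my current implementation: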

public static String getExplanationTextFromResponse(String pathToExpFile, String nodeID) {

    File f = new File(pathToExpFile);

    String text = new String();
    Map<String,String> listOfExpText = getExplanationText(f);

    text = listOfExpText.get(nodeID);
    return text;
}


    public static Map<String,String> getExplanationText(File pathToExpFile) {

        Map<String,String> map = new LinkedHashMap<String, String>();

        String nodeExplanationXPATH = "/explanation/NodeExplanations/NodeExplanation";
        String explanationTextXPATH = "ExplanationText//Text/text()";

        String id = new String();
        String text = new String();

        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            String xml = FileUtils.readFileToString(pathToExpFile);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));


            NodeList nlNodeExplanationList = doc.getElementsByTagName("NodeExplanation"); 
            for(int i=0;i<nlNodeExplanationList.getLength();i++) {
                Node explanationNode = nlNodeExplanationList.item(i);
                List<String> idList = getTextValuesByTagName((Element)explanationNode, "ID");
                id = idList.get(0);
            }

            XPath xpath = XPathFactory.newInstance().newXPath();
            Object object = new Object();
            object = xpath.evaluate(nodeExplanationXPATH, doc, XPathConstants.NODESET);

            NodeList explanations = (NodeList) object;

            int count = explanations.getLength();
            for(int i=0;i<count;i++) {
                Object obj = new Object();
                Node explanation = explanations.item(i);
                obj = xpath.evaluate(explanationTextXPATH, explanation, XPathConstants.NODESET);
                NodeList explanationText = (NodeList) obj;
                text = (joinNodeSetText(explanationText));
                logger.debug(joinNodeSetText(explanationText));
            }

            map.put(id, text);

            return map;
        }
        catch (IOException e) {
            e.printStackTrace();
        } catch (ParserConfigurationException e) {
            e.printStackTrace();
        } catch (XPathExpressionException e) {
            e.printStackTrace();
        } catch (SAXException e) {
            e.printStackTrace();
        }
        return null;
    }


    /**
     * Joins the .getTextContent() of each node in the nodeSet concatenated into one string.
     * 
     * @param nodeSet
     * @return 
     */
    static String joinNodeSetText(NodeList nodeSet) {
        StringBuilder builder = new StringBuilder();
        logger.debug("Combining Text nodes...");
        for(int i=0;i<nodeSet.getLength();i++) {
            builder.append(nodeSet.item(i).getTextContent());
        }
        logger.debug("Combination complete!");
        return builder.toString();
    }

Edit

When I use '//Text[not(ancestor::BulletedList)] | //Value[not(ancestor::BulletedList)]', I get the following:

javax.xml.transform.TransformerException: Expected ], but found: 
    at org.apache.xpath.compiler.XPathParser.error(XPathParser.java:610)
    at org.apache.xpath.compiler.XPathParser.consumeExpected(XPathParser.java:528)
    at org.apache.xpath.compiler.XPathParser.Predicate(XPathParser.java:1937)
    at org.apache.xpath.compiler.XPathParser.Step(XPathParser.java:1726)
    at org.apache.xpath.compiler.XPathParser.RelativeLocationPath(XPathParser.java:1626)
    at org.apache.xpath.compiler.XPathParser.LocationPath(XPathParser.java:1597)
    at org.apache.xpath.compiler.XPathParser.PathExpr(XPathParser.java:1317)
    at org.apache.xpath.compiler.XPathParser.UnionExpr(XPathParser.java:1236)
    at org.apache.xpath.compiler.XPathParser.UnaryExpr(XPathParser.java:1142)
    at org.apache.xpath.compiler.XPathParser.MultiplicativeExpr(XPathParser.java:1063)
    at org.apache.xpath.compiler.XPathParser.AdditiveExpr(XPathParser.java:1005)
    at org.apache.xpath.compiler.XPathParser.RelationalExpr(XPathParser.java:930)
    at org.apache.xpath.compiler.XPathParser.EqualityExpr(XPathParser.java:870)
    at org.apache.xpath.compiler.XPathParser.AndExpr(XPathParser.java:834)
    at org.apache.xpath.compiler.XPathParser.OrExpr(XPathParser.java:807)
    at org.apache.xpath.compiler.XPathParser.Expr(XPathParser.java:790)
    at org.apache.xpath.compiler.XPathParser.initXPath(XPathParser.java:129)
    at org.apache.xpath.XPath.<init>(XPath.java:178)
    at org.apache.xpath.XPath.<init>(XPath.java:266)
    at org.apache.xpath.jaxp.XPathImpl.eval(XPathImpl.java:195)
    at org.apache.xpath.jaxp.XPathImpl.evaluate(XPathImpl.java:281)
    at com.generalatomics.ctg.engine.automation.tools.contentutils.calc.ExplanationResponseReader.getExplanationText(ExplanationResponseReader.java:398)
    at com.generalatomics.ctg.engine.automation.tools.contentutils.calc.ExplanationResponseReader.getExplanationTextFromResponse(ExplanationResponseReader.java:349)
    at com.generalatomics.ctg.engine.automation.tools.contentutils.calc.ExplanationResponseReader.t(ExplanationResponseReader.java:338)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:84)
    at org.testng.internal.Invoker.invokeMethod(Invoker.java:714)
    at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:901)
    at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1231)
    at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127)
    at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111)
    at org.testng.TestRunner.privateRun(TestRunner.java:767)
    at org.testng.TestRunner.run(TestRunner.java:617)
    at org.testng.SuiteRunner.runTest(SuiteRunner.java:334)
    at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:329)
    at org.testng.SuiteRunner.privateRun(SuiteRunner.java:291)
    at org.testng.SuiteRunner.run(SuiteRunner.java:240)
    at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
    at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86)
    at org.testng.TestNG.runSuitesSequentially(TestNG.java:1224)
    at org.testng.TestNG.runSuitesLocally(TestNG.java:1149)
    at org.testng.TestNG.run(TestNG.java:1057)
    at org.testng.remote.RemoteTestNG.run(RemoteTestNG.java:111)
    at org.testng.remote.RemoteTestNG.initAndRun(RemoteTestNG.java:204)
    at org.testng.remote.RemoteTestNG.main(RemoteTestNG.java:175)

1 Answer:

Answer 0 (score: 3)

If you use the XPath expression //Text[not(ancestor::BulletedList)] | //Value[not(ancestor::BulletedList)], you select all Text and Value elements that are not inside a BulletedList element.

If you want to search only inside ExplanationText elements, use //ExplanationText//Text[not(ancestor::BulletedList)] | //ExplanationText//Value[not(ancestor::BulletedList)]
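As a rough illustration, the second expression could be evaluated with the standard JAXP XPath API along these lines. This is only a sketch: the class name ExplanationTextDemo, the method name extract, and the sample XML string are made up for the example and are not part of the original code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; names here are assumptions for illustration only.
public class ExplanationTextDemo {

    // Text and Value elements under ExplanationText, skipping anything inside BulletedList.
    static final String EXPR =
        "//ExplanationText//Text[not(ancestor::BulletedList)]"
        + " | //ExplanationText//Value[not(ancestor::BulletedList)]";

    // Parses the XML string and concatenates the text of every selected element.
    static String extract(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(EXPR, doc, XPathConstants.NODESET);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < nodes.getLength(); i++) {
            sb.append(nodes.item(i).getTextContent());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample in the shape of Example 3, plus a BulletedList to be ignored.
        String xml =
            "<ExplanationText>"
            + "<Text>Total is </Text>"
            + "<NodeValue><Value>123</Value></NodeValue>"
            + "<Text>.</Text>"
            + "<BulletedList><ListEntry><Text>ignored</Text></ListEntry></BulletedList>"
            + "</ExplanationText>";
        System.out.println(extract(xml)); // prints "Total is 123."
    }
}
```

Note that the expression only matches Text and Value elements. Example 4 in the question uses Currency elements, so those would need a further union branch such as //ExplanationText//Currency[not(ancestor::BulletedList)] if their text is wanted as well.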