Parsing HTML content from an XML file

Time: 2016-07-18 11:31:02

Tags: java xml

    <xbrli:xbrl xmlns:aoi="http://www.aointl.com/20160331" xmlns:country="http://xbrl.sec.gov/country/2016-01-31" xmlns:currency="http://xbrl.sec.gov/currency/2016-01-31" xmlns:dei="http://xbrl.sec.gov/dei/2014-01-31" xmlns:exch="http://xbrl.sec.gov/exch/2016-01-31" xmlns:invest="http://xbrl.sec.gov/invest/2013-01-31" xmlns:iso4217="http://www.xbrl.org/2003/iso4217" xmlns:link="http://www.xbrl.org/2003/linkbase" xmlns:naics="http://xbrl.sec.gov/naics/2011-01-31" xmlns:nonnum="http://www.xbrl.org/dtr/type/non-numeric" xmlns:num="http://www.xbrl.org/dtr/type/numeric" xmlns:ref="http://www.xbrl.org/2006/ref" xmlns:sic="http://xbrl.sec.gov/sic/2011-01-31" xmlns:stpr="http://xbrl.sec.gov/stpr/2011-01-31" xmlns:us-gaap="http://fasb.org/us-gaap/2016-01-31" xmlns:us-roles="http://fasb.org/us-roles/2016-01-31" xmlns:us-types="http://fasb.org/us-types/2016-01-31" xmlns:utreg="http://www.xbrl.org/2009/utr" xmlns:xbrldi="http://xbrl.org/2006/xbrldi" xmlns:xbrldt="http://xbrl.org/2005/xbrldt" xmlns:xbrli="http://www.xbrl.org/2003/instance" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <link:schemaRef xlink:href="aoi-20160331.xsd" xlink:type="simple"/>
    <xbrli:context id="FD2016Q4YTD">
    <xbrli:entity>
    <xbrli:identifier scheme="http://www.sec.gov/CIK">0000939930</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period>
    <xbrli:startDate>2015-04-01</xbrli:startDate>
    <xbrli:endDate>2016-03-31</xbrli:endDate>
    </xbrli:period>
    </xbrli:context>

    <aoi:OtherIncomeAndExpensePolicyTextBlock contextRef="FD2016Q4YTD" id="Fact-F51C7616E17E5B8B0B770D410BBF5A3E">
    <div style="font-family:Times New Roman;font-size:10pt;"><div style="line-height:120%;text-align:justify;font-size:10pt;"><font style="font-family:inherit;font-size:10pt;font-weight:bold;">Other Income (Expense)</font></div><div style="line-height:120%;text-align:justify;font-size:10pt;"><font style="font-family:inherit;font-size:10pt;"></font></div></div>
    </aoi:OtherIncomeAndExpensePolicyTextBlock>
    </xbrli:xbrl>

This is my XML (XBRL), which I need to parse. This XML is my input and I don't know whether it is valid or not, but I need output like this:

    <div style="font-family:Times New Roman;font-size:10pt;"><div style="line-height:120%;text-align:justify;font-size:10pt;"><font style="font-family:inherit;font-size:10pt;font-weight:bold;">Other Income (Expense)</font></div><div style="line-height:120%;text-align:justify;font-size:10pt;"><font style="font-family:inherit;font-size:10pt;"></font></div></div>

Could someone please share how to solve this? I have been stuck on this problem for the last two weeks.

This is the code I am using:

    File fXmlFile = new File("/home/devteam-user1/Desktop/ky/UnitTesting.xml");
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(fXmlFile);

    XPath xPath = XPathFactory.newInstance().newXPath();
    final String DIV_UNDER_ROOT = "/*/aoi";
    NodeList divList = (NodeList) xPath.compile(DIV_UNDER_ROOT)
            .evaluate(doc, XPathConstants.NODESET);
    System.out.println(divList.getLength());
    for (int i = 0; i < divList.getLength(); i++) {  // just in case there is more than one
        Node divNode = divList.item(i);
        System.out.println(nodeToString(divNode));
    }

The nodeToString method is below:

    private static String nodeToString(Node node) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StreamResult result = new StreamResult(new StringWriter());
        transformer.transform(new DOMSource(node), result);
        return result.getWriter().toString();
    }

2 answers:

Answer 0 (score: 0)

This works well for me

M

Answer 1 (score: 0)

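Your main problem is in

    final String DIV_UNDER_ROOT = "/*/aoi";

which is an XPath expression that matches "any node two levels below the document root whose local name is aoi and which has no namespace". That is not what you want.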

You want to match anything under a node two levels deep whose namespace is the one aliased by "aoi" (meaning it belongs to the "http://www.aointl.com/20160331" namespace) and whose local name is "OtherIncomeAndExpensePolicyTextBlock".
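One way to write that match with the "aoi" prefix is to give the XPath object a NamespaceContext. This is only a minimal sketch: the class names `AoiNamespaceXPath` and `AoiNamespaceContext` and the helper `countTextBlockChildren` are made up for illustration, and it assumes the parser is namespace-aware.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original question or answer.
final class AoiNamespaceXPath {

    static final String AOI_NS = "http://www.aointl.com/20160331";

    // Maps the "aoi" prefix used in the XPath expression to its namespace URI.
    static final class AoiNamespaceContext implements NamespaceContext {
        @Override
        public String getNamespaceURI(String prefix) {
            return "aoi".equals(prefix) ? AOI_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return AOI_NS.equals(namespaceURI) ? "aoi" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            String prefix = getPrefix(namespaceURI);
            return prefix == null
                    ? Collections.<String>emptyIterator()
                    : Collections.singletonList(prefix).iterator();
        }
    }

    // Returns how many child elements sit under the aoi text block.
    static int countTextBlockChildren(String xml) {
        try {
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            dbFactory.setNamespaceAware(true); // without this the prefix cannot be resolved
            Document doc = dbFactory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xPath = XPathFactory.newInstance().newXPath();
            xPath.setNamespaceContext(new AoiNamespaceContext());
            NodeList nodes = (NodeList) xPath.evaluate(
                    "/*/aoi:OtherIncomeAndExpensePolicyTextBlock/*",
                    doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<xbrli:xbrl xmlns:aoi=\"" + AOI_NS + "\""
                + " xmlns:xbrli=\"http://www.xbrl.org/2003/instance\">"
                + "<aoi:OtherIncomeAndExpensePolicyTextBlock><div>text</div>"
                + "</aoi:OtherIncomeAndExpensePolicyTextBlock></xbrli:xbrl>";
        System.out.println(countTextBlockChildren(xml));
    }
}
```

The prefix in the expression is resolved through the NamespaceContext, not through the document, so the match depends on the namespace URI and not on whichever prefix the XML file happens to use.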

Matching namespaces in XPath in Java is rather tedious (see "XPath with namespace in Java" and "How to query XML using namespaces in Java with XPath?"), but long story short, you can try it this way:

    final String DIV_UNDER_ROOT = "//*[local-name()='OtherIncomeAndExpensePolicyTextBlock' and namespace-uri()='http://www.aointl.com/20160331']/*";

This will only work if your DocumentBuilderFactory is namespace-aware, so you should make sure of that by configuring it as follows:

    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    dbFactory.setNamespaceAware(true);
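Putting the pieces together, here is a minimal end-to-end sketch. The class name `TextBlockExtractor`, the method `extract`, and the shortened inline XML are made up for illustration; the question reads from a file instead. It also sets `OutputKeys.OMIT_XML_DECLARATION`, on the assumption that you want only the HTML fragment without an `<?xml ...?>` header.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original question or answer.
final class TextBlockExtractor {

    // Child elements of the text block, matched by local name and namespace URI.
    static final String CHILDREN_OF_TEXT_BLOCK =
            "//*[local-name()='OtherIncomeAndExpensePolicyTextBlock'"
            + " and namespace-uri()='http://www.aointl.com/20160331']/*";

    // Returns the serialized child elements of every matching text block.
    static String extract(String xml) {
        try {
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            dbFactory.setNamespaceAware(true); // required for namespace-uri() to match
            Document doc = dbFactory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(CHILDREN_OF_TEXT_BLOCK, doc, XPathConstants.NODESET);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Assumption: only the fragment is wanted, so drop the XML declaration.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringBuilder out = new StringBuilder();
            for (int i = 0; i < nodes.getLength(); i++) {
                StringWriter writer = new StringWriter();
                transformer.transform(new DOMSource(nodes.item(i)), new StreamResult(writer));
                out.append(writer);
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not extract text block", e);
        }
    }

    public static void main(String[] args) {
        // Shortened stand-in for the XBRL document in the question.
        String xml = "<xbrli:xbrl xmlns:aoi=\"http://www.aointl.com/20160331\""
                + " xmlns:xbrli=\"http://www.xbrl.org/2003/instance\">"
                + "<aoi:OtherIncomeAndExpensePolicyTextBlock contextRef=\"FD2016Q4YTD\">"
                + "<div style=\"font-size:10pt;\"><font style=\"font-weight:bold;\">"
                + "Other Income (Expense)</font></div>"
                + "</aoi:OtherIncomeAndExpensePolicyTextBlock>"
                + "</xbrli:xbrl>";
        System.out.println(extract(xml));
    }
}
```

Note that the XML serializer may write empty elements such as `<font ...></font>` in self-closing form, so the output can differ slightly from the original markup while remaining equivalent XML.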