Strange XML parsing behavior

Asked: 2014-02-10 09:44:48

Tags: java xml dom xml-parsing

I am trying to parse an XML file with a DOM parser, and I see strange behavior when parsing the following file:

<data-list>
    <entry>
        <meta-data>
            <meta name="HANDLE">1</meta>
        </meta-data>
        <compound>
            <name>Numeric</name>
            <entries>
                <entry>
                    <meta-data>
                        <meta name="partition">2</meta>
                        <meta name="metric-id">18948</meta>
                        <meta name="unit-code">3872</meta>
                        <meta name="unit">mmHg</meta>
                    </meta-data>
                    <compound>
                        <name>Compound-Basic-Nu-Observed-Value</name>
                        <entries>
                            <entry>
                                <meta-data>
                                    <meta name="partition">2</meta>
                                    <meta name="metric-id">18949</meta>
                                </meta-data>
                                <simple>
                                    <name>0</name>
                                    <type>float</type>
                                    <value>120.000000</value>
                                </simple>
                            </entry>
                            <entry>
                                <meta-data>
                                    <meta name="partition">2</meta>
                                    <meta name="metric-id">18950</meta>
                                </meta-data>
                                <simple>
                                    <name>1</name>
                                    <type>float</type>
                                    <value>76.000000</value>
                                </simple>
                            </entry>
                            <entry>
                                <meta-data>
                                    <meta name="partition">2</meta>
                                    <meta name="metric-id">18951</meta>
                                </meta-data>
                                <simple>
                                    <name>2</name>
                                    <type>float</type>
                                    <value>91.000000</value>
                                </simple>
                            </entry>
                        </entries>
                    </compound>
                </entry>
                <entry>
                    <compound>
                        <name>Absolute-Time-Stamp</name>
                        <entries>
                            <entry>
                                <simple>
                                    <name>century</name>
                                    <type>intu8</type>
                                    <value>20</value>
                                </simple>
                            </entry>
                            <entry>
                                <simple>
                                    <name>year</name>
                                    <type>intu8</type>
                                    <value>14</value>
                                </simple>
                            </entry>
                            <entry>
                                <simple>
                                    <name>month</name>
                                    <type>intu8</type>
                                    <value>2</value>
                                </simple>
                            </entry>
                            <entry>
                                <simple>
                                    <name>day</name>
                                    <type>intu8</type>
                                    <value>6</value>
                                </simple>
                            </entry>
                            <entry>
                                <simple>
                                    <name>hour</name>
                                    <type>intu8</type>
                                    <value>15</value>
                                </simple>
                            </entry>
                            <entry>
                                <simple>
                                    <name>minute</name>
                                    <type>intu8</type>
                                    <value>26</value>
                                </simple>
                            </entry>
                            <entry>
                                <simple>
                                    <name>second</name>
                                    <type>intu8</type>
                                    <value>14</value>
                                </simple>
                            </entry>
                            <entry>
                                <simple>
                                    <name>sec_fractions</name>
                                    <type>intu8</type>
                                    <value>0</value>
                                </simple>
                            </entry>
                        </entries>
                    </compound>
                </entry>
            </entries>
        </compound>
    </entry>
</data-list>

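I am trying to navigate to each element with the getChildNodes() method. However, when I call getChildNodes() on the data-list element, I get 3 nodes, although I expected to get only the single "entry" element. Can someone clarify this for me?

My parsing code: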

Document d = parse_xml(xml);

NodeList datalists = d.getElementsByTagName("data-list");

// data list
for (int i = 0; i < datalists.getLength(); ++i) {

    Node datalist = datalists.item(i);

    NodeList entries = datalist.getChildNodes();
    // prints out 3
    System.out.println(entries.getLength());
}

parse_xml():

public static Document parse_xml(String xml) {
    Document d = null;

    try {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        d = db.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        d.getDocumentElement().normalize();
    } catch (ParserConfigurationException e) {
        System.out.println("XML parser error");
    } catch (SAXException e) {
        System.out.println("SAX exception");
    } catch (IOException e) {
        System.out.println("IO exception in xml parsing");
    }

    return d;
}

1 answer:

Answer 0 (score: 2)

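This happens because only one of the three child nodes is an element node. The other two are text nodes that hold the whitespace (line breaks and indentation) around <entry> in your file. You would get a count of one if the XML were formatted like this: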

<data-list><entry></entry></data-list>

instead of:

<data-list>
<entry>
</entry>
</data-list>

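With the compact form, the output is just one.

You can check this with the following code, which prints the name and type of each child node: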
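To make the difference concrete, here is a minimal sketch that parses both layouts and counts the children of the root element. The class name WhitespaceNodesDemo and the helper countRootChildren are made up for this illustration; the parser is the default DocumentBuilderFactory configuration, as in the question.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of the original question or answer.
class WhitespaceNodesDemo {

    // Parses the XML string and returns how many child nodes
    // (elements AND text nodes) the root element has.
    static int countRootChildren(String xml) {
        try {
            Document d = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return d.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String compact = "<data-list><entry></entry></data-list>";
        String indented = "<data-list>\n<entry>\n</entry>\n</data-list>";

        // No whitespace between the tags: the only child is <entry>.
        System.out.println(countRootChildren(compact));   // prints 1

        // The line breaks before and after <entry> each become a text node.
        System.out.println(countRootChildren(indented));  // prints 3
    }
}
```

Reformatting the file is rarely practical, so this is mainly useful for seeing where the extra nodes come from.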

NodeList entries = datalist.getChildNodes();
for (int j = 0; j < entries.getLength(); j++) {
    System.out.println(entries.item(j).getNodeName()
            + "<<<>>>>" + entries.item(j).getNodeType());
}

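The node type values are defined as constants on the org.w3c.dom.Node interface: for example, Node.ELEMENT_NODE is 1 and Node.TEXT_NODE is 3.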
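If you cannot change how the file is formatted, one common approach is to skip everything that is not an element while iterating. The sketch below assumes that approach; the class ElementChildren and its methods childElements and parse are hypothetical names introduced here, not part of the DOM API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class for illustration only.
class ElementChildren {

    // Returns only the direct children of 'parent' that are elements,
    // ignoring whitespace text nodes, comments, and so on.
    static List<Element> childElements(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    // Parses an XML string with the default DOM parser settings.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document d = parse("<data-list>\n    <entry>\n    </entry>\n</data-list>");
        List<Element> entries = childElements(d.getDocumentElement());
        System.out.println(entries.size());               // prints 1
        System.out.println(entries.get(0).getTagName());  // prints entry
    }
}
```

The same filter can be applied at each level (entry, compound, entries) when walking down the nested structure from the question.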