I am trying to parse an XML file with a DOM parser. I am seeing strange behavior when parsing the following XML file:
<data-list>
<entry>
<meta-data>
<meta name="HANDLE">1</meta>
</meta-data>
<compound>
<name>Numeric</name>
<entries>
<entry>
<meta-data>
<meta name="partition">2</meta>
<meta name="metric-id">18948</meta>
<meta name="unit-code">3872</meta>
<meta name="unit">mmHg</meta>
</meta-data>
<compound>
<name>Compound-Basic-Nu-Observed-Value</name>
<entries>
<entry>
<meta-data>
<meta name="partition">2</meta>
<meta name="metric-id">18949</meta>
</meta-data>
<simple>
<name>0</name>
<type>float</type>
<value>120.000000</value>
</simple>
</entry>
<entry>
<meta-data>
<meta name="partition">2</meta>
<meta name="metric-id">18950</meta>
</meta-data>
<simple>
<name>1</name>
<type>float</type>
<value>76.000000</value>
</simple>
</entry>
<entry>
<meta-data>
<meta name="partition">2</meta>
<meta name="metric-id">18951</meta>
</meta-data>
<simple>
<name>2</name>
<type>float</type>
<value>91.000000</value>
</simple>
</entry>
</entries>
</compound>
</entry>
<entry>
<compound>
<name>Absolute-Time-Stamp</name>
<entries>
<entry>
<simple>
<name>century</name>
<type>intu8</type>
<value>20</value>
</simple>
</entry>
<entry>
<simple>
<name>year</name>
<type>intu8</type>
<value>14</value>
</simple>
</entry>
<entry>
<simple>
<name>month</name>
<type>intu8</type>
<value>2</value>
</simple>
</entry>
<entry>
<simple>
<name>day</name>
<type>intu8</type>
<value>6</value>
</simple>
</entry>
<entry>
<simple>
<name>hour</name>
<type>intu8</type>
<value>15</value>
</simple>
</entry>
<entry>
<simple>
<name>minute</name>
<type>intu8</type>
<value>26</value>
</simple>
</entry>
<entry>
<simple>
<name>second</name>
<type>intu8</type>
<value>14</value>
</simple>
</entry>
<entry>
<simple>
<name>sec_fractions</name>
<type>intu8</type>
<value>0</value>
</simple>
</entry>
</entries>
</compound>
</entry>
</entries>
</compound>
</entry>
</data-list>
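I am trying to navigate to each element using the getChildNodes() method. However, when I call getChildNodes() on the data-list element, I get 3 nodes (although I expected to get only a single "entry" element). Could someone clarify this for me?
My parsing code: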
Document d = parse_xml(xml);
NodeList datalists = d.getElementsByTagName("data-list");
// data list
for (int i = 0; i < datalists.getLength(); ++i) {
Node datalist = datalists.item(i);
NodeList entries = datalist.getChildNodes();
// prints out 3
System.out.println(entries.getLength());
}
parse_xml():
public static Document parse_xml(String xml)
{
Document d = null;
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
d = db.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
d.getDocumentElement().normalize();
} catch (ParserConfigurationException e) {
System.out.println("XML parser error");
} catch (SAXException e) {
System.out.println("SAX exception");
} catch (IOException e) {
System.out.println("IO exception in xml parsing");
}
return d;
}
Answer 0 (score: 2)
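This happens because only one of the three nodes is an element node; the other two are text nodes. They come from how your file is laid out: the line breaks before and after <entry> are parsed as whitespace-only text nodes. You can get one as the answer by formatting the XML like this: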
<data-list><entry></entry></data-list>
instead of:
<data-list>
<entry>
</entry>
</data-list>
With the first form, the output is just 1.
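To see the difference concretely, here is a minimal sketch that parses both layouts and prints the number of child nodes under the root. The class name WhitespaceNodesDemo and the helper rootChildCount are made up for illustration, and it assumes a default (non-validating) DocumentBuilderFactory like the one in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the original code.
class WhitespaceNodesDemo {

    // Returns the number of direct child nodes (of any type) of the root element.
    static int rootChildCount(String xml) {
        try {
            Document d = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return d.getDocumentElement().getChildNodes().getLength();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String compact = "<data-list><entry></entry></data-list>";
        String indented = "<data-list>\n<entry>\n</entry>\n</data-list>";

        System.out.println(rootChildCount(compact));   // 1: just the entry element
        System.out.println(rootChildCount(indented));  // 3: text, entry, text
    }
}
```

The only difference between the two strings is the line breaks, which is what produces the two extra text nodes.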
You can check this with the following change to your code:
NodeList entries = datalist.getChildNodes();
for(int j=0;j<entries.getLength();j++)
{
System.out.println(entries.item(j).getNodeName() +
"<<<>>>>" + entries.item(j).getNodeType());
}
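The numeric node types it prints are the constants defined on the org.w3c.dom.Node interface, for example Node.ELEMENT_NODE (1) and Node.TEXT_NODE (3).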
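If you only care about the elements, one common approach is to skip anything whose node type is not Node.ELEMENT_NODE. Below is a rough sketch of that idea. The class ElementChildren and its helpers childElements and parseRoot are hypothetical names, not something from your code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the original code.
class ElementChildren {

    // Collects only the direct children that are elements,
    // skipping text, comment and other node types.
    static List<Element> childElements(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element dataList = parseRoot("<data-list>\n<entry>\n</entry>\n</data-list>");
        for (Element e : childElements(dataList)) {
            System.out.println(e.getTagName()); // entry
        }
    }
}
```

This keeps the XML file formatted however you like and leaves the filtering to the code that walks the tree.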