How to read the content of child tags along with the parent tag of an XML document in Java

Asked: 2015-02-24 16:39:34

Tags: java xml parsing

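I am new to Java and am trying to write a program that fetches the meaning of a given word from the MW (Merriam-Webster) API. The output is XML, and for now I am using the DOM parser to print the list of all definitions. The retrieved XML typically looks like this: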

<?xml version="1.0" encoding="utf-8" ?>
<entry_list version="1.0">
    <entry id="dictionary"><ew>dictionary</ew><subj>PU-1#PU-2#PU-3#CP-4</subj><hw>dic*tio*nary</hw><sound><wav>dictio04.wav</wav></sound><pr>ˈdik-shə-ˌner-ē, -ˌne-rē</pr><fl>noun</fl><in><il>plural</il> <if>dic*tio*nar*ies</if></in><et>Medieval Latin <it>dictionarium,</it> from Late Latin <it>diction-, dictio</it> word, from Latin, speaking</et><def><date>1526</date> <sn>1</sn> <dt>:a reference source in print or electronic form containing words usually alphabetically arranged along with information about their forms, <d_link>pronunciations</d_link>, functions, <d_link>etymologies</d_link>, meanings, and <d_link>syntactical</d_link> and idiomatic uses</dt> <sn>2</sn> <dt>:a reference book listing alphabetically terms or names important to a particular subject or activity along with discussion of their meanings and <d_link>applications</d_link></dt> <sn>3</sn> <dt>:a reference book listing alphabetically the words of one language and showing their meanings or translations in another language</dt> <sn>4</sn> <dt>:a <d_link>computerized</d_link> list (as of items of data or words) used for reference (as for information retrieval or word processing)</dt></def></entry>
</entry_list>

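The definitions are contained in the <dt> tags.

The problem I am facing is that inside a <dt> tag there can be another child tag, <d_link>. Whenever the DOM parser reaches this child tag, getNodeValue() behaves as if the <dt> tag had ended there.

My code is as follows: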

import org.w3c.dom.*;
import javax.xml.parsers.*;

public class Dictionary5 {
    public static void main(String[] args) throws Exception {
        String head = "http://www.dictionaryapi.com/api/v1/references/collegiate/xml/";
        String word = "banal";
        String apiKey = "?key=xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx"; // My API key for Merriam-Webster
        String finalURL = head + word + apiKey;
        try
        {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            DocumentBuilder b = f.newDocumentBuilder();
            Document doc = b.parse(finalURL);

            doc.getDocumentElement().normalize();

            NodeList items = doc.getElementsByTagName("entry");
            for (int i = 0; i < items.getLength(); i++)
            {
                Node n = items.item(i);

                if (n.getNodeType() != Node.ELEMENT_NODE)
                    continue;

                Element e = (Element) n;
                NodeList titleList = e.getElementsByTagName("dt");
                for (int j = 0; j < titleList.getLength(); j++){
                    Node dt = titleList.item(j);
                    if (dt.getNodeType() != Node.ELEMENT_NODE)
                        continue;                   
                    Element titleElem = (Element) titleList.item(j);
                    Node titleNode = titleElem.getChildNodes().item(0);
                    System.out.println(titleNode.getNodeValue());
                }
            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }

    }
}

The output is as follows:

:a reference source in print or electronic form containing words usually alphabetically arranged along with information about their forms, 
:a reference book listing alphabetically terms or names important to a particular subject or activity along with discussion of their meanings and 
:a reference book listing alphabetically the words of one language and showing their meanings or translations in another language
:a 

As you can see, the first, second, and fourth definitions end abruptly, because the parser runs into the child tag <d_link>.

My expected output is:

:a reference source in print or electronic form containing words usually alphabetically arranged along with information about their forms, pronunciations, functions, etymologies, meanings, and syntactical and idiomatic uses
:a reference book listing alphabetically terms or names important to a particular subject or activity along with discussion of their meanings and applications
:a reference book listing alphabetically the words of one language and showing their meanings or translations in another language
:a computerized list (as of items of data or words) used for reference (as for information retrieval or word processing)

Can someone help me solve this? Any help is much appreciated. Thanks in advance.

1 Answer:

Answer 0 (score: 0)

In the DOM model, the content of a dt tag is a sequence of child nodes: TEXT, a d_link element, TEXT, a d_link element, and so on.
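To see that structure for yourself, here is a minimal sketch that lists the direct children of the first dt element. The class name `DtChildren`, the helper `childKinds`, and the shortened XML string are made up for illustration; they are not part of your program or of the MW API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DtChildren {
    // Hypothetical helper: labels each direct child of the first <dt>
    // as "TEXT", "ELEMENT:<tag name>", or "OTHER".
    static List<String> childKinds(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Node dt = doc.getElementsByTagName("dt").item(0);
        NodeList kids = dt.getChildNodes();
        List<String> kinds = new ArrayList<>();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            if (kid.getNodeType() == Node.TEXT_NODE) {
                kinds.add("TEXT");
            } else if (kid.getNodeType() == Node.ELEMENT_NODE) {
                kinds.add("ELEMENT:" + kid.getNodeName());
            } else {
                kinds.add("OTHER");
            }
        }
        return kinds;
    }

    public static void main(String[] args) throws Exception {
        // Shortened stand-in for one definition from the question's XML.
        String xml = "<def><dt>:a <d_link>computerized</d_link> list</dt></def>";
        System.out.println(childKinds(xml));
    }
}
```

The element child (d_link) has no node value of its own; its text lives in a TEXT node one level further down.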

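So you want to concatenate all of the text nodes (and, it seems, the contents of the d_link tags as well). You are only reading the first one, titleElem.getChildNodes().item(0), which is why the output ends "abruptly".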
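One way to get that concatenation without walking the children by hand is `Node.getTextContent()` (DOM Level 3), which for an element returns the text of all its descendants joined in document order. The sketch below assumes that is what you want for every dt element; the class name `DtText`, the helper `definitions`, and the inline XML are illustrative only, and the XML is parsed from a string instead of the API URL so that it runs without a key.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class DtText {
    // Hypothetical helper: returns the full text of every <dt> element,
    // including text nested inside child tags such as <d_link>.
    static List<String> definitions(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList dts = doc.getElementsByTagName("dt");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < dts.getLength(); i++) {
            // getTextContent() joins the text of all descendant nodes,
            // so the <d_link> contents are no longer dropped.
            result.add(dts.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Shortened stand-in for the <def> block in the question.
        String xml = "<def>"
                + "<dt>:a reference book and <d_link>applications</d_link></dt>"
                + "<dt>:a <d_link>computerized</d_link> list used for reference</dt>"
                + "</def>";
        for (String definition : definitions(xml)) {
            System.out.println(definition);
        }
    }
}
```

In your own loop, the equivalent change would be to print titleElem.getTextContent() instead of titleNode.getNodeValue().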