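I'm new to Java, and I'm trying to write a program that looks up the meaning of a given word in the Merriam-Webster (MW) API. The API returns XML, and I'm currently using a DOM parser to print a list of all the definitions. A typical response looks like this: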
<?xml version="1.0" encoding="utf-8" ?>
<entry_list version="1.0">
<entry id="dictionary"><ew>dictionary</ew><subj>PU-1#PU-2#PU-3#CP-4</subj><hw>dic*tio*nary</hw><sound><wav>dictio04.wav</wav></sound><pr>ˈdik-shə-ˌner-ē, -ˌne-rē</pr><fl>noun</fl><in><il>plural</il> <if>dic*tio*nar*ies</if></in><et>Medieval Latin <it>dictionarium,</it> from Late Latin <it>diction-, dictio</it> word, from Latin, speaking</et><def><date>1526</date> <sn>1</sn> <dt>:a reference source in print or electronic form containing words usually alphabetically arranged along with information about their forms, <d_link>pronunciations</d_link>, functions, <d_link>etymologies</d_link>, meanings, and <d_link>syntactical</d_link> and idiomatic uses</dt> <sn>2</sn> <dt>:a reference book listing alphabetically terms or names important to a particular subject or activity along with discussion of their meanings and <d_link>applications</d_link></dt> <sn>3</sn> <dt>:a reference book listing alphabetically the words of one language and showing their meanings or translations in another language</dt> <sn>4</sn> <dt>:a <d_link>computerized</d_link> list (as of items of data or words) used for reference (as for information retrieval or word processing)</dt></def></entry>
</entry_list>
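The definitions are contained in <dt> tags.
The problem is that inside a <dt> tag there is sometimes another child tag, <d_link>. Whenever the DOM parser reaches this child tag, the getNodeValue() method seems to treat it as the end of the <dt> tag.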
My code is as follows:
import org.w3c.dom.*;
import javax.xml.parsers.*;

public class Dictionary5 {
    public static void main(String[] args) throws Exception {
        String head = "http://www.dictionaryapi.com/api/v1/references/collegiate/xml/";
        String word = "banal";
        String apiKey = "?key=xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx"; // My API key for Merriam-Webster
        String finalURL = head.trim() + word.trim() + apiKey.trim();
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            DocumentBuilder b = f.newDocumentBuilder();
            Document doc = b.parse(finalURL);
            doc.getDocumentElement().normalize();
            NodeList items = doc.getElementsByTagName("entry");
            for (int i = 0; i < items.getLength(); i++) {
                Node n = items.item(i);
                if (n.getNodeType() != Node.ELEMENT_NODE)
                    continue;
                Element e = (Element) n;
                NodeList titleList = e.getElementsByTagName("dt");
                for (int j = 0; j < titleList.getLength(); j++) {
                    Node dt = titleList.item(j);
                    if (dt.getNodeType() != Node.ELEMENT_NODE)
                        continue;
                    Element titleElem = (Element) titleList.item(j);
                    Node titleNode = titleElem.getChildNodes().item(0);
                    System.out.println(titleNode.getNodeValue());
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The output is:
:a reference source in print or electronic form containing words usually alphabetically arranged along with information about their forms,
:a reference book listing alphabetically terms or names important to a particular subject or activity along with discussion of their meanings and
:a reference book listing alphabetically the words of one language and showing their meanings or translations in another language
:a
As you can see, the first, second and fourth definitions are cut off at the point where the parser reaches the child tag <d_link>.
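To reproduce this without an API key, here is a minimal sketch that parses a hypothetical, trimmed-down <dt> string (modeled on the fourth definition above, not real API output) and reads only the first child node of each <dt>, the same way titleElem.getChildNodes().item(0) does in the code above. The class name DtRepro and its helper methods are illustrative names, not part of any API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class DtRepro {
    // Hypothetical, trimmed-down sample modeled on the 4th <dt> in the question
    static final String SAMPLE =
        "<def><dt>:a <d_link>computerized</d_link> list used for reference</dt></def>";

    // Parses an XML string into a DOM Document (checked exceptions are wrapped)
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Mimics the question's loop: reads only the FIRST child node of each <dt>
    static List<String> firstChildValues(String xml) {
        NodeList dts = parse(xml).getElementsByTagName("dt");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < dts.getLength(); i++) {
            result.add(dts.item(i).getChildNodes().item(0).getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // The text stops where <d_link> begins
        firstChildValues(SAMPLE).forEach(System.out::println);
    }
}
```

The printed line is just ":a ", the same truncation seen in the fourth definition of the real output.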
My expected output is:
:a reference source in print or electronic form containing words usually alphabetically arranged along with information about their forms, pronunciations, functions, etymologies, meanings, and syntactical and idiomatic uses
:a reference book listing alphabetically terms or names important to a particular subject or activity along with discussion of their meanings and applications
:a reference book listing alphabetically the words of one language and showing their meanings or translations in another language
:a computerized list (as of items of data or words) used for reference (as for information retrieval or word processing)
Can someone help me solve this? Any help is greatly appreciated. Thanks in advance.
Answer (score: 0)
In the DOM model, the content of the dt element is a sequence of child nodes: TEXT, d_link element, TEXT, d_link, and so on.
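That node sequence can be checked directly. The sketch below uses a hypothetical shortened <dt> string and illustrative names (DtChildren, describeChildren), and prints the node name and getNodeValue() of each direct child of the <dt>. Text nodes show their text, while the d_link elements show null, because getNodeValue() is defined to return null for element nodes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DtChildren {
    // Hypothetical shortened <dt> with two <d_link> children
    static final String SAMPLE = "<dt>:a list of forms, <d_link>pronunciations</d_link>,"
        + " functions, <d_link>etymologies</d_link>, meanings</dt>";

    // Returns one "nodeName=nodeValue" entry per direct child of the root <dt>
    static List<String> describeChildren(String xml) {
        Element dt;
        try {
            dt = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        List<String> out = new ArrayList<>();
        NodeList children = dt.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node c = children.item(i);
            // Text nodes are named "#text"; elements are named after their tag.
            // getNodeValue() returns the text for Text nodes, null for elements.
            out.add(c.getNodeName() + "=" + c.getNodeValue());
        }
        return out;
    }

    public static void main(String[] args) {
        describeChildren(SAMPLE).forEach(System.out::println);
    }
}
```

Running main should print five entries that alternate between #text and d_link, which is why reading only item(0) loses everything after the first d_link.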
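So you need to concatenate all of the text nodes, including the text inside the d_link elements. You are only reading the first one, with titleElem.getChildNodes().item(0), which is why the output ends "abruptly".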
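A minimal sketch of that fix, assuming the same DOM setup as the question. The hypothetical concatText helper walks the subtree and appends every Text and CDATA node in document order, which is what the answer describes; the standard Node.getTextContent() method (DOM Level 3, available in the JDK's org.w3c.dom) returns the same concatenation for an element. The sample string is the second definition from the XML above, and DtText, firstDt and concatText are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DtText {
    // Second definition from the question's XML, used as test input
    static final String SAMPLE = "<dt>:a reference book listing alphabetically terms or names"
        + " important to a particular subject or activity along with discussion of their"
        + " meanings and <d_link>applications</d_link></dt>";

    static final String EXPECTED = ":a reference book listing alphabetically terms or names"
        + " important to a particular subject or activity along with discussion of their"
        + " meanings and applications";

    // Parses an XML string and returns its first <dt> element
    static Element firstDt(String xml) {
        try {
            return (Element) DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getElementsByTagName("dt").item(0);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Concatenates all text below a node, in document order
    static String concatText(Node node) {
        StringBuilder sb = new StringBuilder();
        appendText(node, sb);
        return sb.toString();
    }

    private static void appendText(Node node, StringBuilder sb) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(child.getNodeValue());
            } else if (type == Node.ELEMENT_NODE) {
                appendText(child, sb); // recurse into <d_link> and similar children
            }
        }
    }

    public static void main(String[] args) {
        Element dt = firstDt(SAMPLE);
        System.out.println(concatText(dt));      // manual concatenation
        System.out.println(dt.getTextContent()); // built-in DOM equivalent
    }
}
```

In the question's inner loop, the two lines that read titleNode could then be replaced with System.out.println(titleElem.getTextContent());, which should print each definition in full, d_link text included. The manual walk is only worth keeping if you want to treat some child elements differently, for example skipping or marking the d_link words.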