I am trying to read the information for an item in an XML feed, like the following:
<item>
<title>The Colbert Report - Confused by Rick Parry With an "A" for America</title>
<guid isPermaLink="false">http://www.hulu.com/watch/267788/the-colbert-report-confused-by-rick-parry-with-an-a-for-america#http%3A%2F%2Fwww.hulu.com%2Ffeed%2Fpopular%2Fvideos%2Fthis_week%3Frd%3D0</guid>
<link>http://rss.hulu.com/~r/HuluPopularVideosThisWeek/~3/6aeJ5cWMBzw/the-colbert-report-confused-by-rick-parry-with-an-a-for-america</link>
<description><a href="http://www.hulu.com/watch/267788/the-colbert-report-confused-by-rick-parry-with-an-a-for-america#http%3A%2F%2Fwww.hulu.com%2Ffeed%2Fpopular%2Fvideos%2Fthis_week%3Frd%3D0"><img src="http://thumbnails.hulu.com/507/40025507/40025507_145x80_generated.jpg" align="right" hspace="10" vspace="10" width="145" height="80" border="0" /></a><p>The fat cat media elites in Des Moines think they can sit in their ivory corn silos and play puppet master with national politics.</p><p><a href="http://www.hulu.com/users/add_to_playlist?from=feed&video_id=267788">Add this to your queue</a><br/>Added: Fri Aug 12 09:59:14 UTC 2011<br/>Air date: Thu Aug 11 00:00:00 UTC 2011<br/>Duration: 05:39<br/>Rating: 4.7 / 5.0<br/></p><img src="http://feeds.feedburner.com/~r/HuluPopularVideosThisWeek/~4/6aeJ5cWMBzw" height="1" width="1"/></description>
<pubDate>Fri, 12 Aug 2011 09:59:14 -0000</pubDate>
<media:thumbnail height="80" width="145" url="http://thumbnails.hulu.com/507/40025507/40025507_145x80_generated.jpg" />
<media:credit>Comedy Central</media:credit>
<dcterms:valid>start=2011-08-12T00:15:00Z; end=2011-09-09T23:45:00Z; scheme=W3C-DTF</dcterms:valid>
<feedburner:origLink>http://www.hulu.com/watch/267788/the-colbert-report-confused-by-rick-parry-with-an-a-for-america#http%3A%2F%2Fwww.hulu.com%2Ffeed%2Fpopular%2Fvideos%2Fthis_week%3Frd%3D0</feedburner:origLink></item>
<item>
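I need the title, the link, the media:thumbnail url, and the description.
I used the method found at http://www.rgagnon.com/javadetails/java-0573.html
The title and link work fine, but the image URL and the description do not.
Can someone help me with this?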
Answer 0 (score: 3)
You can use XPath to retrieve specific data from an XML document.
For example, to retrieve the content of the url attribute:
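    import javax.xml.xpath.*;
    import org.xml.sax.InputSource;
    // The media: prefix needs a NamespaceContext on xpath (and a namespace declaration in the XML) to resolve.
    XPathFactory factory = XPathFactory.newInstance();
    XPath xpath = factory.newXPath();
    String url = xpath.evaluate("/item/media:thumbnail/@url", new InputSource("data.xml"));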
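To make the prefixed expression above concrete, here is a minimal sketch of registering a NamespaceContext before evaluating it. It assumes the feed binds the media: prefix to the Media RSS namespace http://search.yahoo.com/mrss/; the class name ThumbnailXPath, the helper thumbnailUrl, and the shortened sample item are hypothetical and only for illustration.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ThumbnailXPath {
    // Assumption: the feed declares the media: prefix with this namespace URI.
    static final String MEDIA_NS = "http://search.yahoo.com/mrss/";

    // Shortened, hypothetical stand-in for the item in the question.
    static final String SAMPLE =
        "<item xmlns:media=\"" + MEDIA_NS + "\">"
        + "<title>Example</title>"
        + "<media:thumbnail height=\"80\" width=\"145\" url=\"http://example.com/thumb.jpg\"/>"
        + "</item>";

    static String thumbnailUrl(String xml) {
        try {
            // Parse namespace-aware so that prefixed names carry their namespace.
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Tell XPath what the "media" prefix in the expression means.
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "media".equals(prefix) ? MEDIA_NS : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) {
                    return MEDIA_NS.equals(uri) ? "media" : null;
                }
                @Override public Iterator<String> getPrefixes(String uri) {
                    String prefix = getPrefix(uri);
                    return (prefix == null
                            ? Collections.<String>emptyList()
                            : Collections.singletonList(prefix)).iterator();
                }
            });
            return xpath.evaluate("/item/media:thumbnail/@url", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(thumbnailUrl(SAMPLE));
    }
}
```

If the XML does not declare the namespace at all, a namespace-aware parse is likely to reject it, in which case the plain DOM lookups by qualified name may be the simpler route.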
Answer 1 (score: 2)
    import java.io.File;
    import java.io.FileReader;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.*;
    import org.xml.sax.InputSource;

    try {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        InputSource is = new InputSource(new FileReader(new File("item.xml")));
        Document doc = db.parse(is);
        NodeList nodes = doc.getElementsByTagName("item");
        // iterate over the items
        for (int i = 0; i < nodes.getLength(); i++) {
            Element element = (Element) nodes.item(i);
            NodeList title = element.getElementsByTagName("title");
            Element line = (Element) title.item(0);
            System.out.println("title: " + line.getTextContent());
            NodeList link = element.getElementsByTagName("link");
            line = (Element) link.item(0);
            System.out.println("link: " + line.getTextContent());
            NodeList mt = element.getElementsByTagName("media:thumbnail");
            line = (Element) mt.item(0);
            // media:thumbnail is an empty element, so its value lives in the url attribute
            Attr url = line.getAttributeNode("url");
            System.out.println("media:thumbnail -> url: " + url.getValue());
        }
    }
    catch (Exception e) {
        e.printStackTrace();
    }
For the url, first get the media:thumbnail element; then, because url is an attribute of media:thumbnail, simply call getAttributeNode("url") on the media:thumbnail element.
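The element-then-attribute lookup described here fails with a NullPointerException when an item has no media:thumbnail child. A minimal sketch of a null-safe variant follows; the class ItemAttributes and the helpers attributeOfFirst and parseRoot are hypothetical names, and the parser is left non-namespace-aware (the default) so that "media:thumbnail" is matched as a plain qualified name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ItemAttributes {
    // Hypothetical helper: the named attribute of the first matching
    // descendant, or "" when the element or the attribute is absent.
    static String attributeOfFirst(Element parent, String tagName, String attribute) {
        NodeList matches = parent.getElementsByTagName(tagName);
        if (matches.getLength() == 0) {
            return "";
        }
        // getAttribute returns "" for a missing attribute, so no null check is needed.
        return ((Element) matches.item(0)).getAttribute(attribute);
    }

    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element item = parseRoot(
            "<item><media:thumbnail url=\"http://example.com/thumb.jpg\"/></item>");
        System.out.println(attributeOfFirst(item, "media:thumbnail", "url"));
        System.out.println(attributeOfFirst(item, "media:content", "url").isEmpty());
    }
}
```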
Answer 2 (score: 2)
For a pure DOM solution, you can use the following code to get the values you need:
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.*;

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document doc = builder.parse("document.xml");
    Element item = doc.getDocumentElement(); // assuming that item is the root element
    NodeList itemChilds = item.getChildNodes();
    for (int i = 0; i != itemChilds.getLength(); ++i)
    {
        Node itemChildNode = itemChilds.item(i);
        // skip text nodes, comments, and anything else that is not an element
        if (!(itemChildNode instanceof Element))
            continue;
        Element itemChild = (Element) itemChildNode;
        String itemChildName = itemChild.getNodeName();
        if (itemChildName.equals("title")) // possible switch in Java 7
            System.out.println("title: " + itemChild.getTextContent());
        else if (itemChildName.equals("link"))
            System.out.println("link: " + itemChild.getTextContent());
        else if (itemChildName.equals("description"))
            System.out.println("description: " + itemChild.getTextContent());
        else if (itemChildName.equals("media:thumbnail"))
            System.out.println("image url: " + itemChild.getAttribute("url"));
    }
Result:
title: The Colbert Report - Confused by Rick Parry With an "A" for America
link: http://rss.hulu.com/~r/HuluPopularVideosThisWeek/~3/6aeJ5cWMBzw/the-colbert..
description: <a href="http://www.hulu.com/watch/267788/the-colbert-report-confuse..
image url: http://thumbnails.hulu.com/507/40025507/40025507_145x80_generated.jpg
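Answer 3 (score: 0)
The problem here is that the description tag contains an escaped XML (or perhaps HTML) string rather than plain XML.
Probably the simplest approach is to take the text this tag contains and open another XML parser to parse it as a separate XML document. This may not work if the text is actually an HTML fragment rather than well-formed XML.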
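The second-parser idea can be sketched roughly as follows. This is an illustration under assumptions, not a definitive implementation: the class DescriptionFragment and the helper firstParagraph are hypothetical names, the input is the text already obtained from the description element (for example via getTextContent()), and the fragment is wrapped in a synthetic root element because it has several top-level nodes. When the fragment is not well-formed XML, the helper returns null instead of failing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DescriptionFragment {
    // Returns the text of the first <p> in the fragment, or null when there is
    // no <p> or the fragment is not well-formed XML (as HTML often is not).
    static String firstParagraph(String fragment) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // A document needs exactly one root, so wrap the fragment in one.
            String wrapped = "<root>" + fragment + "</root>";
            Document doc = db.parse(new InputSource(new StringReader(wrapped)));
            NodeList paragraphs = doc.getElementsByTagName("p");
            return paragraphs.getLength() == 0
                    ? null
                    : paragraphs.item(0).getTextContent();
        } catch (SAXException e) {
            // Not well-formed XML; an HTML parser would be needed instead.
            return null;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String fragment =
            "<a href=\"http://example.com\"><img src=\"thumb.jpg\"/></a><p>Hello</p>";
        System.out.println(firstParagraph(fragment));
    }
}
```

An unescaped & in a link or an unclosed tag such as a bare br is enough to make the parse fail, which is the case the answer warns about. The default parser may also print a fatal-error line to standard error in that case before the exception is caught.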