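This Java code prints the title, link, and publication date of each item in the NYT World RSS feed. For the New York Times Science RSS feed, however, it does not print the link field. What is happening here?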
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse( direccion );
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile("/rss/channel/item");
NodeList nl = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0; i < nl.getLength(); i++) {
    Node node = nl.item(i);
    Node nodoTitulo = (Node) xpath.evaluate("title", node, XPathConstants.NODE);
    System.out.println(nodoTitulo.getTextContent());
    Node nodoLink = (Node) xpath.evaluate("link", node, XPathConstants.NODE);
    System.out.println(nodoLink.getTextContent());
    Node nodoFecha = (Node) xpath.evaluate("pubDate", node, XPathConstants.NODE);
    System.out.println(nodoFecha.getTextContent());
    System.out.println();
}
Answer 0 (score: 0)
This is a namespace problem.
In the Science RSS feed, you have
<atom:link href="http://www.nytimes.com/2012/08/19/business/new-wave-of-adept-robots-is-changing-global-industry.html?partner=rss&emc=rss" rel="standout"/>
<title>The iEconomy: New Wave of Deft Robots Is Changing Global Industry</title>
<link>http://feeds.nytimes.com/click.phdo?i=5861b5e3f6b66da6ca12beab1e5d8729</link>
In the World RSS feed, you have
<title>Syrian Rebels Claim to Have Brought Down a Jet</title>
<link>http://feeds.nytimes.com/click.phdo?i=314bd32f9d6141a500e76e3076c489c9</link>
.
.
.
<atom:link rel="standout" href="http://www.nytimes.com/2012/08/14/world/middleeast/syrian-rebels-claim-to-have-brought-down-a-jet.html?partner=rss&emc=rss"/>
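Because the parser is not namespace-aware, the XPath expression link also matches the <atom:link> element. In the Science feed that element comes before <link>, so your code picks up the <atom:link> node first, and since it has no text content, nothing is printed for the link.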
Add:
factory.setNamespaceAware(true);
after creating the factory and before creating the builder. You should now get the link:
title = The iEconomy: New Wave of Deft Robots Is Changing Global Industry
link = http://feeds.nytimes.com/click.phdo?i=5861b5e3f6b66da6ca12beab1e5d8729
pubDate = Sun, 19 Aug 2012 21:26:33 GMT
If you are really interested, you can read this for more information.
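To make the fix concrete, here is a minimal, self-contained sketch of the namespace-aware setup. The class name, the helper firstLinkText, and the inline sample XML (with example.com URLs) are hypothetical and stand in for the real feed; only the element order of the Science feed is reproduced.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NamespaceAwareLinkDemo {

    // Hypothetical sample item: as in the Science feed, <atom:link> precedes <link>.
    static final String SAMPLE =
          "<rss version=\"2.0\" xmlns:atom=\"http://www.w3.org/2005/Atom\"><channel><item>"
        + "<atom:link href=\"http://example.com/article.html\" rel=\"standout\"/>"
        + "<title>Sample title</title>"
        + "<link>http://example.com/click</link>"
        + "</item></channel></rss>";

    // Returns the text of the first node matched by the XPath "link" step,
    // or null if nothing matches.
    static String firstLinkText(boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Must be set after creating the factory and before creating the builder.
            factory.setNamespaceAware(namespaceAware);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(SAMPLE)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            Node link = (Node) xpath.evaluate("/rss/channel/item/link", doc, XPathConstants.NODE);
            return link == null ? null : link.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the sample XML", e);
        }
    }

    public static void main(String[] args) {
        // With namespace awareness, only the un-prefixed <link> element matches.
        System.out.println("namespace-aware:     " + firstLinkText(true));
        // Without it, the result is expected to reproduce the problem described
        // above, though the exact behavior depends on the XPath implementation.
        System.out.println("not namespace-aware: " + firstLinkText(false));
    }
}
```

With namespace awareness enabled, the un-prefixed link step can only match elements that are in no namespace, so <atom:link> is skipped regardless of where it appears in the item.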