Java: error when parsing an RSS feed

Time: 2018-04-18 16:51:31

Tags: java xml rss

Here is the code:

public static void main(String[] args) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setValidating(true);
    factory.setIgnoringElementContentWhitespace(true);
    DocumentBuilder builder = factory.newDocumentBuilder();

    Document doc = builder.parse("http://rss.adnkronos.com/RSS_Politica.xml");

    NodeList nodes = doc.getElementsByTagName("title");

    for (int k = 0; k < nodes.getLength(); k++) {
        System.out.print(nodes.item(k));
    }
}

The RSS feed is at: http://rss.adnkronos.com/RSS_Politica.xml

The result (in the console) is:

null null null null null null null null null null null null null null null null null null

As you can see in the XML, the values of the title nodes are clearly not empty.

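After the output, the following errors are shown (translated from Italian):

Error: URI=http://rss.adnkronos.com/RSS_Politica.xml Line=1: Document root element "rss" must match DOCTYPE root "null".

Error: URI=http://rss.adnkronos.com/RSS_Politica.xml Line=1: Document is invalid: no grammar found.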

2 answers:

Answer 0 (score: 1)

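For the errors you are getting, look at the validation option. As for the nulls in place of the titles, it seems that toString on a Node either returns null or does something that ends up as null. If you change the call to System.out.print(nodes.item(k).getTextContent());, it will print the titles.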
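A minimal sketch of that change, assuming a small inline feed (the SAMPLE string and the TitleTextDemo class name are made up for illustration, so the example runs without network access). It prints getNodeValue() next to getTextContent() for each title element; per the DOM specification an Element node has no value of its own, which is one plausible source of the nulls.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class TitleTextDemo {
    // Hypothetical inline feed standing in for the real URL.
    static final String SAMPLE =
        "<rss version=\"2.0\"><channel>"
        + "<title>Politica</title>"
        + "<item><title>Prima notizia</title></item>"
        + "</channel></rss>";

    // Parses an XML string with a default (non-validating) builder.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        NodeList nodes = parse(SAMPLE).getElementsByTagName("title");
        for (int k = 0; k < nodes.getLength(); k++) {
            Node title = nodes.item(k);
            // getNodeValue() is null for Element nodes; the text lives in child nodes.
            System.out.println(title.getNodeValue() + " -> " + title.getTextContent());
        }
        // prints:
        // null -> Politica
        // null -> Prima notizia
    }
}
```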

Answer 1 (score: 1)

There are two problems. Let's take care of the one you probably care most about first.

The nodes in your NodeList are Element nodes. The actual Text nodes are their children. So to get the values you want, you can do:

nodes.item(k).getFirstChild().getNodeValue()

Or (in this case):

nodes.item(k).getTextContent()

Personally, I think the former is slightly more robust for general parsing, because getTextContent() concatenates the text of all descendant nodes whenever the element happens to contain more than one.
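To make the difference between the two calls concrete, here is a rough sketch on a made-up item whose title contains nested markup (the input strings and the FirstChildVsTextContent class name are assumptions for illustration, not taken from the real feed).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class FirstChildVsTextContent {
    // Returns the first <title> element of the given XML string.
    static Element firstTitle(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return (Element) doc.getElementsByTagName("title").item(0);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical title with three children: text, a <b> element, text.
        Element title = firstTitle("<item><title>Breaking <b>news</b> today</title></item>");

        // Only the first Text child.
        System.out.println("[" + title.getFirstChild().getNodeValue() + "]"); // prints "[Breaking ]"

        // Text of all descendants, concatenated in document order.
        System.out.println("[" + title.getTextContent() + "]"); // prints "[Breaking news today]"
    }
}
```

One caveat with the first form: for an empty element such as <title/>, getFirstChild() returns null, so the chained getNodeValue() call would throw a NullPointerException unless it is guarded.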

As for the validation errors, by default when you do setValidating(true), it's looking for an embedded DTD, which is not there, and it's complaining to you about it. The tl;dr is to setValidating(false).
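The effect of the flag can be observed with a small sketch that collects parser errors instead of letting them print to the console. The inline SAMPLE document, the parseErrors helper, and the ValidatingFlagDemo class name are hypothetical; the sample has no DOCTYPE, like the feed in the question.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

class ValidatingFlagDemo {
    // Hypothetical feed without a DOCTYPE declaration.
    static final String SAMPLE =
        "<rss version=\"2.0\"><channel><title>Politica</title></channel></rss>";

    // Parses SAMPLE and returns the messages of all recoverable errors reported.
    static List<String> parseErrors(boolean validating) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating); // true means DTD validation
        DocumentBuilder builder = factory.newDocumentBuilder();

        List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
            @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });

        builder.parse(new ByteArrayInputStream(SAMPLE.getBytes(StandardCharsets.UTF_8)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        // With validation on, the missing DTD is reported as validation errors.
        System.out.println("validating=true:  " + parseErrors(true));
        // With validation off, the same document parses without complaints.
        System.out.println("validating=false: " + parseErrors(false));
    }
}
```

Note that these are recoverable errors: the document is still parsed in both cases, which matches the question, where the output appears first and the errors follow.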

If you really want to validate the RSS, you should try to find an unofficial (because there is no official one) XSD schema file and set that up in your DocumentBuilderFactory. Using an XSD for RSS in this context is probably not worthwhile, though, because half the RSS on the Internet, while perfectly usable, would probably fail validation :).
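If you do go the XSD route, the wiring might look roughly like the sketch below. The TOY_XSD string is a deliberately tiny, made-up schema (one channel with one title), not a real RSS schema, and the validate helper and class name are hypothetical. The standard pieces are SchemaFactory, DocumentBuilderFactory.setSchema, and a namespace-aware factory; setValidating stays false because that flag only concerns DTDs.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

class XsdValidationDemo {
    // Toy schema for illustration only: <rss version="..."><channel><title/></channel></rss>.
    static final String TOY_XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"rss\"><xs:complexType>"
        + "<xs:sequence><xs:element name=\"channel\"><xs:complexType>"
        + "<xs:sequence><xs:element name=\"title\" type=\"xs:string\"/></xs:sequence>"
        + "</xs:complexType></xs:element></xs:sequence>"
        + "<xs:attribute name=\"version\" type=\"xs:string\" use=\"required\"/>"
        + "</xs:complexType></xs:element>"
        + "</xs:schema>";

    // Parses the XML against TOY_XSD and returns the validation error messages.
    static List<String> validate(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(TOY_XSD)));

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // schema validation expects a namespace-aware parser
        factory.setSchema(schema);       // XSD validation; setValidating(true) is for DTDs only
        DocumentBuilder builder = factory.newDocumentBuilder();

        List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
            @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });

        builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        String ok = "<rss version=\"2.0\"><channel><title>Politica</title></channel></rss>";
        String bad = "<rss version=\"2.0\"><channel><link>x</link></channel></rss>";
        System.out.println("conforming document: " + validate(ok));
        System.out.println("non-conforming document: " + validate(bad));
    }
}
```

A schema for real feeds would need to be far more permissive than this toy one, which is part of why validation tends not to be worth the trouble here.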