Question

I am trying to get a list of article links from this feed:

http://rss.cbc.ca/lineup/topstories.xml

However, when Jsoup reads it in, the link in the markup <link>http://www.cbc.ca/news/?cmp=rss</link> becomes <link />http://www.cbc.ca/news/?cmp=rss

That is, the tag is self-closed, so if I do

Elements items = doc.select("link");

it does not pick up any of the links.

Answer 1

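Jsoup is an HTML parser, and in HTML the link element is defined as having an empty content model. That is why the parser closes the tag immediately and the URL ends up as plain text after it. The URL you provided appears to contain valid XML, so why not try an actual XML parser, or a feed parser library such as Rome?

Edit: To extract the links from the file using the JDK's XPath implementation, you can use the following code: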

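import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
// The InputSource is given the feed URL as its system ID; the parser fetches it.
XPathFactory xpf = XPathFactory.newInstance();
XPath xp = xpf.newXPath();
InputSource is = new InputSource("http://rss.cbc.ca/lineup/topstories.xml");
// "//link" matches every <link> element in the document, at any depth.
NodeList nodes = (NodeList)xp.evaluate("//link", is, XPathConstants.NODESET);
for (int i=0, len=nodes.getLength(); i<len; i++) {
    Node node = nodes.item(i);
    String link = node.getTextContent();
    System.out.println(link);
}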
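
Since "//link" matches every link element, it also returns the channel-level link, not only the article links. If only the articles are wanted, a narrower expression such as "//item/link" may be closer to what the question asks for. The following is a minimal sketch under that assumption. The class name ItemLinks, the helper extractItemLinks, and the inline sample feed with its example.com URLs are made up for illustration; the sketch parses a string rather than fetching the live feed so that it can be run offline.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ItemLinks {
    // Hypothetical helper: returns the text of each <link> that is a
    // direct child of an <item>, skipping the channel-level <link>.
    static List<String> extractItemLinks(String rssXml) {
        XPath xp = XPathFactory.newInstance().newXPath();
        InputSource is = new InputSource(new StringReader(rssXml));
        try {
            NodeList nodes =
                (NodeList) xp.evaluate("//item/link", is, XPathConstants.NODESET);
            List<String> links = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // trim() drops whitespace that may surround the URL in the feed
                links.add(nodes.item(i).getTextContent().trim());
            }
            return links;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate XPath on the feed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample with the same shape as an RSS 2.0 feed.
        String sample =
            "<rss version=\"2.0\"><channel>"
            + "<link>http://www.cbc.ca/news/?cmp=rss</link>"
            + "<item><title>First</title><link>http://www.example.com/story-1</link></item>"
            + "<item><title>Second</title><link>http://www.example.com/story-2</link></item>"
            + "</channel></rss>";
        for (String link : extractItemLinks(sample)) {
            System.out.println(link);
        }
    }
}
```

To run it against the real feed, the StringReader-based InputSource could be swapped for new InputSource("http://rss.cbc.ca/lineup/topstories.xml"), as in the snippet above.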
