Question

I am trying to get the text of the "link" tag elements in this XML: http://www.istana.gov.sg/latestupdate/rss.xml

I have written the code to extract the first article:

        URL = getResources().getString(R.string.istana_home_page_rss_xml);
        // URL = "http://www.istana.gov.sg/latestupdate/rss.xml";

        try {
            doc = Jsoup.connect(URL).ignoreContentType(true).get();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

        // retrieve the link of the article
        links = doc.select("link");

        // retrieve the publish date of the article
        dates = doc.select("pubDate");

        //retrieve the title of the article
        titles = doc.select("title");

        String[] article1 = new String[3];
        article1[0] = links.get(1).text();
        article1[1] = titles.get(1).text();
        article1[2] = dates.get(0).text();

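The article comes out fine, but the link returns an empty string "" (the entire link element returns ""). The title and date have no problems. The link tag does contain the URL text. Does anyone know why it returns ""?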

Answer 1

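It looks like the default HTML parser does not treat <link> as a tag that can hold content (in HTML it is an empty element) and automatically closes it as <link />, which means the content of this tag is empty.

To fix this, use the XML parser instead of the HTML parser; it does not care about tag names.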

doc = Jsoup.connect(URL)
      .ignoreContentType(true)
      .parser(Parser.xmlParser()) // <-- add this
      .get();
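
To see the difference without hitting the network, here is a minimal sketch that parses the same snippet with both parsers. The RSS string, the class name LinkParserDemo, and the helper methods are made up for illustration and are not taken from the real feed; it assumes jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class LinkParserDemo {
    // Hypothetical stand-in for the real feed: one item with a title and a link.
    static final String RSS =
        "<rss><channel><item>"
      + "<title>Example article</title>"
      + "<link>http://example.com/article</link>"
      + "</item></channel></rss>";

    // Text of the first <link> element found in the parsed document.
    static String firstLinkText(Document doc) {
        return doc.select("link").first().text();
    }

    // Default HTML parser: <link> is treated as an empty element,
    // so the URL does not end up inside it.
    static String withHtmlParser() {
        return firstLinkText(Jsoup.parse(RSS));
    }

    // XML parser: tag names carry no special meaning,
    // so the URL stays inside <link>.
    static String withXmlParser() {
        return firstLinkText(Jsoup.parse(RSS, "", Parser.xmlParser()));
    }

    public static void main(String[] args) {
        System.out.println("HTML parser: [" + withHtmlParser() + "]");
        System.out.println("XML parser:  [" + withXmlParser() + "]");
    }
}
```

The same Parser.xmlParser() instance is what .parser(...) receives in the Jsoup.connect(...) chain above, so the select("link") calls in the question should then return the URL text.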
