Question

Given the code below, I get output like this:

<a href="https://timesofindia.indiatimes.com/india/uk-envoy-lays-wreath-at-jallianwala-bagh-memorial-expresses-deep-regret/articleshow/68860078.cms"><img border="0" hspace="10" align="left" style="margin-top:3px;margin-right:5px;" src="https://timesofindia.indiatimes.com/photo/68860078.cms" /></a>British High Commissioner to India Sir Dominic Asquith laid a wreath at the Jallianwala Bagh memorial here on Saturday on the centenary of the massacre and said Britain "deeply regretted" the suffering caused to the victims.

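I am trying to extract the text that comes after the </a> tag.

Here is my code. Is there a method in jsoup that does this, or is there something else I'm missing?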

try {
    Document document = Jsoup.connect("https://timesofindia.indiatimes.com/rssfeeds/-2128936835.cms").parser(Parser.xmlParser()).get();
    Elements items = document.getElementsByTag("item");
    for (Element element : items) {
        String title = element.select("title").text();
        String link = element.select("link").text();
        String time = element.select("pubDate").text();
        String description = element.select("description").text();
        System.out.println(description);
    }
} catch (IOException ex) {
    Logger.getLogger(TimesOfIndia.class.getName()).log(Level.SEVERE, null, ex);
}

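Expected output: British High Commissioner to India Sir Dominic Asquith laid a wreath at the Jallianwala Bagh memorial here on Saturday on the centenary of the massacre and said Britain "deeply regretted" the suffering caused to the victims.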

Actual output: <a href="https://timesofindia.indiatimes.com/india/uk-envoy-lays-wreath-at-jallianwala-bagh-memorial-expresses-deep-regret/articleshow/68860078.cms"><img border="0" hspace="10" align="left" style="margin-top:3px;margin-right:5px;" src="https://timesofindia.indiatimes.com/photo/68860078.cms" /></a>British High Commissioner to India Sir Dominic Asquith laid a wreath at the Jallianwala Bagh memorial here on Saturday on the centenary of the massacre and said Britain "deeply regretted" the suffering caused to the victims.
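If the goal is literally "the text after </a>", plain string handling on the description string is one option. The sketch below is a non-jsoup alternative and assumes the description always has the shape <a ...>...</a>text, as in the sample above; the class AfterAnchor and the method textAfterLastAnchor are hypothetical names. It is fragile if the feed's markup changes, so re-parsing the HTML with jsoup is usually the more robust choice.

```java
public class AfterAnchor {

    // Hypothetical helper: returns everything after the last "</a>" in the
    // description string, or the whole string (trimmed) if there is no "</a>".
    static String textAfterLastAnchor(String description) {
        String closing = "</a>";
        int idx = description.lastIndexOf(closing);
        if (idx < 0) {
            return description.trim();
        }
        return description.substring(idx + closing.length()).trim();
    }

    public static void main(String[] args) {
        // Shortened, made-up version of the description shown in the question.
        String description = "<a href=\"https://example.com/article\"><img src=\"https://example.com/photo\" /></a>"
                + "British High Commissioner to India Sir Dominic Asquith laid a wreath at the Jallianwala Bagh memorial.";
        System.out.println(textAfterLastAnchor(description));
    }
}
```

Because this only slices the string, it does not decode HTML entities or understand nested tags; it relies on the description text already being decoded, as the question's output shows.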

Answer 1

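Element has a nextSibling() method that should work. Note that nextSibling() belongs to a single Element (not to the Elements list returned by select()) and returns a Node rather than an Element. Also, because the XML parser keeps the description's HTML as plain text, parse that string first, then read the text node that follows the link:

Node next = Jsoup.parse(element.select("description").text()).selectFirst("a").nextSibling();
String text = (next instanceof TextNode) ? ((TextNode) next).text() : "";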
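For nextSibling() to find anything, the <a> has to exist as an element. The question's output shows that with Parser.xmlParser() the markup inside <description> comes back as text, so a sketch of the nextSibling() idea parses that string a second time and then steps to the node after the link, guarding against a missing <a> or a non-text sibling. It assumes jsoup 1.11.1 or later (for selectFirst) is on the classpath; the sample feed item and the names NextSiblingDemo and textAfterFirstLink are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.parser.Parser;

public class NextSiblingDemo {

    // Returns the text of the node directly after the first <a> in an HTML fragment,
    // or "" if there is no <a> or the following node is not a text node.
    static String textAfterFirstLink(String descriptionHtml) {
        Document fragment = Jsoup.parse(descriptionHtml); // second parse: the HTML inside <description>
        Element link = fragment.selectFirst("a");
        if (link == null) {
            return "";
        }
        Node next = link.nextSibling(); // a Node, which may or may not be text
        return (next instanceof TextNode) ? ((TextNode) next).text().trim() : "";
    }

    public static void main(String[] args) {
        // Made-up RSS item; the HTML in <description> is entity-escaped.
        String rss = "<rss><channel><item><title>t</title>"
                + "<description>&lt;a href=\"https://example.com/a\"&gt;&lt;img src=\"https://example.com/p\" /&gt;&lt;/a&gt;"
                + "British High Commissioner laid a wreath.</description>"
                + "</item></channel></rss>";
        Document feed = Jsoup.parse(rss, "", Parser.xmlParser());
        for (Element item : feed.getElementsByTag("item")) {
            String description = item.select("description").text(); // raw HTML as a string
            System.out.println(textAfterFirstLink(description));
        }
    }
}
```

This only returns the single text node right after the link; if the description ever contains more markup after the </a>, the remaining text would need to be collected separately.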

Answer 2

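I solved the problem with my own workaround. Here is the code.

Solution: so what does this code do? I create a new Document object from the description string, remove the <a> tags, and then simply print the remaining text. Yes, it's not the best approach, but it still works.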

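d = Jsoup.parse(desc);               // parse the HTML held in <description> as a new document
Elements a = d.select("a");          // find the <a> elements (link + image)
a.remove();                          // remove them from the document
System.out.println(d.body().text()); // print the text that remains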

Full code

try {
    Document d;
    Document document = Jsoup.connect("https://timesofindia.indiatimes.com/rssfeeds/-2128936835.cms").parser(Parser.xmlParser()).get();
    Elements items = document.getElementsByTag("item");
    for (Element element : items) {
        String title = element.select("title").text();
        String link = element.select("link").text();
        String time = element.select("pubDate").text();
        String desc = element.select("description").text();
        d = Jsoup.parse(desc);
        Elements a = d.select("a");
        a.remove();
        System.out.println(d.body().text());
    }
} catch (IOException ex) {
    Logger.getLogger(TimesOfIndia.class.getName()).log(Level.SEVERE, null, ex);
}
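The same idea can be wrapped in a small helper so the feed loop stays short. The sketch below assumes jsoup is on the classpath; it keeps this answer's "parse, remove <a>, read body text" steps and, for comparison, also shows Element.ownText(), which returns only the text held directly by <body> and so skips the link without modifying the document. The class DescriptionText and its method names are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class DescriptionText {

    // This answer's approach: parse the description HTML, drop every <a>, return the remaining text.
    static String withoutLinks(String descriptionHtml) {
        Document d = Jsoup.parse(descriptionHtml);
        d.select("a").remove();
        return d.body().text();
    }

    // Alternative: only the text nodes that are direct children of <body>,
    // which excludes anything nested inside <a> (or inside any other child element).
    static String bodyOwnText(String descriptionHtml) {
        return Jsoup.parse(descriptionHtml).body().ownText();
    }

    public static void main(String[] args) {
        // Made-up description in the same shape as the feed's.
        String desc = "<a href=\"https://example.com/a\"><img src=\"https://example.com/p\" /></a>"
                + "Britain \"deeply regretted\" the suffering caused to the victims.";
        System.out.println(withoutLinks(desc));
        System.out.println(bodyOwnText(desc));
    }
}
```

The two differ when a description wraps words in other tags such as <b>: the removal approach keeps that text, while ownText() drops it because it is no longer a direct child of <body>.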

[Solved] Extracting text after a tag using Jsoup

2 answers: