The code below gives me this output:
<a href="https://timesofindia.indiatimes.com/india/uk-envoy-lays-wreath-at-jallianwala-bagh-memorial-expresses-deep-regret/articleshow/68860078.cms"><img border="0" hspace="10" align="left" style="margin-top:3px;margin-right:5px;" src="https://timesofindia.indiatimes.com/photo/68860078.cms" /></a>British High Commissioner to India Sir Dominic Asquith laid a wreath at the Jallianwala Bagh memorial here on Saturday on the centenary of the massacre and said Britain "deeply regretted" the suffering caused to the victims.
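I am trying to extract the text that comes after this </a> tag.
Here is my code. Is there a method in jsoup that can do this, or is there something else I am missing?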
try {
    Document document = Jsoup.connect("https://timesofindia.indiatimes.com/rssfeeds/-2128936835.cms").parser(Parser.xmlParser()).get();
    Elements items = document.getElementsByTag("item");
    for (Element element : items) {
        String title = element.select("title").text();
        String link = element.select("link").text();
        String time = element.select("pubDate").text();
        String description = element.select("description").text();
        System.out.println(description);
    }
} catch (IOException ex) {
    Logger.getLogger(TimesOfIndia.class.getName()).log(Level.SEVERE, null, ex);
}
Expected output: British High Commissioner to India Sir Dominic Asquith laid a wreath at the Jallianwala Bagh memorial here on Saturday on the centenary of the massacre and said Britain "deeply regretted" the suffering caused to the victims.
Actual output: <a href="https://timesofindia.indiatimes.com/india/uk-envoy-lays-wreath-at-jallianwala-bagh-memorial-expresses-deep-regret/articleshow/68860078.cms"><img border="0" hspace="10" align="left" style="margin-top:3px;margin-right:5px;" src="https://timesofindia.indiatimes.com/photo/68860078.cms" /></a>British High Commissioner to India Sir Dominic Asquith laid a wreath at the Jallianwala Bagh memorial here on Saturday on the centenary of the massacre and said Britain "deeply regretted" the suffering caused to the victims.
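The markup shows up because the feed is parsed with Parser.xmlParser(): in that mode the HTML inside <description> (assumed here to be wrapped in CDATA, as is common in RSS feeds) stays a single text node. text() therefore returns the raw markup, and there is no <a> element to select. A minimal sketch demonstrating this, assuming jsoup is on the classpath; the class name, URLs and summary text are hypothetical stand-ins for the live feed:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

// Hypothetical demo class; RSS is a trimmed, made-up stand-in for the real feed.
class DescriptionAsText {
    // Assumption: the description markup is wrapped in CDATA.
    static final String RSS =
            "<rss><channel><item>"
            + "<title>Example</title>"
            + "<description><![CDATA[<a href=\"https://example.com/story\">"
            + "<img src=\"https://example.com/photo.jpg\" /></a>Plain summary text.]]></description>"
            + "</item></channel></rss>";

    // Mirrors element.select("description").text() from the question.
    static String rawDescription(String rss) {
        Document doc = Jsoup.parse(rss, "", Parser.xmlParser());
        Element description = doc.selectFirst("item > description");
        return description == null ? "" : description.text();
    }

    // Counts <a> elements under <description>; the CDATA content is not parsed into elements.
    static int anchorsInDescription(String rss) {
        Document doc = Jsoup.parse(rss, "", Parser.xmlParser());
        return doc.select("description a").size();
    }

    public static void main(String[] args) {
        System.out.println(rawDescription(RSS));       // the markup, unchanged
        System.out.println(anchorsInDescription(RSS)); // 0
    }
}
```

So any fix has to parse the description string a second time, as HTML, before links can be selected or removed.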
Answer 0 (score: 1)
Element has a nextSibling() method, which should work:
((TextNode) Jsoup.parse(element.select("description").text()).selectFirst("a").nextSibling()).text();
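Elements (the result of select()) has no nextSibling() method of its own, and Node.nextSibling() returns a Node rather than an Element. The description also has to be parsed as HTML before an <a> element exists. A minimal sketch of the nextSibling() idea under those constraints, assuming jsoup and a description string like the one in the question; the class and method names are hypothetical:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

// Hypothetical helper illustrating the nextSibling() approach.
class NextSiblingText {
    // descriptionHtml: the raw string returned by element.select("description").text()
    static String textAfterFirstAnchor(String descriptionHtml) {
        // Parse the description as an HTML fragment so that <a> becomes a real element.
        Element body = Jsoup.parseBodyFragment(descriptionHtml).body();
        Element anchor = body.selectFirst("a");
        if (anchor == null) {
            return body.text(); // no link: treat the whole description as the summary
        }
        // nextSibling() returns a Node; the summary is expected to be a TextNode.
        Node next = anchor.nextSibling();
        if (next instanceof TextNode) {
            return ((TextNode) next).text().trim();
        }
        return "";
    }

    public static void main(String[] args) {
        String desc = "<a href=\"https://example.com/story\"><img src=\"https://example.com/photo.jpg\" /></a>"
                + "Plain summary text.";
        System.out.println(textAfterFirstAnchor(desc));
    }
}
```

This only returns the first text node after the link, so it fits descriptions shaped like the one in the question; if the summary itself contains inline tags, the text after the first such tag is lost.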
Answer 1 (score: 0)
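I solved the problem with my own workaround. What does the code do? It creates a new Document from the description string, removes the <a> tags, and then simply prints the remaining text. It isn't the best approach, but it works: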
Document d = Jsoup.parse(desc);   // desc is the raw description string
Elements a = d.select("a");
a.remove();                       // drops the <a> link and the <img> inside it
System.out.println(d.body().text());
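The same workaround can be wrapped in a small reusable method. This is a sketch assuming jsoup is on the classpath; the class and method names are hypothetical. Because it removes every <a> element, any link text inside the summary itself would be lost as well:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper wrapping the remove-the-links workaround.
class StripAnchors {
    // Parses the raw description as HTML, deletes all <a> elements (with their children),
    // and returns the remaining text with whitespace normalised and trimmed.
    static String descriptionText(String descriptionHtml) {
        Document d = Jsoup.parse(descriptionHtml);
        d.select("a").remove();
        return d.body().text();
    }

    public static void main(String[] args) {
        String desc = "<a href=\"https://example.com/story\"><img src=\"https://example.com/photo.jpg\" /></a>"
                + "Plain summary text.";
        System.out.println(descriptionText(desc));
    }
}
```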
Full code:
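import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
import org.jsoup.select.Elements;

try {
    Document document = Jsoup.connect("https://timesofindia.indiatimes.com/rssfeeds/-2128936835.cms").parser(Parser.xmlParser()).get();
    Elements items = document.getElementsByTag("item");
    for (Element element : items) {
        String title = element.select("title").text();
        String link = element.select("link").text();
        String time = element.select("pubDate").text();
        // Under the XML parser the description is raw HTML text, so parse it again as HTML
        String desc = element.select("description").text();
        Document d = Jsoup.parse(desc);
        // Remove the leading link (and the image inside it), then print what is left
        Elements a = d.select("a");
        a.remove();
        System.out.println(d.body().text());
    }
} catch (IOException ex) {
    Logger.getLogger(TimesOfIndia.class.getName()).log(Level.SEVERE, null, ex);
}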
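A possible alternative to deleting the links, assuming the summary is plain text sitting directly inside the parsed body: Element.ownText() returns only the text of an element's own child text nodes, so the link is skipped without modifying the document. The sketch below runs the same loop over a hypothetical in-memory feed so it works without the network; the class name, constant and RSS content are made up, and jsoup is assumed to be on the classpath:

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

// Hypothetical demo class using ownText() instead of removing <a> elements.
class FeedSummaries {
    // Made-up sample feed; assumption: descriptions are wrapped in CDATA.
    static final String SAMPLE_RSS = "<rss><channel>"
            + "<item><description><![CDATA[<a href=\"https://example.com/1\">"
            + "<img src=\"https://example.com/1.jpg\" /></a>First summary.]]></description></item>"
            + "<item><description><![CDATA[Second summary without a link.]]></description></item>"
            + "</channel></rss>";

    // Returns the cleaned description of every <item> in an RSS document.
    static List<String> summaries(String rss) {
        Document feed = Jsoup.parse(rss, "", Parser.xmlParser());
        List<String> result = new ArrayList<>();
        for (Element item : feed.getElementsByTag("item")) {
            // Under the XML parser the description is raw markup; parse it as HTML.
            String desc = item.select("description").text();
            // ownText() keeps only text nodes directly under <body>, skipping the <a>.
            result.add(Jsoup.parse(desc).body().ownText());
        }
        return result;
    }

    public static void main(String[] args) {
        summaries(SAMPLE_RSS).forEach(System.out::println);
    }
}
```

Unlike the remove() approach, ownText() also skips text inside any other inline tag (for example <b>), so it only suits descriptions shaped like the ones in this feed.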