How to extract the next tag element in jsoup

Time: 2016-11-01 06:00:13

Tags: java html jsoup

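I am using jsoup to parse an HTML document. I need the text of the P tag that immediately follows each SPAN tag whose id attribute contains "midArticle".

I am trying the following code: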

Elements spanList = body.select("span");
if (spanList != null) {
    for (Element element1 : spanList) {
        if (element1.attr("id").contains("midArticle")) {
            Element element = element1.after("<p>");  // This line is wrong
            if (element != null) {
                String text = element.text();
                if (text != null && !text.isEmpty()) {
                    out.println(text);
                }
            }
        }
    }
}

Sample HTML:

<span id="midArticle_9"></span>
<p>"The Director owes it to the American people to immediately provide the full details of what he is now examining," Podesta said in a statement. "We are confident this will not produce any conclusions different from the one the FBI reached in July." </p>
<span id="midArticle_10"></span>
<p>Clinton has repeatedly apologized for using the private email server in her home instead of a government email account for her work as secretary of state from 2009 to 2013. She has said she did not knowingly send or receive classified information.</p>

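1 answer:

Answer 0 (score: 1):

I hope this solves your problem. nextElementSibling() returns the element that follows the span, and it returns null when there is none, so check for that before reading the text: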

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
public class NextSiblingExample {
    public static void main(String[] args) {
        String html = "<span id=\"midArticle_9\"></span><p>\"The Director owes it to the American people to immediately provide the full details of what he is now examining,\" Podesta said in a statement. \"We are confident this will not produce any conclusions different from the one the FBI reached in July.\" </p><span id=\"midArticle_10\"></span><p>Clinton has repeatedly apologized for using the private email server in her home instead of a government email account for her work as secretary of state from 2009 to 2013. She has said she did not knowingly send or receive classified information.</p>";
        Document document = Jsoup.parse(html);
        for (Element element : document.getElementsByTag("span")) {
            Element next = element.nextElementSibling(); // null if the span is the last element
            if (next != null) {
                System.out.println(next.text());
            }
        }
    }
}
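
If only the spans whose id contains "midArticle" matter, as in the question, a CSS selector may be a tidier option. The following is a minimal sketch, not the only way to do it: the class name `MidArticleParagraphs`, the helper `paragraphsAfterMidArticleSpans`, and the shortened HTML string are made up for illustration. It relies on jsoup's `[attr*=value]` attribute selector and the `+` adjacent-sibling combinator.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class MidArticleParagraphs {

    // Hypothetical helper: returns the text of every <p> that directly
    // follows a <span> whose id contains "midArticle".
    public static List<String> paragraphsAfterMidArticleSpans(String html) {
        Document document = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        // "span[id*=midArticle]" matches spans whose id contains the substring;
        // "+ p" keeps only a <p> that is the immediate next element sibling.
        for (Element p : document.select("span[id*=midArticle] + p")) {
            String text = p.text();
            if (!text.isEmpty()) {
                result.add(text);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the HTML in the question (assumed content).
        String html = "<span id=\"midArticle_9\"></span><p>First paragraph.</p>"
                + "<span id=\"other\"></span><p>Skipped paragraph.</p>"
                + "<span id=\"midArticle_10\"></span><p>Second paragraph.</p>";
        for (String text : paragraphsAfterMidArticleSpans(html)) {
            System.out.println(text);
        }
    }
}
```

With this approach the selector does the filtering, so no null check is needed: a span that is not followed directly by a `<p>` simply produces no match.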