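I am using jsoup to parse an HTML document. I need the text of the P tag that comes right after each SPAN tag whose id attribute contains "midArticle".
I am trying the following code: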
    Elements spanList = body.select("span");
    if (spanList != null) {
        for (Element element1 : spanList) {
            if (element1.attr("id").contains("midArticle")) {
                Element element = element1.after("<p>"); // This line is wrong
                if (element != null) {
                    String text = element.text();
                    if (text != null && !text.isEmpty()) {
                        out.println(text);
                    }
                }
            }
        }
    }
Sample HTML:
<span id="midArticle_9"></span>
<p>"The Director owes it to the American people to immediately provide the full details of what he is now examining," Podesta said in a statement. "We are confident this will not produce any conclusions different from the one the FBI reached in July." </p>
<span id="midArticle_10"></span>
<p>Clinton has repeatedly apologized for using the private email server in her home instead of a government email account for her work as secretary of state from 2009 to 2013. She has said she did not knowingly send or receive classified information.</p>
Answer 0 (score: 1)
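I hope this solves your problem. Element.after(String) inserts new HTML after the element and returns that same element, so it never gives you the following P tag. Use nextElementSibling() to get the element that follows each span: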
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public static void main(String[] args) {
        String html = "<span id=\"midArticle_9\"></span><p>\"The Director owes it to the American people to immediately provide the full details of what he is now examining,\" Podesta said in a statement. \"We are confident this will not produce any conclusions different from the one the FBI reached in July.\" </p><span id=\"midArticle_10\"></span><p>Clinton has repeatedly apologized for using the private email server in her home instead of a government email account for her work as secretary of state from 2009 to 2013. She has said she did not knowingly send or receive classified information.</p>";
        Document document = Jsoup.parse(html);
        Elements elements = document.getElementsByTag("span");
        for (Element element : elements) {
            // nextElementSibling() returns null when no element follows the span
            Element next = element.nextElementSibling();
            if (next != null) {
                System.out.println(next.text());
            }
        }
    }
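
The loop above prints the sibling of every span. If only spans whose id contains "midArticle" matter, and only when a P tag follows them directly, a CSS selector may express that more tightly. The following is a minimal sketch, assuming jsoup is on the classpath; the class name MidArticleParagraphs, the method paragraphsAfterMidArticleSpans, and the shortened sample HTML are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class MidArticleParagraphs {

    // Hypothetical helper: collects the text of every <p> that directly
    // follows a <span> whose id attribute contains "midArticle".
    public static List<String> paragraphsAfterMidArticleSpans(String html) {
        Document document = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        // [id*=midArticle] matches ids containing the substring;
        // "+ p" keeps only a <p> that is the immediately following sibling element.
        for (Element paragraph : document.select("span[id*=midArticle] + p")) {
            String text = paragraph.text();
            if (!text.isEmpty()) {
                texts.add(text);
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the HTML in the question (illustrative only).
        String html = "<span id=\"midArticle_9\"></span><p>First paragraph.</p>"
                + "<span id=\"other\"></span><p>Skipped paragraph.</p>"
                + "<span id=\"midArticle_10\"></span><p>Second paragraph.</p>";
        for (String text : paragraphsAfterMidArticleSpans(html)) {
            System.out.println(text);
        }
    }
}
```

With this selector the id check and the sibling lookup happen in one step, so no null check is needed: spans that have no following P tag simply produce no match.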