I have this HTML:
<div class="subscribe-page" itemprop="text"><p><strong>Far</strong> behind the word mountains, far from the countries Vokalia and Consonantia.<br>
<strong>there:</strong> live the blind texts. Separated they live in Bookmarksgrove<br>
<strong>A small:</strong> river named Duden flows by their place and supplies it with the necessary regelialia.</p>
<p>The Big Oxmox advised her not to do so, because there were thousands of bad Commas, wild Question Marks and devious Semikoli, but the Little Blind Text didn’t listen</p>
<div id="form" class="petersemail" lang="en-GB">
<form target="popupwindow" onsubmit="window.open('http://feedburner.google.com/fb/a/mailverify?uri=balmpeters', 'popupwindow', 'scrollbars=yes,width=550,height=520');return true" method="post" action="http://feedburner.google.com/fb/a/mailverify">
<input type="hidden" value="balmpeters" name="uri" />
<input type="hidden" value="en_US" name="loc" />
<span class="fa fa-envelope" style="
top: 29px;
left: 11px;
position: relative;
color: #000;
font-size: 20px;
"></span>
<center> <input class="emailText" type="text" value="Enter your email..." onfocus="if (this.value == 'Enter your email...') {this.value = ''}" onblur="if (this.value == '') {this.value = 'Enter your email...';}" name="email" /></center>
<div style="
width: 33%;
margin: 3px auto 10px;
"><input type="submit" value="" title="" class="emailButton" /></div>
<p class="stext">Please remember to check your email to confirm the free subscription</p>
</form>
</div>
</div>
I am using Jsoup to parse it:
private void parseHtml(String response) {
    Log.d(TAG, "parsinghtml");
    Document document = Jsoup.parse(response);
    String page_content = document.select("div.subscribe-page").first().html();
    Spanned spanned = Html.fromHtml(page_content);
    pageContent.setText(spanned);
}
The problem is that it displays the entire HTML. Is there a way to ignore <div id="form" class="petersemail" lang="en-GB"> and its contents? In other words, I only want:
<div class="subscribe-page" itemprop="text"><p><strong>Far</strong> behind the word mountains, far from the countries Vokalia and Consonantia.<br>
<strong>there:</strong> live the blind texts. Separated they live in Bookmarksgrove<br>
<strong>A small:</strong> river named Duden flows by their place and supplies it with the necessary regelialia.</p>
<p>The Big Oxmox advised her not to do so, because there were thousands of bad Commas, wild Question Marks and devious Semikoli, but the Little Blind Text didn’t listen</p>
Answer 0 (score: 3)
The initial CSS query (div.subscribe-page) can be refined like this:
div.subscribe-page > *:not(div#form)
div.subscribe-page /* Select div with class subscribe-page */
> * /* Select all its child elements... */
:not(div#form) /* ... excluding div with id form */
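As a rough sketch of how this selector might be used with Jsoup: the class name SelectorExample, the helper contentWithoutForm, and the shortened sample HTML are made up for illustration and are not from the question. It joins the outer HTML of the matched children into one string.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

class SelectorExample {
    // Hypothetical helper: returns the combined outer HTML of every direct
    // child element of div.subscribe-page, except the div with id "form".
    static String contentWithoutForm(String html) {
        Document document = Jsoup.parse(html);
        Elements children = document.select("div.subscribe-page > *:not(div#form)");
        return children.outerHtml();
    }

    public static void main(String[] args) {
        // Shortened stand-in for the HTML in the question (assumption).
        String html = "<div class=\"subscribe-page\">"
                + "<p><strong>Far</strong> behind the word mountains.</p>"
                + "<p>The Big Oxmox advised her not to do so.</p>"
                + "<div id=\"form\"><form><input type=\"text\" name=\"email\"></form></div>"
                + "</div>";
        System.out.println(contentWithoutForm(html));
    }
}
```

Note that this selector only matches child elements, so any text placed directly inside div.subscribe-page (outside a tag such as <p>) would not be included. For the HTML in the question that should not matter, since all the text sits inside <p> elements.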
Answer 1 (score: 2)
One way to "ignore" the <div id="form" ...> element is to remove it from the document:
document
.select("div#form")
.remove();
After that you can use
String page_content = document.select("div.subscribe-page").first().html();
to get the contents of the div (without the div itself). If you want to include the div as well, just use .toString() instead of .html().
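
Putting the steps together, a minimal sketch could look like the following. The class name RemoveExample, the two helper methods, and the shortened sample HTML are assumptions for illustration. The null check is an addition the original snippet does not have, because first() returns null when nothing matches.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class RemoveExample {
    // Hypothetical helper: removes div#form, then returns the inner HTML
    // of div.subscribe-page (the div's own tag is not included).
    static String innerHtmlWithoutForm(String html) {
        Document document = Jsoup.parse(html);
        document.select("div#form").remove();
        Element page = document.select("div.subscribe-page").first();
        return page == null ? "" : page.html();
    }

    // Same idea, but keeps the surrounding div by using toString(),
    // which for an Element gives its outer HTML.
    static String outerHtmlWithoutForm(String html) {
        Document document = Jsoup.parse(html);
        document.select("div#form").remove();
        Element page = document.select("div.subscribe-page").first();
        return page == null ? "" : page.toString();
    }

    public static void main(String[] args) {
        // Shortened stand-in for the HTML in the question (assumption).
        String html = "<div class=\"subscribe-page\">"
                + "<p><strong>Far</strong> behind the word mountains.</p>"
                + "<div id=\"form\"><form><input type=\"text\" name=\"email\"></form></div>"
                + "</div>";
        System.out.println(innerHtmlWithoutForm(html));
        System.out.println(outerHtmlWithoutForm(html));
    }
}
```

Unlike the selector approach, this one modifies the parsed Document, so the form is gone for any later queries on the same document as well.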