How to ignore a child div tag in Jsoup

Date: 2016-05-05 18:58:48

Tags: android jsoup

I have this HTML:

<div class="subscribe-page" itemprop="text"><p><strong>Far</strong> behind the word mountains, far from the countries Vokalia and Consonantia.<br>
<strong>there:</strong> live the blind texts. Separated they live in Bookmarksgrove<br>
<strong>A small:</strong> river named Duden flows by their place and supplies it with the necessary regelialia.</p>
<p>The Big Oxmox advised her not to do so, because there were thousands of bad Commas, wild Question Marks and devious Semikoli, but the Little Blind Text didn’t listen</p>

<div id="form" class="petersemail" lang="en-GB">

<form target="popupwindow" onsubmit="window.open('http://feedburner.google.com/fb/a/mailverify?uri=balmpeters', 'popupwindow', 'scrollbars=yes,width=550,height=520');return true" method="post" action="http://feedburner.google.com/fb/a/mailverify">
    <input type="hidden" value="balmpeters" name="uri" />
    <input type="hidden" value="en_US" name="loc" />
<span class="fa fa-envelope" style="
    top: 29px;
    left: 11px;
    position: relative;
    color: #000;
    font-size: 20px;
"></span>
 <center>   <input class="emailText" type="text" value="Enter your email..." onfocus="if (this.value == &quot;Enter your email...&quot;) {this.value = &quot;&quot;}" onblur="if (this.value == &quot;&quot;) {this.value = &quot;Enter your email...&quot;;}" name="email" /></center>

<div style="
    width: 33%;
    margin: 3px auto 10px;
"><input type="submit" value="" title="" class="emailButton" /></div>
<p class="stext">Please remember to check your email to confirm the free subscription</p>

    </form>
</div>
</div>

I am using Jsoup to parse it:

private void parseHtml(String response) {
    Log.d(TAG, "parsinghtml");
    Document document = Jsoup.parse(response);
    String page_content = document.select("div.subscribe-page").first().html();

    Spanned spanned = Html.fromHtml(page_content);
    pageContent.setText(spanned);
}

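The problem is that it displays the entire HTML, including the form. Is there any way to ignore <div id="form" class="petersemail" lang="en-GB"> and its contents? In other words, I only want this: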

<div class="subscribe-page" itemprop="text"><p><strong>Far</strong> behind the word mountains, far from the countries Vokalia and Consonantia.<br>
 <strong>there:</strong> live the blind texts. Separated they live in Bookmarksgrove<br>
 <strong>A small:</strong> river named Duden flows by their place and supplies it with the necessary regelialia.</p>
 <p>The Big Oxmox advised her not to do so, because there were thousands of bad Commas, wild Question Marks and devious Semikoli, but the Little Blind Text didn’t listen</p>

2 answers:

Answer 0 (score: 3)

The initial CSS query (div.subscribe-page) can be extended like this:

div.subscribe-page > *:not(div#form)


Description

div.subscribe-page /* Select div with class subscribe-page */
> *                /* Select all its child elements... */
:not(div#form)     /* ... excluding div with id form */
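A minimal sketch of answer 0's selector plugged into the question's parsing code, assuming jsoup 1.x is on the classpath. The helper name extractWithoutForm and the shortened sample HTML are hypothetical, and the Android-only Html.fromHtml / setText step is left out so it runs as plain Java. Elements.outerHtml() joins the outer HTML of every matched child with newlines, so the <p> tags themselves are kept while the form wrapper is skipped.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class SubscribePageParser {

    // Hypothetical helper: returns the HTML of every direct child element of
    // div.subscribe-page except the <div id="form"> subscription block.
    static String extractWithoutForm(String response) {
        Document document = Jsoup.parse(response);
        // ">" limits the match to direct children; :not(div#form) drops the form wrapper
        Elements children = document.select("div.subscribe-page > *:not(div#form)");
        // outerHtml() keeps each child's own tags (<p>...</p>), joined with newlines
        return children.outerHtml();
    }

    public static void main(String[] args) {
        // Shortened, hypothetical version of the HTML from the question
        String html = "<div class=\"subscribe-page\" itemprop=\"text\">"
                + "<p><strong>Far</strong> behind the word mountains.</p>"
                + "<p>The Big Oxmox advised her not to do so.</p>"
                + "<div id=\"form\" class=\"petersemail\" lang=\"en-GB\">"
                + "<form><input class=\"emailText\" name=\"email\"></form>"
                + "</div></div>";
        System.out.println(extractWithoutForm(html));
    }
}
```

On Android, the returned string can be passed to Html.fromHtml in place of page_content. Note that the ">" combinator matches elements only, so any bare text sitting directly inside div.subscribe-page (outside a <p>) would also be dropped.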

Answer 1 (score: 2)

One way to "ignore" the <div id="form" ...> element is to remove it from the document:

document
    .select("div#form")
    .remove();

After that you can use

String page_content = document.select("div.subscribe-page").first().html();

to get the div's contents (without the div itself). If you want to include the div as well, use .toString() instead of .html().
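Answer 1's two steps combined into one runnable sketch, assuming jsoup 1.x. The helper name contentWithoutForm and its includeWrapper flag are hypothetical; they only switch between the .html() and .toString() calls the answer describes (for an Element, toString() returns the same string as outerHtml()). The null check is an addition for responses that lack the subscribe-page div.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RemoveFormExample {

    // Hypothetical helper: drops <div id="form"> from the parsed tree, then
    // returns the subscribe-page content with or without its wrapper div.
    static String contentWithoutForm(String response, boolean includeWrapper) {
        Document document = Jsoup.parse(response);

        // Detach every matching element (and its whole subtree) from the document
        document.select("div#form").remove();

        Element page = document.select("div.subscribe-page").first();
        if (page == null) {
            return ""; // no subscribe-page div in this response
        }
        // html(): inner HTML only; toString() (same as outerHtml()): includes the div itself
        return includeWrapper ? page.toString() : page.html();
    }

    public static void main(String[] args) {
        // Shortened, hypothetical version of the HTML from the question
        String html = "<div class=\"subscribe-page\" itemprop=\"text\">"
                + "<p><strong>Far</strong> behind the word mountains.</p>"
                + "<div id=\"form\" class=\"petersemail\"><form><input name=\"email\"></form></div>"
                + "</div>";
        System.out.println(contentWithoutForm(html, false));
        System.out.println(contentWithoutForm(html, true));
    }
}
```

Because remove() mutates the parsed Document, later queries on the same object will no longer see the form. If the untouched tree is still needed elsewhere, calling document.clone() before the removal and working on the copy avoids that.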