How can I use the Jsoup "not" selector?

Asked: 2015-10-06 07:52:12

Tags: java html html-parsing jsoup

<div id='contents'>
<div class="article_view">
  <div class="article_txt">
      <strong>I don't want to get this point
        <br>I don't want to get this point
        <br>I don't want to get this point
      </strong>

      <div class='articlePhotoC'>
        <img src="" width='500'>
        <span class='t' style='width:480px;'>
          <b>I don't want to get this point </b>
          I don't want to get this point<br>
        </span>
        <div id='adBox' class='txt_ad' style='width:500px;'></div>
      </div>
      From here I want to get--------------
      <br><br>
      <div class='sub_cont_AD08'></div> 
  </div>
</div>

I don't know how to use the :not selector in Java. I tried this:

  Elements cont = doc.select("div.article_view :not(div.article_view)"); 

But it doesn't work. The result includes all of the "I don't want to get this point" text. I want to get only "From here I want to get~~~~".

Thanks!

1 answer:

Answer 0 (score: 2)

If you don't need the text "From here I want to get--------------" as well, i.e. you only want to select Elements within the <div class="article_view"> but not <div class="article_txt"> and its children, you can do this:

Elements els = doc.select("div.article_view>*:not(.article_txt)");

This will select all Elements (*) that are direct children (>) of the div with class "article_view" except the ones with class "article_txt".
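To make the difference from the selector in the question concrete, here is a minimal sketch. It assumes jsoup is on the classpath and uses a simplified, made-up document in which div.article_view has a second child (the `sidebar` class is hypothetical and does not appear in the question's markup). The descendant form from the question matches every nested element, because none of them is itself div.article_view, while the direct-child form keeps only the sibling of div.article_txt.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class DirectChildNotDemo {
    // Simplified markup; the "sidebar" div is a hypothetical sibling added for illustration.
    static final String HTML =
          "<div class='article_view'>"
        + "<div class='article_txt'><strong>unwanted</strong></div>"
        + "<div class='sidebar'>wanted</div>"
        + "</div>";

    // Selector from the question: any descendant that is not itself div.article_view.
    static Elements descendantsNotView(Document doc) {
        return doc.select("div.article_view :not(div.article_view)");
    }

    // Selector from this answer: direct children only, minus .article_txt.
    static Elements directChildrenNotTxt(Document doc) {
        return doc.select("div.article_view>*:not(.article_txt)");
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        for (Element el : descendantsNotView(doc)) {
            System.out.println("descendant form matched: <" + el.tagName() + "> " + el.className());
        }
        for (Element el : directChildrenNotTxt(doc)) {
            System.out.println("direct-child form matched: <" + el.tagName() + "> " + el.className());
        }
    }
}
```

With the markup exactly as posted in the question, div.article_txt is the only direct child of div.article_view, so the direct-child selector would be expected to match nothing there.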

EDIT

Now that it has been clarified that the elements you want are indeed children of the div.article_txt element, I need to modify my answer:

Elements els = doc.select("div.article_view>div.article_txt>*:not(strong,div.articlePhotoC)");

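This becomes cumbersome, since you now need to list everything that should be excluded. Note that the comma between strong and div.articlePhotoC is the CSS selector-list separator, which acts as OR inside :not(): an element is skipped if it matches either strong or div.articlePhotoC.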
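One caveat: select() returns elements only, and the sentence "From here I want to get--------------" is a bare text node inside div.article_txt, so the selector above yields the remaining child elements rather than that text. A minimal sketch of one way to reach the text, assuming jsoup is on the classpath and using a condensed copy of the question's markup, is to take the parent element and ask for its own text with Element.ownText(). The class and method names below are illustrative only.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class ArticleTextDemo {
    // Condensed copy of the question's markup (attributes and whitespace trimmed).
    static final String HTML =
          "<div id='contents'><div class='article_view'><div class='article_txt'>"
        + "<strong>I don't want to get this point</strong>"
        + "<div class='articlePhotoC'><span class='t'>I don't want to get this point</span></div>"
        + "From here I want to get--------------"
        + "<br><br>"
        + "<div class='sub_cont_AD08'></div>"
        + "</div></div></div>";

    // Child elements of div.article_txt that are neither <strong> nor div.articlePhotoC.
    static Elements remainingChildren(Document doc) {
        return doc.select("div.article_view>div.article_txt>*:not(strong,div.articlePhotoC)");
    }

    // Text owned directly by div.article_txt, excluding text inside its child elements.
    static String directText(Document doc) {
        Element article = doc.select("div.article_view>div.article_txt").first();
        return article == null ? "" : article.ownText();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        for (Element el : remainingChildren(doc)) {
            System.out.println("<" + el.tagName() + "> " + el.className());
        }
        System.out.println(directText(doc));
    }
}
```

On this condensed markup, the selector should return the two br elements and div.sub_cont_AD08, while directText returns only the sentence that sits directly inside div.article_txt. If the individual text nodes are needed instead of one normalized string, Element.textNodes() is an alternative.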