Accessing a <div> that has no class name or id with jsoup

Time: 2018-03-16 15:39:07

Tags: android jsoup

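I am trying to parse some text from a website using jsoup, but unfortunately the <div> that holds it has no class name. I am just learning jsoup and I don't know which jsoup function would let me extract the text from that <div>.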

Example:

<div> .... ... ..... </div>

Right now I can only get the text of a <div> by using its class name.

Code:

document = Jsoup.connect(url).get();
Elements element = document.select("div[class=pandora]");
openBox = element.text();

HTML from jsoup.org:

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="description" content="Lyrics to &quot;Nuh Ready Nuh Ready&quot; song by Calvin Harris: Mi and di mandem We haffi run from half of di gyal dem So sweet, so sweet Don't want mi children and..."> 
<meta name="keywords" content="Nuh Ready Nuh Ready lyrics, Calvin Harris Nuh Ready Nuh Ready lyrics, Calvin Harris lyrics">
<meta name="robots" content="noarchive">
<meta property="og:image" content="//www.azlyrics.com/az_logo_tr.png">
<title>Calvin Harris Lyrics - Nuh Ready Nuh Ready</title>

<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.4/css/bootstrap.min.css">
<link rel="stylesheet" href="//www.azlyrics.com/bsaz.css">

<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->

<script type="text/javascript">
ArtistName = "Calvin Harris";
SongName = "Nuh Ready Nuh Ready";
function submitCorrections(){
	document.getElementById('corlyr').submit();
	return false;
}
</script>
</head>
<body>

<!-- Begin comScore Tag -->
<script>
  var _comscore = _comscore || [];
  _comscore.push({ c1: "2", c2: "6772046" });
  (function() {
    var s = document.createElement("script"), el = document.getElementsByTagName("script")[0]; s.async = true;
    s.src = (document.location.protocol == "https:" ? "https://sb" : "http://b") + ".scorecardresearch.com/beacon.js";
    el.parentNode.insertBefore(s, el);
  })();
</script>
<noscript>
  <img src="https://sb.scorecardresearch.com/p?c1=2&c2=6772046&cv=2.0&cj=1" alt="">
</noscript>
<!-- End comScore Tag -->

<div id="fb-root"></div>
<script>(function(d, s, id) {
  var js, fjs = d.getElementsByTagName(s)[0];
  if (d.getElementById(id)) return;
  js = d.createElement(s); js.id = id;
  js.src = "//connect.facebook.net/en_US/sdk.js#xfbml=1&version=v2.3";
  fjs.parentNode.insertBefore(js, fjs);
}(document, 'script', 'facebook-jssdk'));</script>

  <nav class="navbar navbar-default navbar-static-top noprint">
  <div class="container">
    <!-- Brand and toggle get grouped for better mobile display -->
    <div class="navbar-header">
      <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#search-collapse">
        <span class="glyphicon glyphicon-search"></span>
      </button>
      <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#artists-collapse">
        <span class="glyphicon glyphicon-th-list"></span>
      </button>
      <a class="navbar-brand" href="//www.azlyrics.com"><img alt="AZLyrics.com" class="pull-left" style="max-height:40px; margin-top:-10px;" src="//www.azlyrics.com/az_logo_tr.png"></a>
    </div>
    <ul class="collapse navbar-collapse nav navbar-nav" id="artists-collapse">
    <li>
    <div class="btn-group text-center" role="group">
    <a class="btn btn-menu" href="//www.azlyrics.com/a.html">A</a>
    <a class="btn btn-menu" href="//www.azlyrics.com/b.html">B</a>
    <a class="btn btn-menu" href="//www.azlyrics.com/c.html">C</a>
    <a class="btn btn-menu" href="//www.azlyrics.com/d.html">D</a>
    <a class="btn btn-menu" href="//www.azlyrics.com/e.html">E</a>
    <a class="btn btn-menu" href="//www.azlyrics.com/f.html">F</a>
    <a class="btn btn-menu" href="//www.azlyrics.com/g.html">G</a>
    <a class="btn btn-menu" href="//www.azlyrics.com/h.html">H</a>
    <a class="btn btn-menu" href="//www.azlyrics.com/i.html">I</a>
    <a class="btn btn-menu" href="//www.azlyrics.com/j.html">J</a>
    <a class="btn btn-menu" href="//www.azlyrics.com/k.html">K</a>
    <a class="btn btn-menu" href="//www.azlyrics.com/l.html">L</a>
    <a class="btn btn-menu" href="//www.azlyrics.com/m.html">M</a>
    <a class="btn btn-menu" href="//www.azlyrics.com/n.html">N</a>
    <a class="btn btn-menu" href="//www.azlyrics.com/o.html">O</a>
    <a class="btn btn-menu" href="//www.azlyrics.com/p.html">P</a>
    <a class="btn btn-menu" href="//www.azlyrics.com/q.html">Q</a>
    <a class="btn btn-menu" href="//www.azlyrics.com/r.html">R</a>
    <a class="btn btn-menu" href="//www.azlyrics.com/s.html">S</a>
    <a class="btn btn-menu" href="//www.azlyrics.com/t.html">T</a>
    <a class="btn btn-menu" href="//www.azlyrics.com/u.html">U</a>
    <a class="btn btn-menu" href="//www.azlyrics.com/v.html">V</a>
    <a class="btn btn-menu" href="//www.azlyrics.com/w.html">W</a>
    <a class="btn btn-menu" href="//www.azlyrics.com/x.html">X</a>
    <a class="btn btn-menu" href="//www.azlyrics.com/y.html">Y</a>
    <a class="btn btn-menu" href="//www.azlyrics.com/z.html">Z</a>
    <a class="btn btn-menu" href="//www.azlyrics.com/19.html">#</a>
    </div>
    </li>
    </ul>

    <div class="collapse navbar-collapse" id="search-collapse">

        <form class="navbar-form navbar-right search" method="get" action="//search.azlyrics.com/search.php" role="search">
         <div class="input-group">  
		<input type="text" class="form-control" placeholder="" name="q" id="q">
       		<span class="input-group-btn">
            	  <button class="btn btn-primary" type="submit"><span class="glyphicon glyphicon-search"></span> Search</button>
          	</span>
 	  </div>   
	</form>

    </div><!-- /.navbar-collapse -->
    </div><!-- /.container -->
  </nav>

<!-- top ban -->
  <div class="lboard-wrap noprint">
  <div class="container">
    <div class="row">
       <div class="col-xs-12 top-ad text-center">
         <span id="cf_banner_top_nofc"></span>
       </div>
    </div>
  </div>
  </div>

<!-- main -->
<div class="container main-page">
<div class="row">
<div class="col-lg-2 text-center hidden-md hidden-sm hidden-xs noprint">
   <div class="sky-ad"></div>
</div>

<!-- content -->
<div class="col-xs-12 col-lg-8 text-center">

<div class="div-share noprint">
<div class="fb-like" style="float:left;" data-href="https://www.azlyrics.com/lyrics/calvinharris/nuhreadynuhready.html" data-layout="button_count" data-action="like" data-show-faces="false" data-share="false"></div>
<!-- AddThis Button BEGIN -->
<script type="text/javascript" src="https://s7.addthis.com/js/300/addthis_widget.js#username=azlyrics"></script>
<div class="addthis_toolbox addthis_default_style" style="float:right;">
<a class="btn btn-xs btn-share addthis_button_email">
<span class="playblk"><img src="//www.azlyrics.com/images/email.svg" width="56" height="18" class="playblk" alt="Email"></span>
</a>
<a class="btn btn-xs btn-share addthis_button_print" style="margin-right: 0px !important;">
<span class="playblk"><img src="//www.azlyrics.com/images/print.svg" width="56" height="18" class="playblk" alt="Print"></span>
</a>
</div>
</div>
<!-- AddThis Button END -->

<div class="div-share"><h1>"Nuh Ready Nuh Ready" lyrics</h1></div>

<div class="lyricsh">
<h2><b>Calvin Harris Lyrics</b></h2>
</div>

<div class="ringtone">
<span id="cf_text_top"></span>
</div>

<b>"Nuh Ready Nuh Ready"</b><br>
<span class="feat">(feat. PARTYNEXTDOOR)</span><br>
<br>

<div>
<!-- Usage of azlyrics.com content by any third-party lyrics provider is prohibited by our licensing agreement. Sorry about that. -->
Mi and di mandem<br>
We haffi run from half of di gyal dem<br>
So sweet, so sweet<br>
Don't want mi children and ting'<br>
Mi nuh ready fi all dem tings<br>
So sweet, you're so sweet, yeah<br>
Yeah, mi nuh ready fi all dem things yet<br>
So sweet, so sweet, yeah<br>
Yeah, I'm not ready fi all dem tings yet<br>
I'm not ready fi all dem tings yet<br>
<br>
She call me kid, kid, kid<br>
My mama kiss her kid<br>
She say mi tooth-tooth sweet<br>
She say mi tooth-tooth sweet<br>
Don't make me feel like I love you<br>
Just 'cause I thought you was special<br>
Won't make me feel like I love you<br>
Baby, girl, I won't settle<br>
I had dreams of fuckin' the baddest bitch<br>
Last night I awoke up and I fucked the baddest bitch<br>
I thought I would be ready when I seen her<br>
When I was in the disco<br>
I gotta keep it honest<br>
Keep it real with you<br>
<br>
Mi and di mandem<br>
We haffi run from half of di gyal dem<br>
So sweet, so sweet<br>
Don't want mi children and tings<br>
Mi nuh ready fi all dem tings<br>
So sweet, you're so sweet<br>
Mi nuh ready fi all dem tings yet<br>
So sweet, so sweet<br>
Mi and di mandem<br>
We haffi run from half of di gyal dem<br>
So sweet, you're so sweet<br>
Don't want mi children and tings<br>
Mi nuh ready fi all dem tings<br>
So sweet, you're so sweet<br>
Mi nuh ready fi all dem tings<br>
So sweet, so sweet<br>
<br>
I strapped up 'cause they mapped up<br>
'Cause I need to know where you are<br>
Can't keep following these signs<br>
'Cause you're lookin' for a sign, and I can't give you one<br>
Start to feel like it's mad love<br>
That's givin' your attraction, to me<br>
Yeah, I just want you, nobody else, baby<br>
I don't wanna get too far<br>
It's just you that I want<br>
<br>
When it's mi and di mandem<br>
We haffi run from half of di gyal dem<br>
So sweet, so sweet<br>
Don't want mi children and tings<br>
Mi nuh ready fi all dem tings<br>
So sweet, you're so sweet<br>
Mi nuh ready fi all dem tings yet<br>
So sweet, so sweet<br>
Mi and di mandem<br>
We haffi run from half of di gyal dem<br>
So sweet, so sweet<br>
Don't want mi children and tings<br>
Mi nuh ready fi all dem tings<br>
So sweet, you're so sweet<br>
Mi nuh ready fi all dem tings
</div>

<br><br>

<!-- MxM banner -->
<div class="noprint">
<script>
if  ( /Android|webOS|iPhone|iPod|iPad|BlackBerry|IEMobile|Opera Mini/i.test(navigator.userAgent) ) 
  {
     document.write('<div style="margin-left: auto; margin-right: auto;">'+
  '<iframe scrolling="no" style="border: 0px none; overflow:hidden;" src="//adv.mxmcdn.net/br/t1.0/m_js/e_0/sn_0/l_17494554/su_0/tr_3vUCAOZlq_zEKGGqiwqgUipktnY4AJ8vdMlDERwd-IQW1fCzlbIik50-scymuRv_pi3wUAIxUI2AiwodRggYSWyWKe5520YE8tdDBkiBtPeafB1eU4jsrx-cHUKKrQnbpH1kEJ6cxCXNRK21S-URGe9hKl3IVQsjUfAjAGzo670kV-_NZoBHp8gEZ5eOQESUhj_qd_IMSEvXm2euf-p8Ih6vduevXpBlMcIEAKI3kCxKguw10zJEFpaF8yFsaYWxPJ04Xubjxi6nlSUBsg_Tr8m9oMC4dgrbSjSYIrAWyJz1IIVbLSkQUGxPFTsbNsL_-bnudnLQaUE_eaP3nAsOaQdHURbAr7wki_hHoAjXgZpE4VF7MLao4sJEJ4jJaHu9IhQphsYTZfU6HCHDQhcz3lF_zned3kiL-MhHIP8j0K_ktF3poJHjI5u9L-cJHNywsz-sadxqsZMdqBf1jMraRS68zUYcTR9L15oyvk54l_erv80gD-ns/" width="290px" height="50px"></iframe>'+
  '</div>');
   }
</script>
<br><br>
</div>

<form id="addsong" style="visible:hidden; margin:0;" action="../../add.php" method="post">
<input type="hidden" name="what" value="add_song">
<input type="hidden" name="artist" value="Calvin Harris">
</form>

<form action="../../add.php" method="post" id="corlyr">
<input type="hidden" name="what" value="correct_lyrics">
<input type="hidden" name="song_id" value="613870">
</form>

<div class="smt noprint">
<a class="btn btn-share" href="#" onclick="submitCorrections()"><span class="glyphicon glyphicon-pencil"></span> Submit Corrections</a>
</div>

<div class="smt"></div>

<div class="noprint" style="padding: 15px 0">
<span id="cf_text_bottom"></span>
</div>

<!-- credits -->
<div class="smt"></div>

<!-- song facts -->

<!-- artist link -->
<ol class="breadcrumb noprint" itemscope itemtype="https://schema.org/BreadcrumbList">
  <li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem"><a itemprop="item" href="//www.azlyrics.com"><span itemprop="name">AZLyrics</span></a></li>
  <li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem"><a itemprop="item" href="//www.azlyrics.com/c.html"><span itemprop="name">C</span></a></li>
  <li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem"><a itemprop="item" href="//www.azlyrics.com/c/calvinharris.html"><span itemprop="name">Calvin Harris Lyrics</span></a></li>
</ol>

<!-- album songlists -->
<!-- album songlists end -->

        <form class="search noprint" method="get" action="//search.azlyrics.com/search.php" role="search">
         <div style="margin-bottom:15px" class="input-group">  
		<input type="text" class="form-control" placeholder="" name="q">
       		<span class="input-group-btn">
            	  <button class="btn btn-primary" type="submit"><span class="glyphicon glyphicon-search"></span> Search</button>
          	</span>
 	  </div>   
	</form>

<div class="noprint visible-xs-block" style="margin-top:5px;margin-bottom:5px">
<span id="cf_rect_bottom"></span>
</div>

</div> <!-- content -->

<div class="col-lg-2 text-center hidden-md hidden-sm hidden-xs noprint">
   <div class="sky-ad"></div>
</div>
</div>
</div>  <!-- container main-page -->

<!-- nav bottom -->
       <nav class="navbar navbar-default navbar-bottom">
          <div class="container text-center">
          <ul class="nav navbar-nav navbar-center">
            <li><a href="//www.azlyrics.com/add.php" onclick="document.forms['addsong'].submit();return false;">Submit Lyrics</a></li>
            <li><a href="//www.stlyrics.com">Soundtracks</a></li>
            <li><a href="//www.facebook.com/pages/AZLyricscom/154139197951223">Facebook</a></li>
            <li><a href="//www.azlyrics.com/contact.html">Contact Us</a></li>
          </ul>
          </div> 
        </nav>

<!-- bot ban -->
  <div class="lboard-wrap noprint">
  <div class="container">
    <div class="row">
       <div class="col-xs-12 top-ad text-center">
          <span id="cf_banner_bottom"></span>
       </div>
    </div>
  </div>
  </div>

<!-- footer -->
     <nav class="navbar navbar-footer noprint">
          <div class="container text-center">
          <ul class="nav navbar-nav navbar-center">
            <li><a href="//www.azlyrics.com/adv.html">Advertise Here</a></li>
            <li><a href="//www.azlyrics.com/privacy.html">Privacy Policy</a></li>
            <li><a href="//www.azlyrics.com/copyright.html">DMCA Policy</a></li>
          </ul>
          </div> 
     </nav>
     <div class="footer-wrap">
          <div class="container">
          <div class="noprint"><span style="font-weight:bold;line-height:54px;vertical-align:top;">Powered by </span><img src="//www.azlyrics.com/images/mxm.png" width="184" height="54" alt="MusixMatch"></div>
          <small>
             Calvin Harris lyrics are property and copyright of their owners. "Nuh Ready Nuh Ready" lyrics provided for educational purposes and personal use only.<br>
             <script type="text/javascript">
                curdate=new Date();
                document.write("<strong>Copyright &copy; 2000-"+curdate.getFullYear()+" AZLyrics.com<\/strong>");
             </script>
          </small>
          </div>
     </div>

<script>
cf_page_artist = ArtistName;
cf_page_song = SongName;
cf_page_genre = "pop";
</script>
<script src="//cdn.clickfuse.com/publishers/azlyrics/single.min.js"></script>

<script type="text/javascript">

  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-4309237-1']);

  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })();

</script>
    <div id="CssFailCheck" class="hidden" style="height:1px;"></div>
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.2/jquery.min.js"></script>
    <script>window.jQuery || document.write('<script src="//www.azlyrics.com/local/jquery.min.js"><\/script>')</script>
    <script>
      $(function () {
       if ($('#CssFailCheck').is(':visible') === true) {
         $('<link rel="stylesheet" type="text/css" href="//www.azlyrics.com/bs/css/bootstrap.min.css"><link rel="stylesheet" href="//www.azlyrics.com/bsaz.css">').appendTo('head');
       }
      });
    </script>
    <script src="//www.azlyrics.com/collapse.js"></script>
    <script type="text/javascript" src="https://tracking.musixmatch.com/t1.0/m_js/e_0/sn_0/l_17494554/su_0/tr_3vUCAOZlq_zEKGGqiwqgUipktnY4AJ8vdMlDERwd-IQW1fCzlbIik50-scymuRv_pi3wUAIxUI2AiwodRggYSWyWKe5520YE8tdDBkiBtPeafB1eU4jsrx-cHUKKrQnbpH1kEJ6cxCXNRK21S-URGe9hKl3IVQsjUfAjAGzo670kV-_NZoBHp8gEZ5eOQESUhj_qd_IMSEvXm2euf-p8Ih6vduevXpBlMcIEAKI3kCxKguw10zJEFpaF8yFsaYWxPJ04Xubjxi6nlSUBsg_Tr8m9oMC4dgrbSjSYIrAWyJz1IIVbLSkQUGxPFTsbNsL_-bnudnLQaUE_eaP3nAsOaQdHURbAr7wki_hHoAjXgZpE4VF7MLao4sJEJ4jJaHu9IhQphsYTZfU6HCHDQhcz3lF_zned3kiL-MhHIP8j0K_ktF3poJHjI5u9L-cJHNywsz-sadxqsZMdqBf1jMraRS68zUYcTR9L15oyvk54l_erv80gD-ns/"></script>
  </body>
</html>

What changes should I make to achieve the above? Thanks.

2 answers:

Answer 0: (score: 1)

The following code should give you the lyrics in the format you need:

// Get the lyrics div element
Element lyricsDiv = document.select("div.main-page > div.row > div.col-xs-12").select("div").get(7);

// Get the html of the element and replace <br> and comments
String lyrics = lyricsDiv.html().replaceAll("<br>", "").replaceAll("<!--(.*?)-->", "");
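
The clean-up step above can also be sketched with plain `java.util.regex`, independent of jsoup. This is only an illustrative variant: the class name `LyricsCleaner` and the method `clean` are hypothetical, and it assumes the input is the inner HTML of the lyrics <div> as a string. Unlike the one-liner above, it also removes comments that span several lines and the `<br/>` / `<br />` spellings.

```java
import java.util.regex.Pattern;

class LyricsCleaner {
    // DOTALL lets "." match line breaks, so multi-line comments are removed too.
    private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);
    // Matches <br>, <br/> and <br /> in any letter case.
    private static final Pattern BR = Pattern.compile("(?i)<br\\s*/?>");

    // Hypothetical helper: strips comments and <br> tags, keeps existing newlines.
    static String clean(String innerHtml) {
        String noComments = COMMENT.matcher(innerHtml).replaceAll("");
        String noBreaks = BR.matcher(noComments).replaceAll("");
        return noBreaks.trim();
    }

    public static void main(String[] args) {
        String html = "<!-- licensing\nnotice -->\nMi and di mandem<br>\nSo sweet, so sweet<br />";
        System.out.println(clean(html));
    }
}
```

HTML entities such as `&amp;` are left untouched by this sketch, so it is only suitable when the text contains none or when they are decoded separately.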

Answer 1: (score: 0)

Try this:

Elements main = doc.select("div[class=container main-page]");
Elements row = main.select("div[class=row]");
Elements col = row.select("div[class=col-xs-12 col-lg-8 text-center]");
songMetaDataTextView.setText(Html.fromHtml(col.select("div").get(7).toString()));

You have nested tags:

<div class="container main-page">
<div class="row">
    <div class="col-lg-2 text-center hidden-md hidden-sm hidden-xs noprint">
        <div class="sky-ad"></div>
    </div>

    <!-- content -->
    <div class="col-xs-12 col-lg-8 text-center">

        <div class="div-share noprint">
            <div class="fb-like" style="float:left;" data-href="https://www.azlyrics.com/lyrics/calvinharris/nuhreadynuhready.html" data-layout="button_count" data-action="like" data-show-faces="false" data-share="false"></div>
            <!-- AddThis Button BEGIN -->
            <script type="text/javascript" src="https://s7.addthis.com/js/300/addthis_widget.js#username=azlyrics"></script>
            <div class="addthis_toolbox addthis_default_style" style="float:right;">
                <a class="btn btn-xs btn-share addthis_button_email">
                    <span class="playblk"><img src="//www.azlyrics.com/images/email.svg" width="56" height="18" class="playblk" alt="Email"></span>
                </a>
                <a class="btn btn-xs btn-share addthis_button_print" style="margin-right: 0px !important;">
                    <span class="playblk"><img src="//www.azlyrics.com/images/print.svg" width="56" height="18" class="playblk" alt="Print"></span>
                </a>
            </div>
        </div>
        <!-- AddThis Button END -->

        <div class="div-share"><h1>"Nuh Ready Nuh Ready" lyrics</h1></div>

        <div class="lyricsh">
            <h2><b>Calvin Harris Lyrics</b></h2>
        </div>

        <div class="ringtone">
            <span id="cf_text_top"></span>
        </div>

        <b>"Nuh Ready Nuh Ready"</b><br>
        <span class="feat">(feat. PARTYNEXTDOOR)</span><br>
        <br>

        <div>
            <!-- your lyrics here -->

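First you select the "container main-page" div, then the "row" div inside it, then the "col-xs-12 col-lg-8 text-center" div, and finally you use index 7 to get the lyrics <div>. The index is 7 because select("div") also returns the column div itself at index 0, followed by its six earlier nested divs.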
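
A positional index like 7 breaks as soon as the page adds or removes a <div>. As a possible alternative, jsoup's selector syntax supports `:not(...)` and attribute-presence tests, so the lyrics block can be targeted as "a direct child <div> of the content column that has neither a class nor an id". The sketch below assumes jsoup is on the classpath and that the page structure matches the HTML shown in the question; the class name `NoClassDivExample`, the method `lyricsText`, and the shortened sample markup are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class NoClassDivExample {
    // Hypothetical helper: returns the text of the first class-less, id-less
    // <div> that is a direct child of the content column, or "" if none exists.
    static String lyricsText(String html) {
        Document doc = Jsoup.parse(html);
        Element lyrics = doc.select("div.col-xs-12 > div:not([class]):not([id])").first();
        return lyrics == null ? "" : lyrics.text();
    }

    public static void main(String[] args) {
        // Shortened stand-in for the real page, same nesting as in the question.
        String html = "<div class=\"col-xs-12 col-lg-8 text-center\">"
                + "<div class=\"ringtone\">ad</div>"
                + "<div><!-- notice -->Line one<br>Line two</div>"
                + "</div>";
        System.out.println(lyricsText(html));
    }
}
```

With a live page, the same selector could be applied to the `Document` returned by `Jsoup.connect(url).get()`. Note that `text()` collapses line breaks; if the line structure matters, working from `html()` as in the first answer may be a better fit.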