Why is my CSS selector not working?

Time: 2015-06-17 17:09:58

Tags: python css-selectors beautifulsoup robobrowser

Why doesn't this code work? It should select the 6th div tag inside "div.col-xs-12.col-lg-8.text-center", which contains the lyrics, but it doesn't. By the way, I added the "Stage" print statements to make debugging easier.

In short, the CSS selector is:

body > div.container.main-page > div > div.col-xs-12.col-lg-8.text-center > div:nth-of-type(6)

The page is: http://www.azlyrics.com/lyrics/queen/bohemianrhapsody.html

from robobrowser import RoboBrowser

browser = RoboBrowser(history=True)
browser.open('http://www.azlyrics.com/')
print('Stage A')

# Fill in and submit the site's search form
form = browser.get_form(role='search')
form['q'].value = 'Bohemian Rhapsody'
browser.submit_form(form)
print('Stage B')

# Collect the links on the results page
songs = browser.select('.text-left a')
print('Stage C')

# Open the third link and try to select the lyrics div
browser.follow_link(songs[2])
lyrics = browser.select('body > div.container.main-page > div > div.col-xs-12.col-lg-8.text-center > div:nth-of-type(6)')
print(lyrics)
print('Stage D')
browser.back()

I can easily select div class="col-xs-12 col-lg-8 text-center" and print its markup. The problem is how to select the 6th div inside it. This is the markup I get for div class="col-xs-12 col-lg-8 text-center":



<div class="col-xs-12 col-lg-8 text-center">

<div class="div-share noprint">
<div class="fb-like fb_iframe_widget" style="float:left;" data-href="http://www.azlyrics.com/lyrics/queen/bohemianrhapsody.html" data-layout="button_count" data-action="like" data-show-faces="false" data-share="false" fb-xfbml-state="rendered" fb-iframe-plugin-query="action=like&amp;app_id=&amp;container_width=0&amp;href=http%3A%2F%2Fwww.azlyrics.com%2Flyrics%2Fqueen%2Fbohemianrhapsody.html&amp;layout=button_count&amp;locale=en_US&amp;sdk=joey&amp;share=false&amp;show_faces=false"><span style="vertical-align: bottom; width: 82px; height: 20px;"><iframe name="f3126b60cc" width="1000px" height="1000px" frameborder="0" allowtransparency="true" allowfullscreen="true" scrolling="no" title="fb:like Facebook Social Plugin" src="http://www.facebook.com/v2.3/plugins/like.php?action=like&amp;app_id=&amp;channel=http%3A%2F%2Fstatic.ak.facebook.com%2Fconnect%2Fxd_arbiter%2F1ldYU13brY_.js%3Fversion%3D41%23cb%3Dfca2f95d%26domain%3Dwww.azlyrics.com%26origin%3Dhttp%253A%252F%252Fwww.azlyrics.com%252Ff338c67548%26relation%3Dparent.parent&amp;container_width=0&amp;href=http%3A%2F%2Fwww.azlyrics.com%2Flyrics%2Fqueen%2Fbohemianrhapsody.html&amp;layout=button_count&amp;locale=en_US&amp;sdk=joey&amp;share=false&amp;show_faces=false" style="border: none; visibility: visible; width: 82px; height: 20px;" class=""></iframe></span></div>
<!-- AddThis Button BEGIN -->
<script type="text/javascript" src="http://s7.addthis.com/js/300/addthis_widget.js#username=azlyrics"></script>
<div class="addthis_toolbox addthis_default_style" style="float:right;">
<a class="btn btn-xs btn-share" href="http://www.amazon.com/gp/search?ie=UTF8&amp;keywords=QUEEN+Bohemian+Rhapsody&amp;tag=azlyricsunive-20&amp;index=digital-music&amp;linkCode=ur2&amp;camp=1789&amp;creative=9325" title="Get MP3!" style="float:left;">
<span class="playblk"><img src="http://images.azlyrics.com/play.svg" width="16" height="16" class="playblk" alt="MP3"></span><span>MP3</span>
</a>
<a class="btn btn-xs btn-share addthis_button_email at300b" target="_blank" title="Email" href="#"><span class="at4-icon aticon-email" style="background-color: rgb(115, 138, 141);"><span>Share on email</span></span>Email</a>
<a class="btn btn-xs btn-share addthis_button_print at300b" style="margin-right: 0px !important;" title="Print" href="#"><span class="at4-icon aticon-print" style="background-color: rgb(115, 138, 141);"><span>Share on print</span></span>Print</a>
<div class="atclear"></div></div>
</div>
<!-- AddThis Button END -->

<div class="div-share"><h1>"Bohemian Rhapsody" lyrics</h1></div>

<!-- JANGO PLAYER -->
<div class="noprint hidden-xs" style="position:relative; display:block; clear:both; height:38px; margin-bottom:10px">
<iframe scrolling="no" style="height: 38px; width: 100%; max-width: 325px; border: 0px none; overflow: hidden; display: none !important;" src="http://jmn.jangonetwork.com/az?cust_params=j_artist=QUEEN&amp;j_title=Bohemian%20Rhapsody"></iframe>
</div>
<!-- END OF JANGO PLAYER -->

<div class="lyricsh">
<h2><b>QUEEN LYRICS</b></h2>
</div>

<div class="ringtone">
<span id="cf_text_top"></span>
</div>

<b>"Bohemian Rhapsody"</b><br>
<br>

<div>
<!-- Usage of azlyrics.com content by any third-party lyrics provider is prohibited by our licensing agreement. Sorry about that. -->
Is this the real life?<br>
Is this just fantasy?<br>
Caught in a landslide,<br>
No escape from reality.<br>
<br>
Open your eyes,<br>
Look up to the skies and see,<br>
I'm just a poor boy, I need no sympathy,<br>
Because I'm easy come, easy go,<br>
Little high, little low,<br>
Anyway the wind blows doesn't really matter to me, to me.<br>
<br>
Mama, just killed a man,<br>
Put a gun against his head,<br>
Pulled my trigger, now he's dead.<br>
Mama, life had just begun,<br>
But now I've gone and thrown it all away.<br>
<br>
Mama, ooh,<br>
Didn't mean to make you cry,<br>
If I'm not back again this time tomorrow,<br>
Carry on, carry on as if nothing really matters.<br>
<br>
Too late, my time has come,<br>
Sent shivers down my spine,<br>
Body's aching all the time.<br>
Goodbye, everybody, I've got to go,<br>
Gotta leave you all behind and face the truth.<br>
<br>
Mama, ooh (anyway the wind blows),<br>
I don't wanna die,<br>
I sometimes wish I'd never been born at all.<br>
<br>
I see a little silhouetto of a man,<br>
Scaramouche, Scaramouche, will you do the Fandango?<br>
Thunderbolt and lightning,<br>
Very, very frightening me.<br>
(Galileo) Galileo.<br>
(Galileo) Galileo,<br>
Galileo Figaro<br>
Magnifico.<br>
<br>
I'm just a poor boy, nobody loves me.<br>
He's just a poor boy from a poor family,<br>
Spare him his life from this monstrosity.<br>
<br>
Easy come, easy go, will you let me go?<br>
Bismillah! No, we will not let you go. (Let him go!)<br>
Bismillah! We will not let you go. (Let him go!)<br>
Bismillah! We will not let you go. (Let me go!)<br>
Will not let you go. (Let me go!)<br>
Never, never let you go<br>
Never let me go, oh.<br>
No, no, no, no, no, no, no.<br>
Oh, mama mia, mama mia (Mama mia, let me go.)<br>
Beelzebub has a devil put aside for me, for me, for me.<br>
<br>
So you think you can stone me and spit in my eye?<br>
So you think you can love me and leave me to die?<br>
Oh, baby, can't do this to me, baby,<br>
Just gotta get out, just gotta get right outta here.<br>
<br>
(Oh, yeah, oh yeah)<br>
<br>
Nothing really matters,<br>
Anyone can see,<br>
Nothing really matters,<br>
Nothing really matters to me.<br>
<br>
Anyway the wind blows.
</div>

<br><br>

<!-- MxM banner -->
<script>
if  ( /Android|webOS|iPhone|iPod|iPad|BlackBerry|IEMobile|Opera Mini/i.test(navigator.userAgent) ) 
  {
     document.write('<div style="margin: 20px auto">'+
  '<iframe scrolling="no" style="border: 0px none; overflow:hidden;" src="http://adv.mxmcdn.net/br/t1.0/8/0ewVDxJzLBmObjAzooreVTvouJG9VLr6frPG77XttrKDpbSkCFfJAB0DyrqOAne/BirR3yuKYK417vTAcAAPFNUwatSKPbH12fnwBpqJqdXxBpAGKv+qKXu3RJOMHGCTN4hKCzCeEaP+OWFH9e7rR8ZAFFK2u6TY7JGkp7ZiZMxjVcDI9nFG1V314osDrWYjh0ja91KlV1U2iKKkixrqtJYdUnReuvnfDfX9F8rkUiyhLV5lJb7qfK5e6C+6a9XWGFBr8/b17svT5niWaovxKaNkZ7a/BcquEtft13gs6qAzbKt8vU7XqLjFROtTlyVj7m9Xunf3REczDI4y7lCAdqlbL1uQyK2qRSk/ADDJJvd6SRbH33bd//rkBYG1ju/" width="100%" height="140"></iframe>'+
  '</div>');
   }
</script>

<form id="addsong" style="visible:hidden; margin:0;" action="../../add.php" method="post">
<input type="hidden" name="what" value="add_song">
<input type="hidden" name="artist" value="QUEEN">
</form>

<form action="../../add.php" method="post" id="corlyr">
<input type="hidden" name="what" value="correct_lyrics">
<input type="hidden" name="song_id" value="86770">
</form>

<div class="smt noprint">
<a class="btn btn-share" href="#" onclick="document.getElementById('corlyr').submit();return false;"><span class="glyphicon glyphicon-pencil"></span> Submit Corrections</a>
</div>

<!--googleoff: index-->

<!-- start of lyrics -->
<div class="hidden">Visit www.azlyrics.com for these lyrics.</div>
<!-- end of lyrics -->

<!--googleon: index-->

<div class="smt"><small>Thanks to Amar Kalita, Taylor Scheuerman, Destiney, Lina, Amanda for correcting these lyrics.<br></small>
</div>

<!-- JANGO PLAYER -->
<div class="noprint visible-xs-block" style="position:relative; display:block; clear:both; height:38px; margin-bottom:10px">
<iframe scrolling="no" style="height: 38px; width: 100%; max-width: 325px; border: 0px none; overflow: hidden; display: none !important;" src="http://jmn.jangonetwork.com/az?cust_params=j_artist=QUEEN&amp;j_title=Bohemian%20Rhapsody"></iframe>
</div>
<!-- END OF JANGO PLAYER -->

<!-- credits -->
<div class="smt"><small>Writer(s): Freddie Mercury<br>
Copyright: Queen Music Limited</small>
<br>
</div>

<!-- artist link -->
<ol class="breadcrumb noprint">
  <li><a href="http://www.azlyrics.com">A-Z Lyrics</a></li>
  <li><a href="http://www.azlyrics.com/q.html">Q</a></li>
  <li><a href="http://www.azlyrics.com/q/queen.html">QUEEN Lyrics</a></li>
</ol>

<!-- album songlists -->
<div class="panel album-panel noprint">
  <span class="glyphicon glyphicon-cd" style="margin-right:5px;"></span><a href="#8269" data-toggle="collapse">"A Night At The Opera" (1975)</a>
</div>

<div class="collapse noprint" id="8269">
<div class="panel songlist-panel">
<a href="deathontwolegs.html">Death On Two Legs</a><br>
<a href="lazingonasundayafternoon.html">Lazing On A Sunday Afternoon</a><br>
<a href="iminlovewithmycar.html">I'm In Love With My Car</a><br>
<a href="youremybestfriend.html">You're My Best Friend</a><br>
<a href="39.html">'39</a><br>
<a href="sweetlady.html">Sweet Lady</a><br>
<a href="seasiderendezvous.html">Seaside Rendezvous</a><br>
<a href="theprophetssong.html">The Prophet's Song</a><br>
<a href="loveofmylife.html">Love Of My Life</a><br>
<a href="goodcompany.html">Good Company</a><br>
<a href="bohemianrhapsody.html">Bohemian Rhapsody</a><br>
</div>
</div>

<!-- album songlists end -->

        <form class="search" method="get" action="http://search.azlyrics.com/search.php" role="search">
         <div style="margin-bottom:15px" class="input-group">  
		<input type="text" class="form-control" placeholder="" name="q">
       		<span class="input-group-btn">
            	  <button class="btn btn-primary" type="submit"><span class="glyphicon glyphicon-search"></span> Search</button>
          	</span>
 	  </div>   
	</form>

<!-- tickets -->

<!-- tickets end -->

</div>

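1 answer:

Answer 0 (score: -1)

This site does not want to be scraped, so your job is harder: it gives divs and other elements no id or class unless strictly necessary.

So there are several approaches:

  1. The shortest way, which relies on div.ringtone being the last div before the one that contains the lyrics (the ~ combinator matches following siblings, so [0] is the first div after it):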

    lyrics = browser.select('div.ringtone ~ div')[0].text.strip()
    
  2. Rely on the order/structure of the HTML:

    lyrics = browser.select('body > div:nth-of-type(3) > div:nth-of-type(1) > div:nth-of-type(2) > div:nth-of-type(6)')[0].text.strip()
    
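  3. Iterate over all divs and check whether the inner HTML starts with <!-- Usage of azlyrics.com content by any third-party lyrics provider is prohibited by our licensing agreement. Sorry about that. -->.

  4. Your approach also works in my tests.

  5. The problem does not seem to be in your selector (check my output). You can verify whether the site is blocking you by inspecting the HTML response: print the page source with browser.parsed.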
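
Approach 3 can be sketched as follows. This is only a minimal illustration that uses the standard-library html.parser instead of RoboBrowser, so it can be run against any saved copy of the page. LyricsDivFinder and find_lyrics are hypothetical names introduced here, and the sketch assumes the licensing comment is the first thing inside the lyrics div, as in the markup shown in the question.

```python
from html.parser import HTMLParser

# Start of the licensing comment that opens the lyrics div in the markup above.
MARKER = "Usage of azlyrics.com content"


class LyricsDivFinder(HTMLParser):
    """Collect the text of the first <div> whose content starts with MARKER."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # div nesting level inside the lyrics div; 0 means outside
        self.fresh_div = False  # True while a just-opened <div> has had no content yet
        self.parts = []         # text fragments found inside the lyrics div
        self.done = False       # set once the lyrics div has been closed

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1  # nested div: its closing tag must not end the capture
        else:
            self.fresh_div = tag == "div"

    def handle_comment(self, data):
        if self.fresh_div and not self.done and data.strip().startswith(MARKER):
            self.depth = 1       # the comment is the first child: this is the lyrics div
        self.fresh_div = False

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)
        elif data.strip():
            self.fresh_div = False  # real text appeared before any comment

    def handle_endtag(self, tag):
        if self.depth:
            if tag == "div":
                self.depth -= 1
                self.done = self.depth == 0
        else:
            self.fresh_div = False


def find_lyrics(html):
    """Return the lyrics text, or None if no div starts with the marker comment."""
    finder = LyricsDivFinder()
    finder.feed(html)
    finder.close()
    return "".join(finder.parts).strip() if finder.done else None


sample = """
<div class="ringtone"><span id="cf_text_top"></span></div>
<b>"Bohemian Rhapsody"</b><br>
<div>
<!-- Usage of azlyrics.com content by any third-party lyrics provider is prohibited by our licensing agreement. Sorry about that. -->
Is this the real life?<br>
Is this just fantasy?
</div>
<div class="smt">Submit Corrections</div>
"""
print(find_lyrics(sample))
```

With RoboBrowser, the page source could presumably be passed in as find_lyrics(str(browser.parsed)). A None result would suggest that the response is not the lyrics page at all (for example, because the site blocked the request), which ties in with point 5.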