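I'm trying to find a piece of child data inside a div that I select by class; specifically, I want the value of "url". I'm using video_link = self.soup.find('div', {'class':'video-embed-big'})
but I can't get from that div to the data holding the quoted URL.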
<div class="video-embed-big video-embed-area bf_dom" id="video_buzz_element_4154403_7994283" rel:thumb="https://img.youtube.com/vi/_Ym0LW_uPPk/2.jpg" rel:bf_bucket_data="{"video": {"size": "big", "width":"625", "height":"376", "url":"https://youtube.com/watch?v=_Ym0LW_uPPk", "id":"4154403_7994283"}}">
<div style="position:relative;" id="video_wrapper_4154403_7994283">
<iframe id="yt_4154403_7994283" class="ytvideo" type="text/html" allowscriptaccess="always" allowfullscreen="true" width="625" height="376" src="https://www.youtube.com/embed/_Ym0LW_uPPk?version=3&hl=en&fs=1&enablejsapi=1&origin=http://www.buzzfeed.com&autoplay=0&showinfo=0&wmode=opaque" frameborder="0">
</iframe>
</div>
</div>
Answer 0 (score: 1)
How about:
video_div = self.soup.find('div', id=lambda d: d and d.startswith('video_wrapper_'))
video_link = video_div.find('iframe')['src']
This returns:
In [5]: video_link
Out[5]: 'https://www.youtube.com/embed/_Ym0LW_uPPk?version=3&hl=en&fs=1&enablejsapi=1&origin=http://www.buzzfeed.com&autoplay=0&showinfo=0&wmode=opaque'
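The `d and ...` guard matters because BeautifulSoup calls the filter function with the attribute value, which is None for any div that has no id. Below is a minimal, self-contained sketch of the same lookup. It assumes the `bs4` package with the built-in "html.parser" backend; the trimmed HTML sample and the helper name `is_video_wrapper` are made up for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the markup from the question (illustrative only).
html = """
<div class="video-embed-big video-embed-area bf_dom" id="video_buzz_element_4154403_7994283">
  <div style="position:relative;" id="video_wrapper_4154403_7994283">
    <iframe id="yt_4154403_7994283" class="ytvideo" src="https://www.youtube.com/embed/_Ym0LW_uPPk?version=3"></iframe>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def is_video_wrapper(id_value):
    # The filter receives the tag's id, or None when the tag has no id,
    # so rule out None before calling startswith.
    return id_value is not None and id_value.startswith("video_wrapper_")


video_div = soup.find("div", id=is_video_wrapper)
# find() returns None when nothing matches, so guard each step.
iframe = video_div.find("iframe") if video_div is not None else None
video_link = iframe.get("src") if iframe is not None else None
print(video_link)
```

Using `.get("src")` instead of `['src']` returns None rather than raising KeyError if the iframe has no src attribute.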
If you want to go a step further, you can use urlparse to build the URL of the actual YouTube page.
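try:
    import urlparse  # Python 2
except ImportError:
    import urllib.parse as urlparse  # Python 3 moved the module here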
video_div = self.soup.find('div', id=lambda d: d and d.startswith('video_wrapper_'))
video_link = video_div.find('iframe')['src']
url = urlparse.urlparse(video_link)
youtube_url = urlparse.urlunparse((url[0], url[1], "watch?v=" + url[2].split('/')[2],'','',''))
This is the resulting youtube_url:
In [15]: urlparse.urlunparse((url[0], url[1], "watch?v=" + url[2].split('/')[2],'','',''))
Out[15]: 'https://www.youtube.com/watch?v=_Ym0LW_uPPk'
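On Python 3 the same functions live in `urllib.parse`. The sketch below is one way to do the same conversion there, putting the video id in the query component rather than in the path. The function name `embed_to_watch_url` is hypothetical, and it assumes the path always has the form /embed/<id>.

```python
from urllib.parse import urlparse, urlunparse


def embed_to_watch_url(embed_url):
    """Turn a YouTube /embed/<id> URL into a /watch?v=<id> URL."""
    parts = urlparse(embed_url)
    # "/embed/_Ym0LW_uPPk" -> ["embed", "_Ym0LW_uPPk"]
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2 or segments[0] != "embed":
        raise ValueError("not an embed URL: %r" % embed_url)
    video_id = segments[1]
    # Tuple order: scheme, netloc, path, params, query, fragment.
    return urlunparse((parts.scheme, parts.netloc, "/watch", "", "v=" + video_id, ""))


src = ("https://www.youtube.com/embed/_Ym0LW_uPPk?version=3&hl=en&fs=1"
       "&enablejsapi=1&origin=http://www.buzzfeed.com&autoplay=0&showinfo=0&wmode=opaque")
print(embed_to_watch_url(src))  # https://www.youtube.com/watch?v=_Ym0LW_uPPk
```

Raising ValueError on an unexpected path is a design choice for this sketch; returning None would work equally well if the caller prefers to skip bad links.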
Answer 1 (score: 0)
video_link = self.soup.find('div',{'class':'video-embed-big'}).div.iframe['src']
You need to use the "." operator to step down into the div's child elements, and then read the src attribute to get the URL.
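The same subscript lookup that reads src also works for the rel:bf_bucket_data attribute on the outer div, which is where the "url" value from the question lives. The sketch below only covers the parsing step and uses the standard library alone. It assumes the attribute value reaches you as a decoded, valid JSON string (on the real page the inner quotes are presumably escaped as entities), and the function name `url_from_bucket_data` is hypothetical.

```python
import json


def url_from_bucket_data(raw):
    """Return the "url" field from a rel:bf_bucket_data attribute value.

    `raw` is assumed to be the decoded attribute string (valid JSON).
    Returns None if the value is missing, is not JSON, or has no video.url.
    """
    if not raw:
        return None
    try:
        data = json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if not isinstance(data, dict):
        return None
    video = data.get("video")
    if not isinstance(video, dict):
        return None
    return video.get("url")


# Attribute value copied from the markup in the question.
sample = ('{"video": {"size": "big", "width":"625", "height":"376", '
          '"url":"https://youtube.com/watch?v=_Ym0LW_uPPk", "id":"4154403_7994283"}}')
print(url_from_bucket_data(sample))  # https://youtube.com/watch?v=_Ym0LW_uPPk
```

With BeautifulSoup, the raw string should be available as self.soup.find('div', {'class': 'video-embed-big'}).get('rel:bf_bucket_data'), provided the parser in use keeps the attribute name unchanged.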