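I'm writing part of a program that searches YouTube for a phrase, and I then want it to get the URL of the first video. But I don't know how to get the first video's URL.
Here is my code: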
import urllib2, urllib
raw_i=raw_input("Search: ")
x = urllib.quote_plus(raw_i)
site1 = urllib2.urlopen('http://www.youtube.com/results?search_query=%s'%x)
y = site1.read()
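This reads the search results page, but I want it to return only the video's URL.
For example, let's use the phrase "Harry Nilsson coconut".
Here is the HTML for the first video: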
<li class="yt-lockup2 clearfix yt-uix-tile result-item-padding has-hover-effects yt-lockup2-video yt-lockup2-tile context-data-item" data-context-item-title="Harry Nilsson - Coconut (1971)" data-context-item-views="2,930,881 views" data-context-item-type="video" data-context-item-id="Tbgv8PkO9eo" data-context-item-time="4:32" data-context-item-user="Zoltán Makk">
<div class="yt-lockup2-thumbnail">
<a href="/watch?v=Tbgv8PkO9eo" class="ux-thumb-wrap yt-uix-sessionlink yt-uix-contextlink contains-addto " data-sessionlink="ved=CDIQwBs&ei=prWOUZT9KIK8igLtyICAAQ"> <span class="video-thumb yt-thumb yt-thumb-185" >
<span class="yt-thumb-default">
<span class="yt-thumb-clip">
<span class="yt-thumb-clip-inner">
<img alt="Thumbnail" src="//i1.ytimg.com/vi/Tbgv8PkO9eo/mqdefault.jpg" width="185" >
<span class="vertical-align"></span>
</span>
</span>
</span>
</span>
<span class="video-time">4:32</span>
I want it to return only "/watch?v=Tbgv8PkO9eo".
Thanks!
Answer 0 (score: 1)
You can use HTMLParser. Create your own parser class derived from that Python class.
from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # Only parse the 'anchor' tag.
        if tag == "a":
            # Check the list of defined attributes.
            for name, value in attrs:
                # If href is defined, print it.
                if name == "href":
                    print name, "=", value
Then create the parser and feed it your HTML string.
your_html_string='<li class="yt-lockup2 clearfix yt-uix-tile result-item-padding \
has-hover-effects yt-lockup2-video yt-lockup2-tile \
context-data-item" data-context-item-title="Harry Nilsson - \
Coconut (1971)" data-context-item-views="2,930,881 views" \
data-context-item-type="video" \
data-context-item-id="Tbgv8PkO9eo" data-context-item-time="4:32" \
data-context-item-user="Zoltán Makk">\
<div class="yt-lockup2-thumbnail">\
<a href="/watch?v=Tbgv8PkO9eo" class="ux-thumb-wrap \
yt-uix-sessionlink yt-uix-contextlink contains-addto" \
data-sessionlink="ved=CDIQwBs&ei=prWOUZT9KIK8igLtyICAAQ">\
<span class="video-thumb yt-thumb yt-thumb-185" >\
<span class="yt-thumb-default"> \
<span class="yt-thumb-clip"> \
<span class="yt-thumb-clip-inner"> \
<img alt="Thumbnail" \
src="//i1.ytimg.com/vi/Tbgv8PkO9eo/mqdefault.jpg" \
width="185" > <span class="vertical-align"></span> \
</span> </span></span></span> \
<span class="video-time">4:32</span>'
parser = MyHTMLParser()
parser.feed(your_html_string)
The result is:
>>>
href = /watch?v=Tbgv8PkO9eo
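
Since the question asks for only the first video's URL rather than every href, the same idea can be narrowed to stop at the first match. Below is a minimal sketch, assuming Python 3 (where the module is named html.parser) and assuming the results page still contains plain anchor tags like the one quoted in the question. The names FirstWatchLinkParser and first_video_url, and the shortened sample string, are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FirstWatchLinkParser(HTMLParser):
    """Remember the first <a href="/watch?..."> seen and ignore the rest."""

    def __init__(self):
        super().__init__()
        self.first_href = None

    def handle_starttag(self, tag, attrs):
        # Skip everything once a link has been found, and skip non-anchor tags.
        if self.first_href is not None or tag != "a":
            return
        for name, value in attrs:
            # Assumption: video links start with "/watch?", as in the question.
            if name == "href" and value and value.startswith("/watch?"):
                self.first_href = value
                break


def first_video_url(html, base="http://www.youtube.com"):
    """Return the absolute URL of the first watch link, or None if absent."""
    parser = FirstWatchLinkParser()
    parser.feed(html)
    if parser.first_href is None:
        return None
    return urljoin(base, parser.first_href)


# Shortened, hypothetical sample modelled on the HTML in the question.
sample = (
    '<div class="yt-lockup2-thumbnail">'
    '<a href="/watch?v=Tbgv8PkO9eo" class="ux-thumb-wrap">thumb</a>'
    '<a href="/watch?v=SECOND">other</a></div>'
)
print(first_video_url(sample))
```

With the sample above this prints http://www.youtube.com/watch?v=Tbgv8PkO9eo. If you only want the relative path, read parser.first_href directly instead of joining it with the base URL.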