I have a nested list of <iframe>s:
iframes = [
 [<iframe data-lazy-src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/309819830&amp;color=ff5500&amp;auto_play=false&amp;hide_related=false&amp;show_comments=true&amp;show_user=true&amp;show_reposts=false" frameborder="no" height="166" scrolling="no" src="data:image/gif;base64,R0lGODdhAQABAPAAAP///wAAACwAAAAAAQABAEACAkQBADs=" width="100%"></iframe>,
  <iframe allowtransparency="true" data-lazy-src="//www.facebook.com/plugins/likebox.php?href=https%3A%2F%2Fwww.facebook.com%2FPauseMusicale&amp;width=300&amp;height=62&amp;show_faces=false&amp;colorscheme=light&amp;stream=false&amp;show_border=false&amp;header=false" frameborder="0" scrolling="no" src="data:image/gif;base64,R0lGODdhAQABAPAAAP///wAAACwAAAAAAQABAEACAkQBADs=" style="border:none; overflow:hidden; width:300px; height:62px;"></iframe>,
  <iframe allowfullscreen="" data-lazy-src="//www.youtube.com/embed/videoseries?list=PLNKCTdT9YSESoQnj5tPP4P9kaIwBCx7F1" frameborder="0" height="169" src="data:image/gif;base64,R0lGODdhAQABAPAAAP///wAAACwAAAAAAQABAEACAkQBADs=" width="100%"></iframe>],
 [<iframe data-lazy-src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/310079005&amp;color=ff5500&amp;auto_play=false&amp;hide_related=false&amp;show_comments=true&amp;show_user=true&amp;show_reposts=false" frameborder="no" height="166" scrolling="no" src="data:image/gif;base64,R0lGODdhAQABAPAAAP///wAAACwAAAAAAQABAEACAkQBADs=" width="100%"></iframe>,
  <iframe allowtransparency="true" data-lazy-src="//www.facebook.com/plugins/likebox.php?href=https%3A%2F%2Fwww.facebook.com%2FPauseMusicale&amp;width=300&amp;height=62&amp;show_faces=false&amp;colorscheme=light&amp;stream=false&amp;show_border=false&amp;header=false" frameborder="0" scrolling="no" src="data:image/gif;base64,R0lGODdhAQABAPAAAP///wAAACwAAAAAAQABAEACAkQBADs=" style="border:none; overflow:hidden; width:300px; height:62px;"></iframe>,
  <iframe allowfullscreen="" data-lazy-src="//www.youtube.com/embed/videoseries?list=PLNKCTdT9YSESoQnj5tPP4P9kaIwBCx7F1" frameborder="0" height="169" src="data:image/gif;base64,R0lGODdhAQABAPAAAP///wAAACwAAAAAAQABAEACAkQBADs=" width="100%"></iframe>],
 [<iframe etc], [<iframe etc]]

I want to get all the ['data-lazy-src'] values from it.
I'm using this code:
for iframe in iframes:
    for i in iframe:
        scheme, netloc, path, params, query, fragment = urlparse(i.attrs['data-lazy-src'])
        if not scheme:
            scheme = 'http'
        url = urlunparse((scheme, netloc, path, params, query, fragment))
        print('Fetching {}'.format(url))
        f = urllib2.urlopen(url)
But I get:

Fetching http://www.youtube.com/embed/videoseries?list=PLNKCTdT9YSESoQnj5tPP4P9kaIwBCx7F1
Fetching http://www.youtube.com/embed/videoseries?list=PLNKCTdT9YSESoQnj5tPP4P9kaIwBCx7F1
Fetching http://www.youtube.com/embed/videoseries?list=PLNKCTdT9YSESoQnj5tPP4P9kaIwBCx7F1
I know I'm missing something very obvious, but I can't see it.
Can anyone help me?
Answer 0 (score: 1)
You can take the HTML string of each iframe and pass it to BeautifulSoup, which parses it for you. Try something like this:
from bs4 import BeautifulSoup
iframe = '<iframe data-lazy-src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/309819830..." frameborder="no"></iframe>'
soup = BeautifulSoup(iframe, 'html.parser')
tag = soup.find_all('iframe')[0]
print(tag['data-lazy-src'])
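The snippet above relies on BeautifulSoup and reads only the first <iframe>. If bs4 is not installed, a similar extraction can be sketched with the standard library's html.parser.HTMLParser instead; this is a swapped-in technique, not what the answer uses. The class LazySrcCollector, the helper lazy_srcs and the sample HTML are hypothetical. The sketch collects data-lazy-src from every <iframe> in a string:

```python
from html.parser import HTMLParser


class LazySrcCollector(HTMLParser):
    """Hypothetical helper: collect data-lazy-src from every <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs, and entities such as &amp; are already unescaped
        if tag == 'iframe':
            for name, value in attrs:
                if name == 'data-lazy-src' and value:
                    self.urls.append(value)


def lazy_srcs(html):
    """Return the data-lazy-src values of all iframes in an HTML string."""
    collector = LazySrcCollector()
    collector.feed(html)
    collector.close()
    return collector.urls


# Hypothetical sample built from two of the URLs in the question
html = (
    '<iframe data-lazy-src="//www.youtube.com/embed/videoseries?list=PLNKCTdT9YSESoQnj5tPP4P9kaIwBCx7F1"></iframe>'
    '<iframe data-lazy-src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/309819830&amp;color=ff5500"></iframe>'
)
print(lazy_srcs(html))
```

Unlike soup.find_all('iframe')[0], this returns one entry per iframe, so a single loop over the result covers all of them.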
Answer 1 (score: 0)
The problem is the way the nested list is built: the result of soup.find_all('iframe') is appended to iframes = [].
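To make the structural difference concrete, here is a minimal sketch that uses plain dicts as hypothetical stand-ins for the Tag objects soup.find_all('iframe') returns, so it runs without BeautifulSoup or network access; find_all_iframes and the example.com URLs are made up. list.append adds each page's result as one sub-list, whereas list.extend, or itertools.chain.from_iterable applied to an already nested list, produces one flat list that a single for loop can walk:

```python
from itertools import chain


def find_all_iframes(page):
    # Hypothetical stand-in for soup.find_all('iframe'): one dict per iframe
    return [{'data-lazy-src': src} for src in page]


# Hypothetical pages, each listing the data-lazy-src values of its iframes
pages = [
    ['//www.example.com/a', '//www.example.com/b'],
    ['//www.example.org/c'],
]

nested = []
for page in pages:
    nested.append(find_all_iframes(page))  # adds ONE sub-list per page

flat = []
for page in pages:
    flat.extend(find_all_iframes(page))  # adds each iframe individually

# An already nested list can also be flattened after the fact
rejoined = list(chain.from_iterable(nested))

print(len(nested), len(flat))  # 2 3
for iframe in flat:
    print(iframe['data-lazy-src'])
```

When only one page is scraped, assigning the find_all result directly, as the answer does, gives the same flat shape without either call.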
With the appending removed, it works:
(...)
iframes = soup.find_all('iframe')
for iframe in iframes:
    scheme, netloc, path, params, query, fragment = urlparse(iframe.attrs['data-lazy-src'])
    if not scheme:
        scheme = 'http'  # default scheme you used when getting the current page
    url = urlunparse((scheme, netloc, path, params, query, fragment))
    print('Fetching {}'.format(url))
    f = urllib2.urlopen(url)
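The code above is Python 2 (urllib2, urlparse). Assuming Python 3, where these functions live in urllib.parse and urllib.request, the scheme-defaulting step can be sketched as a separate helper so it can be checked without fetching anything. The name absolute_url is hypothetical, and 'http' as the default mirrors the comment in the answer:

```python
from urllib.parse import urlparse, urlunparse


def absolute_url(lazy_src, default_scheme='http'):
    """Hypothetical helper: give a scheme-relative URL ('//host/...') a scheme."""
    scheme, netloc, path, params, query, fragment = urlparse(lazy_src)
    if not scheme:
        # '//www.youtube.com/...' parses with an empty scheme but a valid netloc
        scheme = default_scheme
    return urlunparse((scheme, netloc, path, params, query, fragment))


print(absolute_url('//www.youtube.com/embed/videoseries?list=PLNKCTdT9YSESoQnj5tPP4P9kaIwBCx7F1'))
# http://www.youtube.com/embed/videoseries?list=PLNKCTdT9YSESoQnj5tPP4P9kaIwBCx7F1

# URLs that already carry a scheme come back unchanged
print(absolute_url('https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/308112514'))
```

Each normalized URL can then be passed to urllib.request.urlopen(url), the Python 3 counterpart of urllib2.urlopen.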
Result:
Fetching https://www.youtube.com/embed/OWr5FawT2Ks?rel=0
Fetching https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/308112514&color=ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false
Fetching http://www.facebook.com/plugins/likebox.php?href=https%3A%2F%2Fwww.facebook.com%2FPauseMusicale&width=300&height=62&show_faces=false&colorscheme=light&stream=false&show_border=false&header=false
Fetching http://www.youtube.com/embed/videoseries?list=PLNKCTdT9YSESoQnj5tPP4P9kaIwBCx7F1