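I want to scrape a video from a web page, but the page contains two iframe tags: one displays a Facebook page and the other embeds the video. I only want the video URL, but when I scrape the page I get all of the iframes.
Like this: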
url_videos = requests.get(link_to_video)
video_link = BeautifulSoup(url_videos.text, 'html.parser')
video_on_iframe = video_link.find('iframe')
print(video_on_iframe)
When I run the code above, I get the following result:
<iframe allow="encrypted-media" allowtransparency="true" frameborder="0" height="80" scrolling="no" src="https://www.facebook.com/plugins/page.php?href=https%3A%2F%2Fwww.facebook.com%2FAnimeindoFans%2F&tabs&width=280&height=180&small_header=true&adapt_container_width=true&hide_cover=true&show_facepile=false&appId=123434497681677" style="border:none;overflow:hidden" width="280"></iframe>
<iframe allow="encrypted-media" allowtransparency="true" frameborder="0" height="80" scrolling="no" src="https://www.facebook.com/plugins/page.php?href=https%3A%2F%2Fwww.facebook.com%2FAnimeindoFans%2F&tabs&width=280&height=180&small_header=true&adapt_container_width=true&hide_cover=true&show_facepile=false&appId=123434497681677" style="border:none;overflow:hidden" width="280"></iframe>
<iframe allow="encrypted-media" allowtransparency="true" frameborder="0" height="80" scrolling="no" src="https://www.facebook.com/plugins/page.php?href=https%3A%2F%2Fwww.facebook.com%2FAnimeindoFans%2F&tabs&width=280&height=180&small_header=true&adapt_container_width=true&hide_cover=true&show_facepile=false&appId=123434497681677" style="border:none;overflow:hidden" width="280"></iframe>
<iframe frameborder="0" height="380" scrolling="no" src="http://www.mp4upload.com/embed-q7xxgge1yu1c.html" type="text/html" width="640">
</iframe>
<iframe allow="encrypted-media" allowtransparency="true" frameborder="0" height="80" scrolling="no" src="https://www.facebook.com/plugins/page.php?href=https%3A%2F%2Fwww.facebook.com%2FAnimeindoFans%2F&tabs&width=280&height=180&small_header=true&adapt_container_width=true&hide_cover=true&show_facepile=false&appId=123434497681677" style="border:none;overflow:hidden" width="280"></iframe>
<iframe allow="encrypted-media" allowtransparency="true" frameborder="0" height="80" scrolling="no" src="https://www.facebook.com/plugins/page.php?href=https%3A%2F%2Fwww.facebook.com%2FAnimeindoFans%2F&tabs&width=280&height=180&small_header=true&adapt_container_width=true&hide_cover=true&show_facepile=false&appId=123434497681677" style="border:none;overflow:hidden" width="280"></iframe>
I don't need the Facebook iframe. I only need the video URL from the other iframe, the one with the attributes height="380" and width="640".
When I pass more details to the find() method, like this:
video_on_iframe = video_link.find('iframe', width=640, height=380)
I get:
None
None
None
<iframe frameborder="0" height="380" scrolling="no" src="http://www.mp4upload.com/embed-q7xxgge1yu1c.html" type="text/html" width="640">
</iframe>
None
None
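That is, the iframe element I want, and None for everything else.
So my question is: how do I get only the result of find('iframe', width=640, height=380) and skip the None results?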
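Answer 0 (score: 0)
You can also require the src attribute to be present:
video_on_iframe = video_link.find('iframe', src=True)
Or combine that with the width and height checks:
video_on_iframe = video_link.find('iframe', src=True, width=640, height=380)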
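As a rough illustration of the filters described in this answer, here is a minimal sketch that runs against an inline HTML string instead of a downloaded page. The `SAMPLE_HTML` snippet is a shortened, hypothetical stand-in for the real page, and the attribute values are passed as strings to match how they appear in the markup.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for one downloaded page (not the real markup).
SAMPLE_HTML = """
<iframe height="80" width="280" src="https://www.facebook.com/plugins/page.php"></iframe>
<iframe height="380" width="640" src="http://www.mp4upload.com/embed-q7xxgge1yu1c.html"></iframe>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Without filters, find() returns the first iframe in document order.
first_iframe = soup.find("iframe")

# With filters, find() returns the first iframe matching every condition,
# or None when nothing on the page matches.
video_iframe = soup.find("iframe", width="640", height="380", src=True)

# Guard against None before reading the attribute.
video_url = video_iframe["src"] if video_iframe is not None else None
print(video_url)
```

The `is not None` check matters because find() returns None for pages that have no matching iframe, which is where the None lines in the question come from.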
Answer 1 (score: 0)
You can use find_all to find every iframe that has those dimensions and a src attribute.
video_on_iframe = [video["src"] for video in video_link.find_all('iframe', width=640,
height=380, src=True)]
print(video_on_iframe)
[u'http://www.mp4upload.com/embed-q7xxgge1yu1c.html']
[Finished in 0.2s]
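When several pages are scraped in a loop, the same find_all approach avoids None entries altogether, because a page without a matching iframe simply contributes nothing. The sketch below assumes the pages are already available as HTML strings; `collect_video_urls` and the two sample pages are hypothetical names and data made up for illustration.

```python
from bs4 import BeautifulSoup


def collect_video_urls(html_pages, width="640", height="380"):
    """Return the src of every iframe with the given size across all pages."""
    urls = []
    for html in html_pages:
        soup = BeautifulSoup(html, "html.parser")
        # find_all returns an empty list when nothing matches,
        # so pages without a video iframe add no entries (and no None).
        for iframe in soup.find_all("iframe", width=width, height=height, src=True):
            urls.append(iframe["src"])
    return urls


# Hypothetical sample pages: the first has only the Facebook widget.
PAGE_WITHOUT_VIDEO = '<iframe height="80" width="280" src="https://www.facebook.com/plugins/page.php"></iframe>'
PAGE_WITH_VIDEO = (
    PAGE_WITHOUT_VIDEO
    + '<iframe height="380" width="640" src="http://www.mp4upload.com/embed-q7xxgge1yu1c.html"></iframe>'
)

video_urls = collect_video_urls([PAGE_WITHOUT_VIDEO, PAGE_WITH_VIDEO, PAGE_WITHOUT_VIDEO])
print(video_urls)
```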
Answer 2 (score: 0)
# Select only the iframes whose height is 380; you could filter on width instead.
video_on_frame = video_link.find_all('iframe', height='380')
link_array = []
for link in video_on_frame:  # your HTML has one iframe matching this filter
    try:
        link_array.append(link['src'])  # read the iframe's src and store it
    except KeyError:
        link_array.append('Error')  # the iframe has no src attribute
print(link_array) will show the URL you want.
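If the video player's dimensions vary between pages, one alternative (not taken from the answers above) is to filter on the src value itself and drop anything hosted on facebook.com. This is only a sketch: `is_not_facebook` and `SAMPLE_HTML` are hypothetical, and it relies on Beautiful Soup accepting a function as an attribute filter, which is called with the attribute value (or None when the attribute is missing).

```python
from urllib.parse import urlparse

from bs4 import BeautifulSoup


def is_not_facebook(src):
    """Attribute filter: True for a src that is present and not on facebook.com."""
    if not src:
        # No src attribute (None) or an empty one: reject the tag.
        return False
    host = urlparse(src).hostname or ""
    return host != "facebook.com" and not host.endswith(".facebook.com")


# Hypothetical, shortened stand-in for the scraped page.
SAMPLE_HTML = """
<iframe height="80" width="280" src="https://www.facebook.com/plugins/page.php"></iframe>
<iframe height="380" width="640" src="http://www.mp4upload.com/embed-q7xxgge1yu1c.html"></iframe>
<iframe height="80" width="280"></iframe>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
links = [tag["src"] for tag in soup.find_all("iframe", src=is_not_facebook)]
print(links)
```

The trade-off is that this keeps every non-Facebook iframe, so it may also pick up ads or other embeds that the size-based filter would have excluded.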