Question

I need to use Beautiful Soup to get the src of an iframe:

<div class="divclass">
 <div id="simpleid">
  <iframe width="300" height="300" src="http://google.com">

I can do it with Selenium:

iframe1 = driver.find_element_by_class_name("divclass")
iframe = iframe1.find_element_by_tag_name("iframe").get_attribute("src")

But Selenium is too slow for this task.

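I have looked through solutions on Stack Overflow and tried several snippets, but with urllib I always get either a 403 error (changing the User-Agent does not help, it is still 403) or None.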

Answer 1

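Very good question. Looking at the website you are trying to get the iframe from with that library, you have to take the content of the tag inside that div and then base64-decode it, and you should be done. See how you get on, and don't stop! You will become an excellent programmer.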
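
The decoding step described above can be sketched roughly as follows. This is only an illustration under assumptions: the page in question is not shown, so the sample below invents a hypothetical page where the inner div holds the iframe markup as base64 text rather than as a real element. The variable names and the sample markup are made up for the example.

```python
import base64
from bs4 import BeautifulSoup

# Hypothetical page (an assumption for illustration): the inner div holds
# the iframe markup as base64 text instead of a real <iframe> element.
iframe_markup = '<iframe width="300" height="300" src="http://google.com"></iframe>'
encoded = base64.b64encode(iframe_markup.encode("utf-8")).decode("ascii")
html = '<div class="divclass"><div id="simpleid">' + encoded + '</div></div>'

# Step 1: pull the text content out of the div.
soup = BeautifulSoup(html, "html.parser")
payload = soup.find("div", id="simpleid").get_text(strip=True)

# Step 2: base64-decode it back into markup.
decoded = base64.b64decode(payload).decode("utf-8")

# Step 3: parse the decoded markup and read the iframe's src.
iframe = BeautifulSoup(decoded, "html.parser").find("iframe")
iframe_src = iframe["src"] if iframe is not None else None
print(iframe_src)
```

If the real page encodes its content differently, only step 2 should need to change.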

Answer 2

Use soup.find_all('the tag you want to search for'):

>>> from bs4 import BeautifulSoup
>>> html = '''
... <div class="divclass">
...  <div id="simpleid">
...   <iframe width="300" height="300" src="http://google.com">
... '''
>>> soup = BeautifulSoup(html, 'html.parser')
>>> soup.find_all('iframe')
[<iframe height="300" src="http://google.com" width="300">
</iframe>]
>>> soup.find_all('iframe')[0]['src']
u'http://google.com'
>>>
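
As a script rather than a REPL session, the same lookup might look like the sketch below. It scopes the search to the div with class "divclass", as the Selenium code in the question does, and returns None instead of raising when no iframe is present. The helper name iframe_src and its selector parameter are made up for this example, and the sample markup adds the closing tags that the fragment above leaves out.

```python
from bs4 import BeautifulSoup

html = '''
<div class="divclass">
 <div id="simpleid">
  <iframe width="300" height="300" src="http://google.com"></iframe>
 </div>
</div>
'''

def iframe_src(markup, selector="div.divclass iframe"):
    """Return the src of the first iframe matching selector, or None."""
    soup = BeautifulSoup(markup, "html.parser")
    iframe = soup.select_one(selector)  # CSS selector: iframe inside div.divclass
    if iframe is None:
        return None
    return iframe.get("src")  # .get() yields None if the attribute is absent

print(iframe_src(html))
print(iframe_src("<div class='divclass'></div>"))
```

A None result here means the iframe is not in the HTML that was downloaded, for example because the page builds it with JavaScript or the server returned an error page.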

BeautifulSoup: locating an iframe and its attributes
