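I'm trying to extract the URL of a video embedded in a page, using BeautifulSoup with the lxml parser. I've searched a lot for the right Python code, but after many days I still haven't succeeded. My code so far is below: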
import re
from bs4 import BeautifulSoup
import requests

url = 'http://telly-loans.com/watchvideo.php?id=0kw7cgyat4p7'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36'}

with requests.Session() as session:
    # Send the browser-like User-Agent with every request in this session.
    session.headers.update(headers)
    # Fetch the outer page and parse it.
    response = session.get(url)
    soup = BeautifulSoup(response.content, "lxml")
    # Follow the first iframe, passing the outer page as the Referer.
    response = session.get(soup.iframe['src'], headers={'Referer': url})
    soup = BeautifulSoup(response.content, "lxml")
    # Try to pull the video URL out of the first <script> tag.
    print(re.search(r'http:\/\/"(.*?)"', soup.script.text).group(1))
My Python experience is very limited, so I must be missing something here. Has anyone worked with a similar page who could help me get this working?
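One thing that stands out in the code above is the pattern `http:\/\/"(.*?)"`: it only matches when a double quote comes immediately after `http://`, so it is unlikely to find a quoted URL. Below is a minimal sketch of a pattern that captures a whole quoted http(s) URL instead. The `find_quoted_urls` helper and the `sample_script` text are hypothetical, made up for illustration, because the actual contents of the embed page's script are not shown here.

```python
import re

# Matches a single- or double-quoted http(s) URL and captures the URL itself.
QUOTED_URL = re.compile(r'''["'](https?://[^"']+)["']''')


def find_quoted_urls(script_text):
    """Return every quoted http(s) URL found in a block of JavaScript text."""
    return QUOTED_URL.findall(script_text)


# Hypothetical script content, made up for illustration only.
sample_script = 'player.setup({file: "http://example.com/video.mp4", width: 540});'

original = re.search(r'http:\/\/"(.*?)"', sample_script)
print(original)                         # the original pattern finds nothing here
print(find_quoted_urls(sample_script))
```

If the real script holds several URLs (thumbnails, ads, the video file), the returned list would still need filtering, for example by file extension. Calling `.group(1)` directly on the result of `re.search` raises `AttributeError` when nothing matches, so checking for `None` first gives a clearer failure.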
Page URL: http://telly-loans.com/watchvideo.php?id=0kw7cgyat4p7
Part of the HTML, containing the IFRAME:
<td>
<div id='div-gpt-ad-1466152083933-1' style='height:600px; width:160px;'>
<script type='text/javascript'>googletag.cmd.push(function() { googletag.display('div-gpt-ad-1466152083933-1'); });
</script>
</div><!-- END TAG -->
</td>
<td>
<IFRAME SRC="http://watchvideo2.us/embed-0kw7cgyat4p7-540x304.html" FRAMEBORDER=0 MARGINWIDTH=0 MARGINHEIGHT=0 SCROLLING=NO WIDTH=540 HEIGHT=304> </IFRAME>
</td>
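To check offline that the iframe's address can be read from this fragment even though the tag and attribute names are in upper case, here is a small sketch. It uses the standard library's `html.parser` as a dependency-free stand-in for `soup.iframe['src']`, not BeautifulSoup itself. The `IframeSrcCollector` class and the `iframe_sources` helper are hypothetical names introduced for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcCollector(HTMLParser):
    """Collect the src attribute of every <iframe> start tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <IFRAME SRC=...> matches.
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def iframe_sources(html, base_url):
    """Return absolute iframe URLs found in html, resolved against base_url."""
    collector = IframeSrcCollector()
    collector.feed(html)
    return [urljoin(base_url, src) for src in collector.sources]


page_url = "http://telly-loans.com/watchvideo.php?id=0kw7cgyat4p7"
fragment = """
<td>
<IFRAME SRC="http://watchvideo2.us/embed-0kw7cgyat4p7-540x304.html" FRAMEBORDER=0 MARGINWIDTH=0 MARGINHEIGHT=0 SCROLLING=NO WIDTH=540 HEIGHT=304> </IFRAME>
</td>
"""
print(iframe_sources(fragment, page_url))
```

Resolving each `src` against the page URL with `urljoin` keeps the sketch usable when a page uses a relative iframe address. An absolute address like the one above passes through unchanged.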