I have this script:
var s1 = new SWFObject('/media/player/flvplayer.swf','single','400','300','7');s1.addParam('allowfullscreen','true');s1.addVariable('file','http://cdn.abc.con/video.flv');s1.addParam('menu','false');s1.addVariable('width','400');s1.addVariable('height','300');s1.write('player1474719921904');
I want to extract the video URL from it:
http://cdn.abc.con/video.flv
I tried this, but it finds nothing:
scripts = soup.find_all("script")
if scripts:
    for s in scripts:
        crawler_logger.info('s: %s' % s)
        l = s.find_all(attrs={'': re.compile(r'\.(flv|mp4)$')})
I would like to be able to get all videos like this without knowing the URL names in advance.
Answer 0 (score: 1)
BeautifulSoup does not parse JavaScript. From the script tag s, extract the JavaScript code as plain text:
code = s.text
You can then extract the URL manually with a regular expression, like this:
import re
code = """var s1 = new SWFObject('/media/player/flvplayer.swf','single','400','300','7');s1.addParam('allowfullscreen','true');s1.addVariable('file','http://cdn.abc.con/video.flv');s1.addParam('menu','false');s1.addVariable('width','400');s1.addVariable('height','300');s1.write('player1474719921904');"""
url = re.search(r"['\"](https?://.+?\.flv)['\"]", code).group(1)
print(url) # http://cdn.abc.con/video.flv
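The two steps above (take the text of each script tag, then search it with a regular expression) can be combined into one loop. The following is a minimal sketch, not a definitive implementation: the sample page, the helper name extract_video_urls and the choice of .flv and .mp4 as the extensions of interest are assumptions made for illustration. It reads script.string, which is None for external or empty script tags, and uses findall so that a script with no match does not raise an error.

```python
import re
from bs4 import BeautifulSoup

# Assumed sample page; the real HTML would come from your crawler.
html = """
<html><body>
<div id="player1474719921904"></div>
<script src="/media/player/swfobject.js"></script>
<script>
var s1 = new SWFObject('/media/player/flvplayer.swf','single','400','300','7');
s1.addVariable('file','http://cdn.abc.con/video.flv');
s1.write('player1474719921904');
</script>
</body></html>
"""

# Quoted http(s) URL ending in .flv or .mp4; the character class keeps the
# match inside a single quoted string.
VIDEO_URL = re.compile(r"['\"](https?://[^'\"\s]+?\.(?:flv|mp4))['\"]")


def extract_video_urls(html):
    """Return video URLs found in the inline scripts of an HTML page."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for script in soup.find_all("script"):
        code = script.string or ""  # None for external or empty scripts
        urls.extend(VIDEO_URL.findall(code))
    return urls


print(extract_video_urls(html))
```

Because the pattern requires an http or https scheme, the relative path to flvplayer.swf is not reported.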
Answer 1 (score: 1)
import re
text = '''
var s1 = new SWFObject('/media/player/flvplayer.swf','single','400','300','7');s1.addParam('allowfullscreen','true');s1.addVariable('file','http://cdn.abc.con/video.flv');s1.addParam('menu','false');s1.addVariable('width','400');s1.addVariable('height','300');s1.write('player1474719921904');
var s1 = new SWFObject('/media/player/flvplayer.swf','single','400','300','7');s1.addParam('allowfullscreen','true');s1.addVariable('file','http://cdn.abc.con/video.flv');s1.addParam('menu','false');s1.addVariable('width','400');s1.addVariable('height','300');s1.write('player1474719921904');
var s1 = new SWFObject('/media/player/flvplayer.swf','single','400','300','7');s1.addParam('allowfullscreen','true');s1.addVariable('file','http://cdn.abc.con/video.flv');s1.addParam('menu','false');s1.addVariable('width','400');s1.addVariable('height','300');s1.write('player1474719921904');
'''
link = re.findall(r"'(http.+?)'", text)
print(link)
Output:
['http://cdn.abc.con/video.flv', 'http://cdn.abc.con/video.flv', 'http://cdn.abc.con/video.flv']
This regular expression finds every single-quoted link that starts with http and returns them all in a list.
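On a real page that pattern would also capture ordinary, non-video links, and it would miss URLs written with double quotes. A possible refinement, sketched here under assumptions: the helper name find_video_urls, the default extension list, the optional query string and the sample lines in text are all hypothetical and not part of the original answer.

```python
import re


def find_video_urls(code, extensions=("flv", "mp4")):
    """Return unique quoted http(s) URLs in code whose path ends in one of extensions."""
    ext = "|".join(re.escape(e) for e in extensions)
    # Group 1 is the URL; an optional ?query may follow the extension.
    pattern = re.compile(
        r"['\"](https?://[^'\"\s?]+\.(?:%s)(?:\?[^'\"\s]*)?)['\"]" % ext,
        re.IGNORECASE,
    )
    seen = []
    for url in pattern.findall(code):
        if url not in seen:  # keep the first occurrence, preserve order
            seen.append(url)
    return seen


# Assumed sample input: two video links (one repeated) and one non-video link.
text = """
s1.addVariable('file','http://cdn.abc.con/video.flv');
s2.addVariable('file',"http://cdn.abc.con/clip.mp4?token=1");
s3.addVariable('link','http://cdn.abc.con/page.html');
s4.addVariable('file','http://cdn.abc.con/video.flv');
"""
print(find_video_urls(text))
```

Passing a different tuple, for example extensions=("webm",), changes which file types are reported without touching the pattern itself.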