Question

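I am trying to make a subtitle downloader that takes the names of all the files in a folder and searches for them on 'Subscene.com'. I can scrape the HTML source with Beautiful Soup, but I can't get the link to the zip file from the HTML source. Clicking the "Download" button is what triggers the download.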

The highlighted text is the link that the download button redirects to, but how am I supposed to download the file using this link?

There is no explicit link for downloading the zip file. Is there any way to solve this?

Answer 1

You don't need any explicit link to download the zip file.

This is the logic I use in my Python downloader script:

MyFile2 = urllib2.urlopen(your_url)  # your_url: the input link (subtitle page URL)
MyHtml2 = MyFile2.read()
soup2 = BeautifulSoup(MyHtml2, "lxml")
downloaddiv = soup2.find("div", {"class": "download"})  # find the div that holds the download link
downloadlink = downloaddiv.find('a')  # find the anchor tag inside that div
download = 'https://subscene.com/' + downloadlink['href']  # prepend the main domain to build the download URL
r = requests.get(download)  # request the download
z = zipfile.ZipFile(io.BytesIO(r.content))  # open the zip file from memory
z.extractall()  # extract the zip file into the current directory
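The last two steps above (opening the response body as a zip file and extracting it) can be sketched as a small Python 3 helper. This is only a sketch under a few assumptions: the function name `extract_subtitles` is hypothetical, it expects the raw bytes of the response (for example `r.content`), and it keeps only `.srt` members, which is a choice made here rather than something the script above does.

```python
import io
import zipfile
from pathlib import Path


def extract_subtitles(zip_bytes, dest_dir="."):
    """Extract the .srt members of an in-memory zip archive into dest_dir.

    Hypothetical helper: zip_bytes is assumed to be the body of the
    download response (e.g. r.content).
    """
    # Fail early if the server returned something else, such as an HTML error page.
    if not zipfile.is_zipfile(io.BytesIO(zip_bytes)):
        raise ValueError("response body is not a zip archive")

    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    extracted = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # Assumption: only .srt subtitle files are wanted.
            if name.lower().endswith(".srt"):
                archive.extract(name, str(dest))
                extracted.append(name)
    return extracted
```

With the variables from the script above, a call would look like `extract_subtitles(r.content, "subs")`. Checking `is_zipfile` first gives a clearer error than letting `ZipFile` fail when the site answers with a non-zip page.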

You will also need these imports:

import io
import zipfile
import urllib2
import requests
from bs4 import BeautifulSoup  # the "lxml" parser must be installed, but it does not need to be imported
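Note that `urllib2` and the `StringIO` module exist only in Python 2. As a rough sketch, assuming you are on Python 3, the standard-library equivalents look like the following. The helper `build_download_url` is a hypothetical name, and it uses `urljoin` instead of plain string concatenation so that an `href` with or without a leading slash produces a clean URL.

```python
# Python 3 standard-library equivalents of the Python 2 modules listed above.
from urllib.request import urlopen  # replaces urllib2.urlopen and urllib.urlopen
from urllib.parse import urljoin    # joins a base URL and a relative link
from io import BytesIO              # replaces StringIO.StringIO for binary data


def build_download_url(href, base="https://subscene.com/"):
    """Combine the site's base URL with the href taken from the download link.

    Hypothetical helper: href is assumed to be the value of
    downloadlink['href'] from the script above.
    """
    return urljoin(base, href)
```

For instance, `build_download_url(downloadlink['href'])` could stand in for the concatenation step; the href values used to try it out are up to you, since the exact link format on the site is not shown here.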

Hope you understand everything correctly!

How to download srt files from websites such as Subscene.com using Python (BeautifulSoup)
