I am trying to write a script that downloads subtitles from a specific website. Please read the comments in the code. Here is the code:
import requests
from bs4 import BeautifulSoup
count = 0
usearch = input("Movie Name? : ")
search_url = "https://www.yifysubtitles.com/search?q="+usearch
base_url = "https://www.yifysubtitles.com"
print(search_url)
resp = requests.get(search_url)
soup = BeautifulSoup(resp.content, 'lxml')
for link in soup.find_all("div", {"class": "media-body"}):  # get the exact class: 'media-body'
    imdb = link.find('a')['href']  # find the link in that class, which is the exact link we want
    movie_url = base_url + imdb  # merge the result with the base string to navigate to the movie page
    print("Movie URL : {}".format(movie_url))  # print the URL just to check.. :p
    next_page = requests.get(movie_url)  # soup number 2 begins here, after navigating to the movie page
    soup2 = BeautifulSoup(next_page.content, 'lxml')
    # print(soup2.prettify())
    for links in soup2.find_all("tr", {"class": "high-rating"}):  # navigate to the subtitle rows with class 'high-rating'
        for flags in links.find("td", {"class": "flag-cell"}):  # look at the flags of the high-rated subtitles
            if flags.text == "English":  # if the flag is English, get the download link
                print("After if : {}".format(links))
                for dlink in links.find("td", {"class": "download-cell"}):  # after the English check, navigate to the 'download-cell' td where the download href lives
                    half_dlink = dlink.find('a')['href']  # STUCK HERE!!! HERE'S THE PROBLEM!!! SOS!!! HELP!!!
                    download = base_url + half_dlink
                    print(download)
I get the following error:
File "C:/Users/PycharmProjects/WhatsApp_API/SubtitleDownloader.py", line 24, in <module>
for x in dlink.find("a"):
TypeError: 'NoneType' object is not iterable
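For reference, this error can be reproduced in isolation: find() returns None when no matching tag exists, and iterating over None raises exactly this TypeError. A minimal sketch with made-up HTML (the class names below are just placeholders):

from bs4 import BeautifulSoup

# A row that has a flag cell but no download cell
row = BeautifulSoup("<table><tr><td class='flag-cell'>English</td></tr></table>", "lxml")
cell = row.find("td", {"class": "download-cell"})  # no match, so find() returns None
print(cell)  # None
for x in cell:  # TypeError: 'NoneType' object is not iterable
    print(x)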
Answer 0 (score: 1)
Just change this line:

for dlink in links.find("td", {"class": "download-cell"}):

to this:

for dlink in links.find_all("td", {"class": "download-cell"}):

because you are running a loop over a single element instead of a list.
Note: the only difference is that find_all() returns a list containing the single result, while find() just returns the result.
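For example, with a made-up one-row snippet (not the real site's HTML):

from bs4 import BeautifulSoup

html = "<table><tr class='high-rating'><td class='flag-cell'>English</td></tr></table>"
doc = BeautifulSoup(html, "lxml")
print(doc.find_all("td", {"class": "flag-cell"}))  # [<td class="flag-cell">English</td>] -- a list
print(doc.find("td", {"class": "flag-cell"}))      # <td class="flag-cell">English</td>   -- the tag itself
print(doc.find("td", {"class": "missing"}))        # None -- find() returns None when nothing matches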
Hope this helps! :)
Answer 1 (score: 0)
Take a look at the documentation for find_all() and find().

find_all():

The find_all() method looks through a tag's descendants and retrieves all descendants that match your filters.

find():

The find_all() method scans the entire document looking for results, but sometimes you only want to find one result. If you know a document only has one <body> tag, it is a waste of time to scan the entire document looking for more. Rather than passing in limit=1 every time you call find_all(), you can use the find() method.

So you do not need to loop over find() to get the tag. You need to make the following change in your code (remove the unnecessary for loops):
...
# previous code is the same
soup2 = BeautifulSoup(next_page.content, 'lxml')
for links in soup2.find_all("tr", {"class": "high-rating"}):
    if links.find("td", {"class": "flag-cell"}).text == "English":
        print("After if : {}".format(links))
        half_dlink = links.find('td', {'class': 'download-cell'}).a['href']
        download = base_url + half_dlink
        print(download)
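One caveat worth noting: find() returns None for any row that lacks a flag cell or a download cell, and calling .text or .a on None would then raise an AttributeError. A slightly more defensive sketch of the same loop (the class names are taken from the question; everything else is the standard bs4 API):

for links in soup2.find_all("tr", {"class": "high-rating"}):
    flag = links.find("td", {"class": "flag-cell"})
    if flag is None or flag.text.strip() != "English":
        continue  # skip rows without an English flag cell
    cell = links.find("td", {"class": "download-cell"})
    if cell is not None and cell.a is not None:  # only rows that actually carry a download link
        download = base_url + cell.a['href']
        print(download)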