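I am trying to use this short script to extract data from YIFY pages (their site lacks some basic filter options). It works perfectly with other pages, but for this one it shows no data. In fact, it runs in an infinite loop.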
import requests
from bs4 import BeautifulSoup

def praca_crawler(max_pages):
    page = 1
    while page <= max_pages:
        url = "https://www.yify-torrent.org/search/1080p/t-" + str(page) + "/"
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")
        for link in soup.findAll('a', {'class': 'mv'}):
            title = link.string
            link_url = link.get('href')
            print(title)
            print(url + link_url)
            page += 1

praca_crawler(4)
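There seem to be two problems here: the while loop (the page number is not incremented despite page += 1) and the filter used to select the data. I want to get the movie titles (without any HTML or CSS tags) and the links.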
Answer 0 (score: 0)
import requests
from bs4 import BeautifulSoup

def praca_crawler(max_pages):
    page = 1
    while page <= max_pages:
        url = "https://www.yify-torrent.org/search/1080p/t-" + str(page) + "/"
        source_code = requests.get(url)
        # Stop early with an exception if the server returns an error status
        source_code.raise_for_status()
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")
        # The "mv" class is on the <div>, so select the div and read its <a>
        for div in soup.findAll('div', {'class': 'mv'}):
            title = div.a.string
            link_url = div.a.get('href')
            print(title)
            print(url + link_url)
        # Increment once per page, outside the for loop
        page += 1

praca_crawler(4)
Output:
Kommissar Maigret: Ein toter Mann (2016) 1080p
https://www.yify-torrent.org/search/1080p/t-1//movie/50486/download-kommissar-maigret-ein-toter-mann-2016-1080p-mp4-yify-torrent.html
Love Me (2014) 1080p
https://www.yify-torrent.org/search/1080p/t-1//movie/50485/download-love-me-2014-1080p-mp4-yify-torrent.html
Blood Car (2007) 1080p
https://www.yify-torrent.org/search/1080p/t-1//movie/50484/download-blood-car-2007-1080p-mp4-yify-torrent.html
SS Experiment Love Camp (1976) 1080p
https://www.yify-torrent.org/search/1080p/t-1//movie/50481/download-ss-experiment-love-camp-1976-1080p-mp4-yify-torrent.html
Paper Tiger (1975) 1080p
https://www.yify-torrent.org/search/1080p/t-1//movie/50479/download-paper-tiger-1975-1080p-mp4-yify-torrent.html
The Soft Skin (1964) 1080p
https://www.yify-torrent.org/search/1080p/t-1//movie/50477/download-the-soft-skin-1964-1080p-mp4-yify-torrent.html
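Note that the printed links contain t-1//movie/... because each href is a site-relative path that gets appended to the search page URL. A minimal sketch of one way to build the link instead, using urljoin from the standard library; the helper name absolute_link is hypothetical, and whether the resulting URL is the one the site actually serves is an assumption:

```python
from urllib.parse import urljoin


def absolute_link(page_url, href):
    """Resolve an href found on page_url into an absolute URL.

    absolute_link is a hypothetical helper name used for illustration only.
    """
    # An href starting with "/" replaces the whole path of page_url,
    # so the "/search/1080p/t-1/" part is dropped rather than doubled.
    return urljoin(page_url, href)


page_url = "https://www.yify-torrent.org/search/1080p/t-1/"
href = "/movie/50486/download-kommissar-maigret-ein-toter-mann-2016-1080p-mp4-yify-torrent.html"

print(absolute_link(page_url, href))
# https://www.yify-torrent.org/movie/50486/download-kommissar-maigret-ein-toter-mann-2016-1080p-mp4-yify-torrent.html
```

In the crawler this would replace print(url + link_url) with print(urljoin(url, link_url)).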
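Problems:
1. Move page += 1 out of the for loop. The page number should increase once per page visited, not each time a title is printed. In the original script the for loop matched nothing, so page was never incremented and the while loop never ended.
2. The mv class is on the <div> tag, not on the <a> tag, so select the div and read the link inside it:
<div class="mv"><h3><a href="/movie/50486/download-kommissar-maigret-ein-toter-mann-2016-1080p-mp4-yify-torrent.html" target="_blank" title="Kommissar Maigret: Ein toter Mann (2016) 1080p">Kommissar Maigret: Ein toter Mann (2016) 1080p</a></h3>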
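The selector difference can be checked offline without hitting the site. The sketch below parses an HTML fragment modeled on the snippet from the page (the closing </div> is added here as an assumption so the fragment is well-formed) and compares the two find_all calls:

```python
from bs4 import BeautifulSoup

# Fragment modeled on the page markup; the closing </div> is an assumption.
html = (
    '<div class="mv"><h3>'
    '<a href="/movie/50486/download-kommissar-maigret-ein-toter-mann-2016-1080p-mp4-yify-torrent.html" '
    'target="_blank" title="Kommissar Maigret: Ein toter Mann (2016) 1080p">'
    'Kommissar Maigret: Ein toter Mann (2016) 1080p</a>'
    '</h3></div>'
)

soup = BeautifulSoup(html, "html.parser")

# The question's filter: looks for <a class="mv">, which does not exist here.
anchors_with_mv = soup.find_all("a", {"class": "mv"})

# The answer's filter: looks for <div class="mv">, then reads the nested <a>.
divs_with_mv = soup.find_all("div", {"class": "mv"})
titles = [div.a.string for div in divs_with_mv]
links = [div.a.get("href") for div in divs_with_mv]

print(len(anchors_with_mv))  # number of matches for the original filter
print(titles)
print(links)
```

Because the first filter returns an empty list, a loop over it never executes its body, which is why a counter incremented inside that loop never changes.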