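I need to scrape a website to get some information, such as movie titles and their links. My code runs without errors, but it only returns the first entry on the page. Here is my code. Thanks in advance for your help, and sorry if this isn't a smart question, but I'm new to this.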
import requests
from bs4 import BeautifulSoup
URL = 'http://www.simplyscripts.com/genre/horror-scripts.html'
def scarica_pagina(URL):
    page = requests.get(URL)
    html = page.text
    soup = BeautifulSoup(html, 'lxml')
    # An id should be unique in a page, so this normally matches a single div
    films = soup.find_all("div", {"id": "movie_wide"})
    for film in films:
        # find('p') returns only the FIRST <p> in the div, so only one film is printed
        link = film.find('p').find("a").attrs['href']
        title = film.find('p').find("a").text.strip('>')
        print(link)
        print(title)
scarica_pagina(URL)
Answer (score: 0)
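Try the following. Your loop iterated over the "movie_wide" div itself, which appears only once, and then called find('p'), which returns just the first paragraph inside it. Looping over find_all("p") inside that div visits every entry instead. I also tidied the script up a little. Let us know if you run into any further problems: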
import requests
from bs4 import BeautifulSoup
URL = 'http://www.simplyscripts.com/genre/horror-scripts.html'
def scarica_pagina(url):
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'lxml')
    # Visit every <p> inside the single div#movie_wide, not just the first one
    for film in soup.find(id="movie_wide").find_all("p"):
        anchor = film.find("a")
        if anchor is None:  # skip paragraphs that contain no link
            continue
        link = anchor.get('href')
        title = anchor.text.strip()
        print(link, title)
if __name__ == '__main__':
    scarica_pagina(URL)
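For comparison, here is a dependency-free sketch of the same idea using the standard library's html.parser instead of BeautifulSoup (a swapped-in technique, not what the answer uses). It collects an (href, title) pair for every link nested inside the element with id "movie_wide", rather than stopping at the first one. The class name MovieLinkParser, the helper extract_links, and the SAMPLE markup are hypothetical; the snippet only assumes the real page looks roughly like this and does not fetch it.

```python
from html.parser import HTMLParser


class MovieLinkParser(HTMLParser):
    """Collect (href, title) for every <a href> nested inside the div with the given id."""

    def __init__(self, container_id="movie_wide"):
        super().__init__()
        self.container_id = container_id
        self.depth = 0            # > 0 while we are inside the container div
        self.current_href = None  # href of the <a> currently being read, if any
        self.current_text = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth:
            if tag == "div":
                self.depth += 1   # track nested divs so we know when the container closes
            elif tag == "a" and "href" in attrs:
                self.current_href = attrs["href"]
                self.current_text = []
        elif tag == "div" and attrs.get("id") == self.container_id:
            self.depth = 1        # entered the container

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == "a" and self.current_href is not None:
            title = "".join(self.current_text).strip()
            self.results.append((self.current_href, title))
            self.current_href = None
        elif tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)


def extract_links(html):
    """Return every (href, title) pair found inside div#movie_wide."""
    parser = MovieLinkParser()
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical markup, assumed to resemble the structure of the real page
SAMPLE = """
<div id="movie_wide">
  <p><a href="http://example.com/one.html">Film One</a> by Writer A</p>
  <p><a href="http://example.com/two.pdf">Film Two</a> by Writer B</p>
  <p>No link here</p>
</div>
<div id="other"><p><a href="http://example.com/x">Ignored</a></p></div>
"""

if __name__ == "__main__":
    for link, title in extract_links(SAMPLE):
        print(link, title)
```

Because every link in the container is collected, both entries are returned, while the paragraph without a link and the anchor outside the container are skipped.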