Scraping IMDB.com with BeautifulSoup in Python, but unable to get the href from the movie link

Asked: 2016-11-09 15:25:10

Tags: python html beautifulsoup href imdb

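I'm trying to get the href link of a movie (for example, by searching for Iron Man on IMDB), but I can't seem to get it. I keep getting "None" when I run the code, but if I remove .get('href'), the code returns the whole line of HTML (including the link I want). I'd appreciate any help. Thanks!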

from bs4 import BeautifulSoup
import requests
from urllib.parse import urljoin # For joining next page url with base url

search_terms = input("What movie do you want to know about?\n> ").split()

url = "http://www.imdb.com/find?ref_=nv_sr_fn&q=" + '+'.join(search_terms) + '&s=all'

def scrape_find_next_page(url):
    headers = {'User-Agent': 'Mozilla/5.0'}
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, "html.parser")


    next_page = soup.find('td', 'result_text').get('href')


    return next_page


next_page_url = scrape_find_next_page(url)

1 Answer:

Answer 0 (score: 0)

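You are trying to get the href from the td, which does not have that attribute. You need to get the a tag inside it, which is the element that carries the href attribute: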

next_page = soup.find('td', 'result_text').find('a').get('href')
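
To see the difference without hitting the network, here is a minimal sketch that runs the same lookup against a small, made-up HTML snippet. The markup in SAMPLE_HTML, the title ID, and the helper name first_result_link are assumptions for illustration only; the real IMDB page may be structured differently and may have changed since this was asked. The sketch also guards against a missing td or a tag, and uses the urljoin import from the question to turn the relative href into an absolute URL.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for one search-result row (not real IMDB data).
SAMPLE_HTML = """
<table class="findList">
  <tr>
    <td class="primary_photo"><a href="/title/tt0000001/"><img src="poster.jpg"></a></td>
    <td class="result_text"><a href="/title/tt0000001/">Example Movie</a> (2008)</td>
  </tr>
</table>
"""

BASE_URL = "http://www.imdb.com"


def first_result_link(html, base_url=BASE_URL):
    """Return the absolute URL of the first result link, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    # A string as the second argument to find() is matched against the CSS class.
    cell = soup.find("td", "result_text")
    if cell is None:
        return None
    # The href lives on the <a> inside the cell, not on the <td> itself.
    link = cell.find("a", href=True)
    if link is None:
        return None
    # Hrefs on the page are relative, so join them with the site root.
    return urljoin(base_url, link["href"])


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
cell = soup.find("td", "result_text")
print(cell.get("href"))                        # the <td> has no href attribute
print(first_result_link(SAMPLE_HTML))          # absolute URL built from the <a> href
print(first_result_link("<p>no results</p>"))  # no matching cell at all
```

The None checks matter in practice: if the search returns no results, soup.find('td', 'result_text') returns None, and chaining .find('a') directly onto it raises an AttributeError.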