How to find the Gross value on an IMDB page

Time: 2019-02-14 13:42:50

Tags: python beautifulsoup

Using BeautifulSoup, I find the vote count on the page with:

vote = container.find('span', attrs = {'name':'nv'})['data-value']

How do I find the Gross value, given that its span has the same name?

The page is "Released between 2018-01-01 and 2018-12-31".

2 answers:

Answer 0: (score: 1)

You could use findAll and pick the second item to get the value of the Gross field. For example:

elements = container.findAll('span', attrs = {'name':'nv'})
votes = elements[0]['data-value']
gross = elements[1]['data-value']
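
Indexing by position assumes every result has both spans; if a title has no Gross figure, elements[1] raises an IndexError. As a minimal sketch of an alternative, the lookup can key on the label that precedes each value. The markup in SAMPLE_HTML is a hypothetical stand-in modelled on one result row (label spans containing "Votes:" and "Gross:" are an assumption, and the real page may differ), and value_for_label is a helper name invented for this illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on two search-result rows; the real page may differ.
SAMPLE_HTML = """
<div class="lister-item">
  <h3><a href="#">Movie A</a></h3>
  <p>
    <span class="text-muted">Votes:</span>
    <span name="nv" data-value="500000">500,000</span>
    <span class="text-muted">Gross:</span>
    <span name="nv" data-value="700,059,566">$700.06M</span>
  </p>
</div>
<div class="lister-item">
  <h3><a href="#">Movie B</a></h3>
  <p>
    <span class="text-muted">Votes:</span>
    <span name="nv" data-value="1200">1,200</span>
  </p>
</div>
"""


def value_for_label(container, label):
    """Return the data-value of the name="nv" span that follows `label`, or None."""
    # Find the label span whose text is exactly `label` (assumed markup).
    label_span = container.find('span', string=label)
    if label_span is None:
        return None
    # The value is the next sibling span carrying name="nv".
    value_span = label_span.find_next_sibling('span', attrs={'name': 'nv'})
    if value_span is None:
        return None
    return value_span.get('data-value')


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
results = []
for item in soup.find_all('div', class_='lister-item'):
    results.append({
        'name': item.find('h3').find('a').text,
        'votes': value_for_label(item, 'Votes:'),
        'gross': value_for_label(item, 'Gross:'),
    })
print(results)
```

With this approach a missing Gross field yields None instead of an exception, and the votes value is never mistaken for the gross value.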

Answer 1: (score: 1)

Not a very Pythonic way of doing things, but I kind of like it.

from bs4 import BeautifulSoup
import requests

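def get_imdb_data(url):
    data = requests.get(url)
    # Name the parser explicitly; omitting it makes bs4 guess and emit a warning.
    soup = BeautifulSoup(data.text, 'html.parser')
    # Each search result sits in a div with class "lister-item".
    divs = soup.findAll('div', {'class':'lister-item'})
    movies = []
    for div in divs:
        movie = {}
        movie['name'] = div.find('h3').find('a').text
        # Votes and gross are both <span name="nv">; votes come first.
        spans = div.findAll('span', {'name':'nv'})
        try:
            movie['votes'] = spans[0]['data-value']
        except (IndexError, KeyError):
            pass  # no votes span for this title
        try:
            movie['gross'] = spans[1]['data-value']
        except (IndexError, KeyError):
            pass  # gross is missing for some titles, so leave the key out
        movies.append(movie)
    return movies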

url = 'https://www.imdb.com/search/title?release_date=2018&sort=num_votes,desc&page=1'
data = get_imdb_data(url)
print(data)
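
The values collected by get_imdb_data are strings taken from the data-value attribute. If numbers are needed, a small conversion step can follow. This is only a sketch: it assumes the strings are digits with optional comma thousands separators (for example '700,059,566'), and both to_int and the sample rows are hypothetical names and data made up for illustration.

```python
def to_int(raw):
    """Convert a string such as '700,059,566' to an int; return None if not numeric."""
    if raw is None:
        return None
    # Drop thousands separators and surrounding whitespace (assumed format).
    cleaned = raw.replace(',', '').strip()
    return int(cleaned) if cleaned.isdigit() else None


# Hypothetical rows in the shape produced by get_imdb_data.
movies = [
    {'name': 'Movie A', 'votes': '500000', 'gross': '700,059,566'},
    {'name': 'Movie B', 'votes': '1200'},  # no gross key for this title
]

for movie in movies:
    # dict.get returns None for a missing key, which to_int passes through.
    movie['votes'] = to_int(movie.get('votes'))
    movie['gross'] = to_int(movie.get('gross'))

print(movies)
```

Titles without a gross figure end up with gross set to None, so later code can tell "missing" apart from zero.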