Getting specific data from a website with Python

Time: 2020-07-26 12:42:47

Tags: python html beautifulsoup request

I am new to Python and I am working on an interface. I need to get the top 250 movies from the IMDb website.


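My code prints output like this:

6. Schindler's List(1993)
7. The Lord of the Rings: The Return of the King(2003)
8. Pulp Fiction(1994)

For my project, however, I only want the movie titles, without the year and the rank. How should I change my code?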

2 answers:

Answer 0: (score: 0)

A crude solution, given that your strings have the form digits + . + name_of_movie + (YEAR), could simply be:

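a = ["6. Schindler's List(1993)", "7. The Lord of the Rings: The Return of the King(2003)", "8. Pulp Fiction(1994)"]
just_names = []
for name in a:
    # The first period ends the rank; the title starts two characters later
    # (skipping the period and the space after it).
    dot = name.index('.')
    # The last six characters are the year in parentheses, e.g. "(1993)".
    just_names.append(name[dot + 2:-6])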
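
The same slicing can also be written as a regular expression, which avoids the hard-coded offsets and tolerates an optional space before the year. This is only a sketch, assuming every string has the form rank, period, title, four-digit year in parentheses; the helper name strip_rank_and_year is made up for illustration.

```python
import re

# Assumed format: "<rank>. <title>(<4-digit year>)", as in the question's output.
PATTERN = re.compile(r"^\s*\d+\.\s*(.*?)\s*\(\d{4}\)\s*$")


def strip_rank_and_year(entry):
    """Return only the title, or the input unchanged if it does not match."""
    match = PATTERN.match(entry)
    return match.group(1) if match else entry


a = [
    "6. Schindler's List(1993)",
    "7. The Lord of the Rings: The Return of the King(2003)",
    "8. Pulp Fiction(1994)",
]
just_names = [strip_rank_and_year(name) for name in a]
print(just_names)
```

Strings that do not fit the assumed pattern are returned unchanged rather than silently truncated, which makes unexpected input easier to spot.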

Answer 1: (score: 0)

The anchor tag contains only the movie's name, so select the anchor tag's text for each td:

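import requests
from bs4 import BeautifulSoup

url = "https://www.imdb.com/chart/top/"
response = requests.get(url)
html_content = response.content
soup = BeautifulSoup(html_content, "html.parser")

# Each title cell is a <td class="titleColumn"> holding the rank, a link and the year.
title_cells = soup.find_all("td", {"class": "titleColumn"})

for cell in title_cells:
    # The <a> inside the cell holds only the movie name.
    print(cell.find("a").get_text(strip=True))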

Output:

The Shawshank Redemption
The Godfather
The Godfather: Part II
The Dark Knight
12 Angry Men
Schindler's List
The Lord of the Rings: The Return of the King
Pulp Fiction
Il buono, il brutto, il cattivo
The Lord of the Rings: The Fellowship of the Ring
Fight Club
Forrest Gump
Inception
Star Wars: Episode V - The Empire Strikes Back
The Lord of the Rings: The Two Towers
The Matrix
Goodfellas
One Flew Over the Cuckoo's Nest
Shichinin no samurai
Se7en
La vita è bella
Cidade de Deus
The Silence of the Lambs
Hamilton
It's a Wonderful Life
Star Wars
Saving Private Ryan
Sen to Chihiro no kamikakushi
Gisaengchung
The Green Mile
Interstellar
Léon
The Usual Suspects
Seppuku
The Lion King
Back to the Future
The Pianist
Terminator 2: Judgment Day
American History X
Modern Times
Psycho
Gladiator
City Lights
The Departed
The Intouchables
Whiplash
The Prestige
...
...
..
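
To try the "take only the anchor text inside each title cell" idea without touching the network, it can be run against a small inline snippet. The sketch below swaps in the standard library's html.parser instead of BeautifulSoup; the sample markup is invented to mirror the td/titleColumn structure the answer relies on, and TitleParser is a hypothetical helper, not part of any library.

```python
from html.parser import HTMLParser

# Made-up sample markup (an assumption) that mimics a chart table:
# title cells carry class "titleColumn"; other cells should be ignored.
SAMPLE_HTML = """
<table>
  <tr><td class="titleColumn">6. <a href="#">Schindler's List</a> <span>(1993)</span></td></tr>
  <tr><td class="titleColumn">8. <a href="#">Pulp Fiction</a> <span>(1994)</span></td></tr>
  <tr><td class="ratingColumn"><a href="#">8.9</a></td></tr>
</table>
"""


class TitleParser(HTMLParser):
    """Collect the text of every <a> nested in a <td class="titleColumn">."""

    def __init__(self):
        super().__init__()
        self.in_title_cell = False
        self.in_anchor = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            classes = (dict(attrs).get("class") or "").split()
            self.in_title_cell = "titleColumn" in classes
        elif tag == "a" and self.in_title_cell:
            self.in_anchor = True
            self.titles.append("")  # start a new title

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_anchor = False
        elif tag == "td":
            self.in_title_cell = False

    def handle_data(self, data):
        if self.in_anchor:
            self.titles[-1] += data  # text may arrive in several chunks


parser = TitleParser()
parser.feed(SAMPLE_HTML)
parser.close()
titles = [title.strip() for title in parser.titles]
print(titles)
```

The rank and year sit outside the anchor, so they never reach the collected titles, and the link in the rating cell is skipped because its td does not carry the titleColumn class.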