BS4 - Getting information out of something you have already parsed

Time: 2016-06-08 20:14:18

Tags: beautifulsoup

Hey, this was more or less explained to me before, but now nearly the same approach is not working on an almost identical page... page = 'http://www.imdb.com/genre/action/?ref_=gnr_mn_ac_mp'

    table = soup.find_all("table", {"class": "results"})
    for item in list(table):
        for info in item.contents[1::2]:
            info.a.extract()
            link = info.a['href']
            print(link)
            name = info.text.strip()
            print(name)

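The code above tries to capture the link to each movie's page from the a tag contained in the variable info, and the name of each movie from its text, but I get all of the text instead. Is there a way to get just the name?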

Thanks in advance!

2 answers:

Answer 0 (score: 1)

You just need to extract the text from the anchor tag inside the td with the class title:

In [15]: from bs4 import BeautifulSoup
In [16]: import requests

In [17]: url = "http://www.imdb.com/genre/action/?ref_=gnr_mn_ac_mp"

In [18]: soup = BeautifulSoup(requests.get(url).content, "lxml")

In [19]: for td in soup.select("table.results td.title"):
   ....:         print(td.a.text)
   ....:     
X-Men: Apocalypse
Warcraft
Captain America: Civil War
The Do-Over
Teenage Mutant Ninja Turtles: Out of the Shadows
The Angry Birds Movie
The Nice Guys
Batman v Superman: Dawn of Justice
Suicide Squad
Deadpool
Gods of Egypt
Zootopia
13 Hours: The Secret Soldiers of Benghazi
Now You See Me 2
The Brothers Grimsby
Hardcore Henry
Monster Trucks
Independence Day: Resurgence
Star Trek Beyond
The Legend of Tarzan
Deepwater Horizon
X-Men: Days of Future Past
Star Wars: The Force Awakens
X-Men: First Class
The 5th Wave

Almost all of the data you want is inside the td with the class title.


So if you also want the outline, all you additionally need is the text in span.outline:

In [24]: for td in soup.select("table.results td.title"):
   ....:         print(td.a.text)
   ....:         print(td.select_one("span.outline").text)
   ....:     
X-Men: Apocalypse
With the emergence of the world's first mutant, Apocalypse, the X-Men must unite to defeat his extinction level plan.
Warcraft
The peaceful realm of Azeroth stands on the brink of war as its civilization faces a fearsome race of...
Captain America: Civil War
Political interference in the Avengers' activities causes a rift between former allies Captain America and Iron Man.
The Do-Over
Two down-on-their-luck guys decide to fake their own deaths and start over with new identities, only to find the people they're pretending to be are in even deeper trouble.
Teenage Mutant Ninja Turtles: Out of the Shadows
As Shredder joins forces with mad scientist Baxter Stockman and henchmen Bebop and Rocksteady to take over the world, the Turtles must confront an even greater nemesis: the notorious Krang.
The Angry Birds Movie
Find out why the birds are so angry. When an island populated by happy, flightless birds is visited by mysterious green piggies, it's up to three unlikely outcasts - Red, Chuck and Bomb - to figure out what the pigs are up to.
The Nice Guys
A mismatched pair of private eyes investigate the apparent suicide of a fading porn star in 1970s Los Angeles.
Batman v Superman: Dawn of Justice
Fearing that the actions of Superman are left unchecked, Batman takes on the Man of Steel, while the world wrestles with what kind of a hero it really needs.
Suicide Squad
A secret government agency recruits imprisoned supervillains to execute dangerous black ops missions in exchange for clemency.
Deadpool
A former Special Forces operative turned mercenary is subjected to a rogue experiment that leaves him with accelerated healing powers, adopting the alter ego Deadpool.
Gods of Egypt
Mortal hero Bek teams with the god Horus in an alliance against Set, the merciless god of darkness, who has usurped Egypt's throne, plunging the once peaceful and prosperous empire into chaos and conflict.
Zootopia
In a city of anthropomorphic animals, a rookie bunny cop and a cynical con artist fox must work together to uncover a conspiracy.
13 Hours: The Secret Soldiers of Benghazi
During an attack on a U.S. compound in Libya, a security team struggles to make sense out of the chaos.
Now You See Me 2
The Four Horsemen resurface and are forcibly recruited by a tech genius to pull off their most impossible heist yet.
The Brothers Grimsby
A new assignment forces a top spy to team up with his football hooligan brother.
Hardcore Henry
Henry is resurrected from death with no memory, and he must save his wife from a telekinetic warlord with a plan to bio-engineer soldiers.
Monster Trucks
Looking for any way to get away from the life and town he was born into, Tripp (Lucas Till), a high school senior...
Independence Day: Resurgence
Two decades after the first Independence Day invasion, Earth is faced with a new extra-Solar threat. But will mankind's new space defenses be enough?
Star Trek Beyond
The USS Enterprise crew explores the furthest reaches of uncharted space, where they encounter a mysterious new enemy who puts them and everything the Federation stands for to the test.
The Legend of Tarzan
Tarzan, having acclimated to life in London, is called back to his former home in the jungle to investigate the activities at a mining encampment.
Deepwater Horizon
A story set on the offshore drilling rig Deepwater Horizon, which exploded during April 2010 and created the worst oil spill in U.S. history.
X-Men: Days of Future Past
The X-Men send Wolverine to the past in a desperate effort to change history and prevent an event that results in doom for both humans and mutants.
Star Wars: The Force Awakens
Three decades after the defeat of the Galactic Empire, a new threat arises. The First Order attempts to rule the galaxy and only a ragtag group of heroes can stop them, along with the help of the Resistance.
X-Men: First Class
In 1962, the United States government enlists the help of Mutants with superhuman abilities to stop a malicious dictator who is determined to start World War III.
The 5th Wave
Four waves of increasingly deadly alien attacks have left most of Earth decimated. Cassie is on the run, desperately trying to save her younger brother.

For the runtime, use td.select_one("span.runtime").text, and so on.
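
One caveat: select_one returns None when nothing matches, so calling .text on the result raises AttributeError for any row that lacks the span. Below is a small sketch of a guard, again on hand-written markup. The helper name text_or_none and the sample rows are hypothetical, and whether the real page ever omits these spans is an assumption.

```python
from bs4 import BeautifulSoup

# Hypothetical rows: the second one deliberately has no span.runtime.
SAMPLE_ROWS = """
<table class="results">
  <tr><td class="title"><a href="/title/tt0000001/">Movie One</a>
      <span class="outline">First outline.</span>
      <span class="runtime">120 mins.</span></td></tr>
  <tr><td class="title"><a href="/title/tt0000002/">Movie Two</a>
      <span class="outline">Second outline.</span></td></tr>
</table>
"""


def text_or_none(parent, selector):
    """Return the stripped text of the first match, or None if there is none."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


soup = BeautifulSoup(SAMPLE_ROWS, "html.parser")
rows = [
    {
        "name": td.a.get_text(strip=True),
        "outline": text_or_none(td, "span.outline"),
        "runtime": text_or_none(td, "span.runtime"),
    }
    for td in soup.select("table.results td.title")
]

for row in rows:
    print(row)
```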

Answer 1 (score: -1)

Just like you get the link with

info.a['href']

you can also get the title of the movie with

info.a['title']
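
Whether this works depends on the anchor actually carrying a title attribute, which is not confirmed in this thread. Indexing a tag with a missing attribute raises KeyError, while .get returns None, so a cautious sketch can fall back to the link text. The markup and the helper name anchor_title below are illustrative assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical cells: only the first anchor has a title attribute.
SAMPLE = """
<td class="image"><a href="/title/tt0000001/" title="Movie One (2016)"><img src="one.jpg"></a></td>
<td class="title"><a href="/title/tt0000001/">Movie One</a></td>
"""


def anchor_title(anchor):
    """Prefer the title attribute; fall back to the anchor's visible text."""
    return anchor.get("title") or anchor.get_text(strip=True)


soup = BeautifulSoup(SAMPLE, "html.parser")
titles = [anchor_title(a) for a in soup.find_all("a")]
print(titles)
```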

Hope this is what you were looking for!