Extracting titles from links in Python (Beautiful Soup)

Time: 2020-05-30 18:38:01

Tags: python beautifulsoup

I'm new to Python and want to extract the title from each link. So far I have the following, but I've hit a dead end:

import requests
from bs4 import BeautifulSoup
page = requests.get("http://books.toscrape.com/")
soup = BeautifulSoup(page.content, 'html.parser')
books = soup.find("section")
book_list = books.find_all(class_="product_pod")
tonight = book_list[0]

for book in book_list:
    price = book.find(class_="price_color").get_text()
    title = book.find('a')
    print (price)
    print (title.contents[0])

3 answers:

Answer 0 (score: 3)

To extract the title from a link, you can read the link's title attribute.

For example:

import requests
from bs4 import BeautifulSoup
page = requests.get("http://books.toscrape.com/")
soup = BeautifulSoup(page.content, 'html.parser')

for a in soup.select('h3 > a'):
    print(a['title'])

Prints:

A Light in the Attic
Tipping the Velvet
Soumission
Sharp Objects
Sapiens: A Brief History of Humankind
The Requiem Red
The Dirty Little Secrets of Getting Your Dream Job
The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull
The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics
The Black Maria
Starving Hearts (Triangular Trade Trilogy, #1)
Shakespeare's Sonnets
Set Me Free
Scott Pilgrim's Precious Little Life (Scott Pilgrim #1)
Rip it Up and Start Again
Our Band Could Be Your Life: Scenes from the American Indie Underground, 1981-1991
Olio
Mesaerion: The Best Science Fiction Stories 1800-1849
Libertarianism for Beginners
It's Only the Himalayas
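
To see why the title attribute helps, here is a minimal offline sketch. SAMPLE_HTML is an assumption: a hand-written, abbreviated imitation of one book entry on the page, not the real response. It suggests that the first a element in each entry wraps the cover image, while the a inside h3 shows shortened text and keeps the full title in its title attribute.

```python
from bs4 import BeautifulSoup

# Assumption: a hand-written, abbreviated imitation of one book entry.
SAMPLE_HTML = """
<section>
  <article class="product_pod">
    <div class="image_container">
      <a href="catalogue/a-light-in-the-attic_1000/index.html"><img src="cover.jpg" alt="A Light in the Attic" class="thumbnail"></a>
    </div>
    <h3><a href="catalogue/a-light-in-the-attic_1000/index.html" title="A Light in the Attic">A Light in the ...</a></h3>
  </article>
</section>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
book = soup.find(class_="product_pod")

first_link = book.find("a")             # first <a>: wraps the cover image
title_link = book.select_one("h3 > a")  # <a> that is a direct child of <h3>

print(first_link.contents[0].name)  # tag name of the first child: img
print(title_link.get_text())        # visible, shortened link text
print(title_link["title"])          # full title stored in the attribute
```

In this sample, book.find('a') returns the image link, which is why printing title.contents[0] in the question shows an img tag rather than a title.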

Answer 1 (score: 2)

You can use this:

import requests
from bs4 import BeautifulSoup
page = requests.get("http://books.toscrape.com/")
soup = BeautifulSoup(page.content, 'html.parser')
books = soup.find("section")
book_list = books.find_all(class_="product_pod")
# book_list holds one product_pod element per book.

for book in book_list:
    # The cover image's alt attribute carries the full title.
    title = book.select_one('a img')['alt']
    print(title)

Output:

A Light in the Attic
Tipping the Velvet
Soumission
Sharp Objects
Sapiens: A Brief History of Humankind
The Requiem Red...
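
If you also want the price next to each title, the same selectors can be wrapped in a small helper. This is only a sketch: extract_books is a hypothetical name, and SAMPLE_HTML is a hand-written stand-in for the page with illustrative prices (currency symbol omitted), so that it runs without a network connection.

```python
from bs4 import BeautifulSoup

# Assumption: hand-written stand-in for the page; prices are illustrative.
SAMPLE_HTML = """
<section>
  <article class="product_pod">
    <a href="book1.html"><img src="1.jpg" alt="A Light in the Attic"></a>
    <p class="price_color">51.77</p>
  </article>
  <article class="product_pod">
    <a href="book2.html"><img src="2.jpg" alt="Tipping the Velvet"></a>
    <p class="price_color">53.74</p>
  </article>
</section>
"""


def extract_books(html):
    """Return a list of (title, price) tuples, one per product_pod element."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for book in soup.find_all(class_="product_pod"):
        img = book.select_one("a img")
        price = book.find(class_="price_color")
        # Skip entries missing either piece instead of raising an exception.
        if img is None or price is None:
            continue
        results.append((img.get("alt", ""), price.get_text(strip=True)))
    return results


for title, price in extract_books(SAMPLE_HTML):
    print(f"{title}: {price}")
```

Against the live site you would pass page.content from requests.get(...) instead of SAMPLE_HTML.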

Answer 2 (score: 1)

With a small change to your existing code, you can read the image's alt text, which contains the book title in this example.

print (title.contents[0].attrs["alt"])
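
One caveat with the one-liner above: contents[0] is only the img element when nothing, not even whitespace, sits between the opening a tag and the image. The sketch below uses a made-up snippet (an assumption, not the site's real markup) in which a newline comes first, and shows that looking the image up by tag name is more tolerant.

```python
from bs4 import BeautifulSoup, NavigableString

# Assumption: made-up markup with whitespace before the <img> tag.
SNIPPET = """<a href="book.html">
  <img src="cover.jpg" alt="A Light in the Attic">
</a>"""

link = BeautifulSoup(SNIPPET, "html.parser").find("a")

# Here the first child is a whitespace text node, not the image.
print(isinstance(link.contents[0], NavigableString))

# Searching by tag name does not depend on the child's position.
img = link.find("img")
title = img.get("alt") if img is not None else None
print(title)
```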