I'm new to Python and I want to extract the title from a link. So far I have the following, but I've hit a dead end:
import requests
from bs4 import BeautifulSoup

page = requests.get("http://books.toscrape.com/")
soup = BeautifulSoup(page.content, 'html.parser')
books = soup.find("section")
book_list = books.find_all(class_="product_pod")
tonight = book_list[0]
for book in book_list:
    price = book.find(class_="price_color").get_text()
    title = book.find('a')
    print(price)
    print(title.contents[0])
Answer 0 (score: 3)
To extract the title from the link, you can read the link's title attribute.
For example:
import requests
from bs4 import BeautifulSoup

page = requests.get("http://books.toscrape.com/")
soup = BeautifulSoup(page.content, 'html.parser')
# Each book's <h3> holds a link whose title attribute is the full book title
for a in soup.select('h3 > a'):
    print(a['title'])
Prints:
A Light in the Attic
Tipping the Velvet
Soumission
Sharp Objects
Sapiens: A Brief History of Humankind
The Requiem Red
The Dirty Little Secrets of Getting Your Dream Job
The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull
The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics
The Black Maria
Starving Hearts (Triangular Trade Trilogy, #1)
Shakespeare's Sonnets
Set Me Free
Scott Pilgrim's Precious Little Life (Scott Pilgrim #1)
Rip it Up and Start Again
Our Band Could Be Your Life: Scenes from the American Indie Underground, 1981-1991
Olio
Mesaerion: The Best Science Fiction Stories 1800-1849
Libertarianism for Beginners
It's Only the Himalayas
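To try the same selector without hitting the network, here is a minimal sketch run against a hand-written HTML fragment. The fragment is an assumption: a simplified imitation of the page's `product_pod` markup (an image link, then an `h3` whose link carries the full title in its `title` attribute while the visible text is shortened). `SAMPLE_HTML` and `extract_titles` are hypothetical names used only for illustration.

```python
from bs4 import BeautifulSoup

# Hand-written fragment, assumed to resemble the site's markup (not fetched).
SAMPLE_HTML = """
<section>
  <article class="product_pod">
    <div class="image_container">
      <a href="a.html"><img src="a.jpg" alt="A Light in the Attic"></a>
    </div>
    <h3><a href="a.html" title="A Light in the Attic">A Light in the ...</a></h3>
  </article>
  <article class="product_pod">
    <div class="image_container">
      <a href="b.html"><img src="b.jpg" alt="Tipping the Velvet"></a>
    </div>
    <h3><a href="b.html" title="Tipping the Velvet">Tipping the Velvet</a></h3>
  </article>
</section>
"""


def extract_titles(html):
    """Return the title attribute of every link that is a direct child of an <h3>."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["title"] for a in soup.select("h3 > a")]


print(extract_titles(SAMPLE_HTML))
# → ['A Light in the Attic', 'Tipping the Velvet']
```

The `h3 > a` selector skips the image link inside `image_container`, which is the link that a plain `find('a')` would return first.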
Answer 1 (score: 2)
You can use this:
import requests
from bs4 import BeautifulSoup

page = requests.get("http://books.toscrape.com/")
soup = BeautifulSoup(page.content, 'html.parser')
books = soup.find("section")
book_list = books.find_all(class_="product_pod")
tonight = book_list[0]
for book in book_list:
    price = book.find(class_="price_color").get_text()
    # The cover image's alt text holds the full book title
    title = book.select_one('a img')['alt']
    print(title)
Output:
A Light in the Attic
Tipping the Velvet
Soumission
Sharp Objects
Sapiens: A Brief History of Humankind
The Requiem Red...
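The loop above also reads each price but never prints it. As a rough sketch of how the two values might be paired, the following runs on a hand-written fragment rather than the live page. The fragment is an assumption that only imitates the `product_pod` structure, and `SAMPLE_HTML` and `titles_and_prices` are hypothetical names.

```python
from bs4 import BeautifulSoup

# Hand-written fragment, assumed to resemble the site's markup (not fetched).
SAMPLE_HTML = """
<section>
  <article class="product_pod">
    <a href="a.html"><img src="a.jpg" alt="A Light in the Attic"></a>
    <p class="price_color">£51.77</p>
  </article>
  <article class="product_pod">
    <a href="b.html"><img src="b.jpg" alt="Tipping the Velvet"></a>
    <p class="price_color">£53.74</p>
  </article>
</section>
"""


def titles_and_prices(html):
    """Return (title, price) tuples, one per product_pod element."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for book in soup.find_all(class_="product_pod"):
        img = book.select_one("a img")
        price = book.find(class_="price_color")
        # Skip entries that lack an image or a price instead of raising
        if img is None or price is None:
            continue
        rows.append((img.get("alt"), price.get_text(strip=True)))
    return rows


for title, price in titles_and_prices(SAMPLE_HTML):
    print(f"{title}: {price}")
```

Checking for `None` before indexing is a defensive choice for pages where some entries might be incomplete; the original loop would raise on such an entry.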
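Answer 2 (score: 1)
With just a small change to your existing code, you can use the image's alt text, which contains the book title in this example.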
print (title.contents[0].attrs["alt"])
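This works because `book.find('a')` returns the first link in the entry, and that link wraps the cover image, so `contents[0]` is the `<img>` tag. Indexing `contents` depends on the image being the very first child, though. The sketch below, run on two hand-written fragments (assumptions that only imitate the page's markup), shows that looking the image up with `find("img")` gives the same result even when whitespace precedes it. `COMPACT`, `SPACED`, and `alt_from_first_link` are hypothetical names.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Two hand-written fragments (not fetched): same content, different whitespace.
COMPACT = (
    '<article class="product_pod">'
    '<a href="s.html"><img src="s.jpg" alt="Sharp Objects"></a>'
    '<h3><a href="s.html" title="Sharp Objects">Sharp Objects</a></h3>'
    '</article>'
)
SPACED = COMPACT.replace("<img", "\n  <img")


def alt_from_first_link(html):
    """Return the alt text of the image inside the first link, or None."""
    soup = BeautifulSoup(html, "html.parser")
    first_link = soup.find("a")
    if first_link is None:
        return None
    img = first_link.find("img")
    return img["alt"] if img is not None else None


def first_child_is_tag(html):
    """Report whether the first child of the first link is a tag (not text)."""
    first_link = BeautifulSoup(html, "html.parser").find("a")
    return isinstance(first_link.contents[0], Tag)


print(first_child_is_tag(COMPACT), alt_from_first_link(COMPACT))
print(first_child_is_tag(SPACED), alt_from_first_link(SPACED))
```

In the spaced fragment the first child is a whitespace string rather than a tag, so `contents[0]` would no longer be the image, while `find("img")` still locates it.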