Extracting the title value from a tag nested under another tag in HTML with Python

Date: 2019-06-30 13:10:18

Tags: python html text beautifulsoup tags

<div class="book-cover-image">
<img alt="NOT IN MY BACKYARD – Solid Waste Mgmt in Indian Cities" class="img-responsive" src="https://cdn.downtoearth.org.in/library/medium/2016-05-23/0.42611000_1463993925_book-cover.jpg" title="NOT IN MY BACKYARD – Solid Waste Mgmt in Indian Cities"/>
</div>

I need to extract the title value from every div tag like this one. What is the best way to do that? Please advise.

I am trying to get the titles of all the books listed on the page https://www.downtoearth.org.in/books.

So far I have tried:

import requests 
from bs4 import BeautifulSoup as bs


url1 ="https://www.downtoearth.org.in/books"
page1 = requests.get(url1, verify=False)

#print(page1.content)

soup1= bs(page1.content, 'html.parser')
class_names = soup1.find_all('div',{'class':'book-cover-image'} )

for class_name in class_names:
    title_text = class_name.text
    print(class_name)
    print(title_text)

2 answers:

Answer 0 (score: 2)

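To get all the title attributes of the book covers, you can use the CSS selector .book-cover-image img[title] (it selects every <img> tag that has a title attribute and sits under a tag with the class book-cover-image):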

import requests
from bs4 import BeautifulSoup

url = 'https://www.downtoearth.org.in/books'
soup = BeautifulSoup(requests.get(url).text, 'lxml')

for i, img in enumerate(soup.select('.book-cover-image img[title]'), 1):
    print('{:>4}\t{}'.format(i, img['title']))
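
The selector can also be checked offline against the snippet from the question, without a network request. The following is only a minimal sketch: the sample_html string, the shortened src values, and the variable names are introduced here for illustration, and the built-in html.parser is used instead of lxml so that no extra parser is required. It also shows why the attempt in the question prints no title: an <img> tag has no text content, so .text is empty, while the title is stored in the title attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the <div> from the question (src shortened),
# plus a second cover whose <img> has no title attribute.
sample_html = """
<div class="book-cover-image">
<img alt="NOT IN MY BACKYARD – Solid Waste Mgmt in Indian Cities" class="img-responsive" src="cover.jpg" title="NOT IN MY BACKYARD – Solid Waste Mgmt in Indian Cities"/>
</div>
<div class="book-cover-image">
<img class="img-responsive" src="no-title.jpg"/>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# .text returns only text nodes; an <img> has none, so only whitespace remains.
first_div = soup.select_one(".book-cover-image")
div_text = first_div.text.strip()

# img[title] matches only <img> tags that actually carry a title attribute,
# so indexing with img["title"] cannot raise a KeyError here.
titles = [img["title"] for img in soup.select(".book-cover-image img[title]")]

print(repr(div_text))
print(titles)
```

Because the selector already filters on the presence of the attribute, the second cover is skipped rather than causing an error.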

Answer 1 (score: 1)

You can use XPath like this:

import requests
from lxml import html

url1 ="https://www.downtoearth.org.in/books"
res = requests.get(url1, verify=False)
tree = html.fromstring(res.text)
d = tree.xpath("//div[@class='book-cover-image']//img/@title")
for title in d:
    print(title)

Output:

State of India’s Environment 2019: In Figures (eBook)
Victim Africa (eBook)
Frames of change - Heartening tales that define new India
STATE OF INDIA’S ENVIRONMENT 2019
State of India’s Environment In Figures 2018 (eBook)
Getting to know about environment
CLIMATE CHANGE NOW - The Story of Carbon Colonisation
Climate change - For the young and curious
Conflicts of Interest: My Journey through India’s Green Movement
Body Burden: Lifestyle Diseases
STATE OF INDIA’S ENVIRONMENT 2018
DROUGHT BUT WHY? How India can fight the scourge by abandoning drought relief
SOE 2017 (Print version) and SOE 2017 in Figures (Digital version) combo offer
State of India's Environment 2017 In Figures (eBook)
Environment Reader for Universities
Not in My Backyard  (Book & DVD combo offer)
The Crow, Honey Hunter and the Kitchen Garden
BIOSCOPE OF PIU & POM
SOE 2017 and Food book combo offer
FIRST FOOD: Culture of Taste
Annual State Of India’s Environment - SOE 2017
An 8-million-year-old mysterious date with monsoon  (e-book) 
Why I Should be Tolerant
NOT IN MY BACKYARD – Solid Waste Mgmt in Indian Cities
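
One caveat with the XPath approach: @class='book-cover-image' compares the entire class attribute, so a <div> that carries additional classes would not match. The sketch below is an offline illustration under stated assumptions: the sample_html string, the extra class name featured, and the titles Book A and Book B are all made up here and are not taken from the real page. It compares the exact match with the common contains(concat(...)) idiom for matching a single class token.

```python
from lxml import html

# Hypothetical sample: the second <div> carries an extra class.
sample_html = """
<html><body>
<div class="book-cover-image"><img src="a.jpg" title="Book A"/></div>
<div class="book-cover-image featured"><img src="b.jpg" title="Book B"/></div>
</body></html>
"""

tree = html.fromstring(sample_html)

# Exact comparison: matches only when class is exactly "book-cover-image".
exact = tree.xpath("//div[@class='book-cover-image']//img/@title")

# Token match: pad the class list with spaces and look for the whole token.
token = tree.xpath(
    "//div[contains(concat(' ', normalize-space(@class), ' '), "
    "' book-cover-image ')]//img/@title"
)

print(exact)  # ['Book A']
print(token)  # ['Book A', 'Book B']
```

If every matching <div> on the target page has exactly one class, both expressions return the same titles and the shorter form in the answer is sufficient.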