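I have been trying to scrape the headlines of news articles from the web, but the following code raises an IndexError. The problem occurs only on the last line of the code.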
import requests
from bs4 import BeautifulSoup
URL = 'https://www.ndtv.com/coronavirus?pfrom=home-mainnavgation'
r1 = requests.get(URL)
coverpage = r1.content
soup1 = BeautifulSoup(coverpage, 'html5lib')
coverpage_news = soup1.find_all('h3', class_='item-title')
coverpage_news[4].get_text()
Here is the error:
IndexError Traceback (most recent call last)
<ipython-input-10-f7f1f6fab81c> in <module>
6 soup1 = BeautifulSoup(coverpage, 'html5lib')
7 coverpage_news = soup1.find_all('h3', class_='item-title')
----> 8 coverpage_news[4].get_text()
IndexError: list index out of range
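To see why the last line fails, here is a minimal offline reproduction on a small hand-written snippet. It assumes, as Answer 0's selector implies, that the item-title class sits on the a link inside each h3 rather than on the h3 itself. SAMPLE_HTML and the nth_text() helper are hypothetical names that are not part of the original code, and the snippet uses Python's built-in "html.parser" so html5lib is not needed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: assumes the headline class sits on the <a>, not on the <h3>
# (the structure Answer 0's selector implies). This is NOT the real NDTV page.
SAMPLE_HTML = """
<h3><a class="item-title" href="/a">First headline</a></h3>
<h3><a class="item-title" href="/b">Second headline</a></h3>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# No <h3> carries class="item-title", so find_all() returns an empty ResultSet.
coverpage_news = soup.find_all("h3", class_="item-title")
print(len(coverpage_news))  # 0

# Indexing an empty (or too short) list is what raises the IndexError.
try:
    coverpage_news[4].get_text()
except IndexError as exc:
    print("IndexError:", exc)


def nth_text(tags, n):
    """Return the stripped text of the n-th tag, or None if there are too few tags."""
    return tags[n].get_text(strip=True) if len(tags) > n else None


# Index 4 needs at least 5 matches; the guarded lookup returns None instead of crashing.
print(nth_text(coverpage_news, 4))  # None
```

In other words, coverpage_news[4] only works when the search finds at least five elements, so checking len(coverpage_news) first shows whether the selector matched anything at all.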
Answer 0 (score: 1)
Use soup1.select() to search for nested elements that match a CSS selector:
coverpage_news = soup1.select("h3 a.item-title")
This finds every a element with class="item-title" that is a descendant of an h3 element.
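A small offline sketch of what the descendant selector does and does not match, using hypothetical sample markup (SAMPLE_HTML is illustrative, not the real page). It also shows a roughly equivalent nested find_all() search for readers who prefer that API.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (not the real NDTV page): one matching link inside
# an <h3>, one item-title link outside any <h3>, and one plain link inside an <h3>.
SAMPLE_HTML = """
<h3><a class="item-title" href="/a">Inside h3</a></h3>
<div><a class="item-title" href="/b">Not inside h3</a></div>
<h3><a href="/c">No class</a></h3>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# "h3 a.item-title": any <a class="item-title"> that is a descendant of an <h3>.
via_css = [a.get_text(strip=True) for a in soup.select("h3 a.item-title")]

# Roughly the same search with find_all(): look for matching <a> tags
# inside each <h3> in turn.
via_find_all = [
    a.get_text(strip=True)
    for h3 in soup.find_all("h3")
    for a in h3.find_all("a", class_="item-title")
]

print(via_css)       # ['Inside h3']
print(via_find_all)  # ['Inside h3']
```

The select() version is shorter and reads like the browser's developer tools, which is why it is a convenient way to express "this class, but only inside that tag".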
Answer 1 (score: 0)
Try changing:
coverpage_news = soup1.find_all('h3', class_='item-title')
to
coverpage_news = soup1.find_all('h3', class_='list-txt')
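Whether item-title or list-txt is the right class depends on the page's current markup, which this answer does not show. One way to check, sketched here on hypothetical markup with a hypothetical h3_class_report() helper, is to count the classes actually used on h3 tags and on the links inside them. Against the live page you would pass r1.content instead of SAMPLE_HTML.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical markup (not the real NDTV page): the answers disagree about
# where the headline class lives, so this just counts what is actually there.
SAMPLE_HTML = """
<h3 class="list-txt"><a class="item-title" href="/a">Headline A</a></h3>
<h3 class="list-txt"><a class="item-title" href="/b">Headline B</a></h3>
<h3>Section heading</h3>
"""


def h3_class_report(html):
    """Count the classes used on <h3> tags and on the <a> tags inside them."""
    soup = BeautifulSoup(html, "html.parser")
    h3_classes = Counter()
    link_classes = Counter()
    for h3 in soup.find_all("h3"):
        # "class" is multi-valued, so bs4 returns a list (or None if it is absent).
        h3_classes.update(h3.get("class") or ["<none>"])
        for a in h3.find_all("a"):
            link_classes.update(a.get("class") or ["<none>"])
    return h3_classes, link_classes


h3_classes, link_classes = h3_class_report(SAMPLE_HTML)
print(h3_classes)
print(link_classes)
```

If a class shows up on the h3 tags, find_all('h3', class_=...) with that class will match; if it only shows up on the links, a descendant selector such as the one in Answer 0 is needed instead.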
Answer 2 (score: 0)
A slight modification of @Barmar's helpful answer prints all of the headlines:
coverpage_news = soup1.select("h3 a.item-title")
for link in coverpage_news:
    print(link.text)
Output:
US Covid Infections Cross 9 Million, Record 1-Day Spike Of 94,000 Cases
Johnson & Johnson Plans To Test COVID-19 Vaccine On Youngsters Soon
Global Stock Markets Decline As Coronavirus Infection Rate Weighs
Cristiano Ronaldo Recovers From Coronavirus
Reliance's July-September Profit Falls 15% As Covid Slams Oil Business
"Likely To Know By December If We'll Have Covid Vaccine": Top US Expert
With No Local Case In A Record 200 Days, This Country Is World's Envy
Delhi Blames Pollution For Covid Spike At High-Level Health Ministry Meet
Delhi Covid Cases Above 5,000 For 3rd Straight Day, Spike In ICU Patients
2 Million Indians Returned From Abroad Under Vande Bharat Mission: Centre
Existing Lockdown Restrictions Extended In Maharashtra Till November 30
Can TB Vaccine Protect Elderly From Covid?
Is The Covid-19 Situation Worsening In Delhi?
What's The Truth Behind India's Falling Covid Numbers?
"Slight Laxity Can Lead To Spike": AIIMS Director As India Sees Drop In Covid Cases
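For reuse, the approach above can be wrapped in a slightly more defensive sketch. The function names extract_headlines() and fetch_headlines(), the 10-second timeout, and the use of "html.parser" instead of html5lib are assumptions for illustration. The selector is the one from @Barmar's answer, and raise_for_status() makes HTTP errors fail loudly instead of silently producing an empty list.

```python
from bs4 import BeautifulSoup

URL = "https://www.ndtv.com/coronavirus?pfrom=home-mainnavgation"


def extract_headlines(html):
    """Return the stripped text of every <a class="item-title"> inside an <h3>."""
    soup = BeautifulSoup(html, "html.parser")
    titles = (a.get_text(strip=True) for a in soup.select("h3 a.item-title"))
    return [t for t in titles if t]  # drop anchors with no text, if any


def fetch_headlines(url=URL, timeout=10):
    """Download the page and extract its headlines (requires network access)."""
    import requests  # imported here so extract_headlines() can be used offline

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return extract_headlines(response.content)
```

Usage, with network access and requests installed: call fetch_headlines() and loop over the returned list to print each title. An empty list, rather than an IndexError, then signals that the selector no longer matches the page's markup.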