我尝试在nytimes.com上打印所有标题。我使用了请求和beautifulsoup模块。但是最后我得到了空括号。返回结果为[]。我该如何解决这个问题?
import requests
from bs4 import BeautifulSoup
url = "https://www.nytimes.com/"
r = requests.get(url)
text = r.text
soup = BeautifulSoup(text, "html.parser")
title = soup.find_all("span", "balanceHeadline")
print(title)
答案 0 :(得分:0)
我假设您正在尝试检索nytimes的标题。进行title = soup.find_all("span", {'class':'balancedHeadline'})
不会为您带来结果。使用元素选择器找到的<span>
标签经常会引起误解。您要做的就是查看页面的源代码,找到包裹在标题周围的标签。
对于nytimes来说,这有点棘手,因为标题被包裹在<script>
标签中,里面有很多垃圾。因此,您可以做的是先“清理”它,然后通过将字符串转换为python字典对象来反序列化该字符串。
import requests
from bs4 import BeautifulSoup
import json
url = "https://www.nytimes.com/"
r = requests.get(url)
r_html = r.text
soup = BeautifulSoup(r_html, "html.parser")
scripts = soup.find_all('script')
for script in scripts:
if 'preloadedData' in script.text:
jsonStr = script.text
jsonStr = jsonStr.split('=', 1)[1].strip() # remove "window.__preloadedData = "
jsonStr = jsonStr.rsplit(';', 1)[0] # remove trailing ;
jsonStr = json.loads(jsonStr)
for key,value in jsonStr['initialState'].items():
try:
if value['promotionalHeadline'] != "":
print(value['promotionalHeadline'])
except:
continue
输出
Jeffrey Epstein Autopsy Results Conclude He Hanged Himself
Trump and Netanyahu Put Bipartisan Support for Israel at Risk
Congresswoman Rejects Israel’s Offer of a West Bank Visit
In Tlaib’s Ancestral Village, a Grandmother Weathers a Global Political Storm
Cathay Chief’s Resignation Shows China’s Power Over Hong Kong Unrest
Trump Administration Approves Fighter Jet Sales to Taiwan
Peace Road Map for Afghanistan Will Let Taliban Negotiate Women’s Rights
Debate Flares Over Afghanistan as Trump Considers Troop Withdrawal
In El Paso, Hundreds Show Up to Mourn a Woman They Didn’t Know
Is Slavery’s Legacy in the Power Dynamics of Sports?
Listen: ‘Modern Love’ Podcast
‘The Interpreter’
If You Think Trump Is Helping Israel, You’re a Fool
First They Came for the Black Feminists
How Women Can Escape the Likability Trap
With Trump as President, the World Is Spiraling Into Chaos
To Understand Hong Kong, Don’t Think About Tiananmen
The Abrupt End of My Big-Girl Summer
From Trump Boom to Trump Gloom
What Are Trump and Netanyahu Afraid Of?
King Bibi Bows Before a Tweet
Ebola Could Be Eradicated — But Only if the World Works Together
The Online Mob Came for Me. What Happened to the Reckoning?
A German TV Star Takes On Bullies
Why Is Hollywood So Scared of Climate Change?
Solving Medical Mysteries With Your Help: Now on Netflix
答案 1 :(得分:-1)
title = soup.find_all("span", "balanceHeadline")
替换为
title = soup.find_all("span", {'class':'balanceHeadline'})