Question

I am trying to scrape a news website for learning purposes, but I have run into a problem.

from bs4 import BeautifulSoup
from urllib.request import urlopen

req = urlopen('https://timesofindia.indiatimes.com/india/evidence-of-chidambaram-meeting-mukerjeas-destroyed-cbi/articleshow/71337533.cms')

page_html = req.read()

page_soup = BeautifulSoup(page_html,"html.parser")

section = page_soup.find('section',{'class':'_2suu5  clearfix id-r-component undefined undefined '})

print(section)

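I have already tried scraping another website, and the code worked fine there. This time, though, it does not work and I cannot figure out what is wrong.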

Answer 1

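I fixed it for you. The class string in your code has a doubled space and a trailing space; the version below uses single spaces between the class names and no trailing space. I hope you learned something useful.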

import requests
from bs4 import BeautifulSoup
url = 'https://timesofindia.indiatimes.com/india/evidence-of-chidambaram-meeting-mukerjeas-destroyed-cbi/articleshow/71337533.cms'
response = requests.get(url)

bs = BeautifulSoup(response.text,"html.parser")

# This will work too:
# section = bs.find_all('section', class_='_2suu5 clearfix id-r-component undefined undefined')

# find_all returns a list of every matching <section> element
section = bs.find_all('section', attrs={'class': '_2suu5 clearfix id-r-component undefined undefined'})

# print(section)
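
If you would rather not depend on the exact spacing of a long class string, here is a minimal sketch of two alternatives. It runs on a small inline HTML snippet that I made up to stand in for the live page (the `<p>Article body</p>` content is hypothetical), so the real markup may differ. As far as I know, BeautifulSoup splits the class attribute on whitespace, so matching on a single class name, or using a CSS selector through `select_one`, should not care about stray spaces.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the live page: the class attribute carries the
# same doubled space and trailing space as the string in the question.
html = (
    '<section class="_2suu5  clearfix id-r-component undefined undefined ">'
    '<p>Article body</p>'
    '</section>'
)
soup = BeautifulSoup(html, "html.parser")

# Copy-pasted class string with the stray whitespace: no match is found.
messy = soup.find(
    "section",
    {"class": "_2suu5  clearfix id-r-component undefined undefined "},
)

# Matching on one class name ignores the other classes and the spacing.
by_name = soup.find("section", class_="_2suu5")

# A CSS selector can require several class names, in any order.
by_selector = soup.select_one("section.clearfix._2suu5")

print(messy)
print(by_name.p.get_text())
print(by_selector.p.get_text())
```

Keep in mind that class names like `_2suu5` look auto-generated and may change whenever the site is redeployed, so treat any of these selectors as a starting point.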

Python scraping of a section in a web page
