I am trying to scrape a news website for learning purposes, but I have run into a problem.
from bs4 import BeautifulSoup
from urllib.request import urlopen
req = urlopen('https://timesofindia.indiatimes.com/india/evidence-of-chidambaram-meeting-mukerjeas-destroyed-cbi/articleshow/71337533.cms')
page_html = req.read()
page_soup = BeautifulSoup(page_html,"html.parser")
section = page_soup.find('section',{'class':'_2suu5 clearfix id-r-component undefined undefined '})
print(section)
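I have already tried scraping another website, and the code worked fine there. This time, though, I cannot work out what is going wrong.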
Answer 0 (score: 1)
I fixed it for you. I hope you learn something useful from it.
import requests
from bs4 import BeautifulSoup
url = 'https://timesofindia.indiatimes.com/india/evidence-of-chidambaram-meeting-mukerjeas-destroyed-cbi/articleshow/71337533.cms'
response = requests.get(url)
bs = BeautifulSoup(response.text,"html.parser")
# This alternative works too:
# section = bs.find_all('section', class_='_2suu5 clearfix id-r-component undefined undefined')
# Match the full class string exactly (no trailing space); find_all returns a list of matches.
section = bs.find_all('section', attrs={'class': '_2suu5 clearfix id-r-component undefined undefined'})
# print(section)
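
One likely reason the lookup in the question returned nothing is the trailing space inside the class string. When Beautiful Soup is given a class string containing spaces, it compares it against the element's classes joined by single spaces, so an extra space at the end prevents a match. Here is a minimal offline sketch of that behaviour. The HTML snippet is a made-up stand-in for the article page, not the real markup, and I am assuming the live page still contains a section with these classes (it may have changed, or be rendered by JavaScript).

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the article page; the real markup may differ.
html = """
<html><body>
<section class="_2suu5 clearfix id-r-component undefined undefined ">Article body</section>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Class string with a trailing space, as in the question: no match.
with_trailing_space = soup.find(
    "section", {"class": "_2suu5 clearfix id-r-component undefined undefined "}
)

# Same class string without the trailing space: matches.
exact_match = soup.find(
    "section", {"class": "_2suu5 clearfix id-r-component undefined undefined"}
)

# Matching on a single class is less brittle than the full string.
single_class = soup.find("section", class_="_2suu5")

print(with_trailing_space)              # None
print(exact_match.get_text(strip=True))  # Article body
print(single_class is not None)          # True
```

Matching on one class such as `_2suu5` avoids depending on the order and spacing of the whole attribute, although generated class names like this one can change whenever the site is redeployed.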