I am trying to parse the titles and links of specific articles from an HTML page.
My code is as follows:
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def get_bcci_articles():
    bcci_article_link = "http://www.bcci.tv/news/2018/news"
    r = requests.get(bcci_article_link)
    bcci_article_html = r.text
    soup = BeautifulSoup(bcci_article_html, "html.parser")
    # print(soup.prettify())
    bcci_items = soup.find_all("div", {"class": "newsCol"})
    bcci_article_dict = {}
    for div in bcci_items:
        a = div.find('a')['href']
        b = 'https://www.bcci.tv'
        c = urljoin(b, a)
        # print(c)
        # This lookup is the line that raises the error
        bcci_article_dict[div.find('p')['class.title']] = c
    return bcci_article_dict
This is the HTML content:
<div class="newsCol">
  <a href="/news/2018/news/17091/confident-india-u19-eye-fourth-world-cup-title">
    <p class="title">Confident India U19 eye fourth World Cup title</p>
  </a>
</div>
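I want to extract both the link and the title. I can get the link with div.find('a')['href'], but how do I extract the text of the element with class="title", so that I get "Confident India U19 eye fourth World Cup title"? I tried [div.find('p')['class.title']] and got an error. I know this is not the right way to call it. How can I fix this?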
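
One possible approach, offered as a minimal sketch rather than a definitive fix: subscripting a tag (as in ['class.title']) looks up an HTML attribute by name, and the <p> tag has no attribute called class.title, so the lookup fails. Selecting the <p> by its class and then reading its text should give the title instead. The helper name extract_articles and the constants SAMPLE_HTML and BASE_URL are hypothetical names introduced for this illustration, and the HTML fragment from the question is inlined so that the sketch runs without a network request.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# HTML fragment copied from the question, inlined to avoid a network call.
SAMPLE_HTML = """
<div class="newsCol">
  <a href="/news/2018/news/17091/confident-india-u19-eye-fourth-world-cup-title">
    <p class="title">Confident India U19 eye fourth World Cup title</p>
  </a>
</div>
"""

# Base URL taken from the question's code.
BASE_URL = "https://www.bcci.tv"


def extract_articles(html, base_url=BASE_URL):
    """Map each article title to its absolute link (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    articles = {}
    for div in soup.find_all("div", class_="newsCol"):
        link_tag = div.find("a", href=True)
        # Select the <p> by its class instead of subscripting with 'class.title'.
        title_tag = div.find("p", class_="title")
        if link_tag is None or title_tag is None:
            continue  # skip blocks that lack a link or a title
        # get_text() returns the text inside the tag; strip=True trims whitespace.
        title = title_tag.get_text(strip=True)
        articles[title] = urljoin(base_url, link_tag["href"])
    return articles


articles = extract_articles(SAMPLE_HTML)
print(articles)
```

In the original function, the same idea would amount to replacing div.find('p')['class.title'] with div.find('p', class_='title').get_text(strip=True), assuming every newsCol block on the live page has the same structure as the fragment shown.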