I want to download the hrefs of the 4 articles directly above the "NEED TO KNOW" section on http://www.marketwatch.com/, but I cannot identify them uniquely with `findAll`. The following approaches give me the articles, but also a number of others that match the same criteria:
trend_articles = soup1.findAll("a", {"class": "link"})
href = article["href"]
trend_articles = soup1.findAll("div", {"class": "content--secondary"})
href = article.a["href"]
Does anyone have a suggestion for how I can get those 4 articles, and only those 4?
Answer 0: (score: 4)
This seems to work for me:
from bs4 import BeautifulSoup
import requests
page = requests.get("http://www.marketwatch.com/").content
soup = BeautifulSoup(page, 'lxml')
header_secondary = soup.find('header', {'class': 'header--secondary'})
trend_articles = header_secondary.find_next_siblings('div', {'class': 'group group--list '})[0].findAll('a')
trend_articles = [article.contents[0] for article in trend_articles]
print(trend_articles)
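Since the question asks for the hrefs rather than the link text, here is a minimal sketch of the same sibling-anchoring idea that pulls out the `href` attributes instead. The inline HTML is a made-up stand-in for MarketWatch's layout (the real markup may differ or have changed since), with class names mirroring the code above:

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the relevant slice of the page; the real
# MarketWatch markup may differ or have changed since this was written.
html = """
<header class="header--secondary"><h2>Trending</h2></header>
<div class="group group--list">
  <a href="/story/one">One</a>
  <a href="/story/two">Two</a>
  <a href="/story/three">Three</a>
  <a href="/story/four">Four</a>
</div>
<header class="header--primary"><h2>NEED TO KNOW</h2></header>
"""

soup = BeautifulSoup(html, "html.parser")

# Anchor on the unique secondary header, then take the <div> sibling
# that follows it -- this is what limits the match to exactly 4 links.
header_secondary = soup.find("header", {"class": "header--secondary"})
group = header_secondary.find_next_sibling("div", {"class": "group--list"})

hrefs = [a["href"] for a in group.find_all("a")]
print(hrefs)  # ['/story/one', '/story/two', '/story/three', '/story/four']
```

The key point is that scoping `find_all('a')` to the sibling container, rather than searching the whole document, is what excludes the other links that share the same classes.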