Question

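I am trying to print a list of all the article titles in The Michigan Daily's Most Read box, as shown on the opinion page, with a blank line separating each title.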

This is what I have written so far, but class="field-content" is not narrow enough to grab only the titles in the Most Read box.

import requests
from bs4 import BeautifulSoup

base_url = 'http://www.michigandaily.com/section/opinion' 
r = requests.get(base_url) 
soup = BeautifulSoup(r.text, "html5lib") 
for story_heading in soup.find_all(class_="field-content"):  
    if story_heading.a:  
        print(story_heading.a.text.replace("\n", " ").strip()) 
    # else:  
    #     print(story_heading.contents[0].strip())

Any and all help is much appreciated. Thanks in advance :)

Answer 1

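The articles on the page are split into three sections. Each section is a div with the class "view-content", and it contains spans (with the class "field-content") in which that section's article links are embedded. The third "view-content" div holds the "Most Read" articles. The following should retrieve those articles by scanning only the third ("Most Read") div for "field-content" spans: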

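# Continues from the question's code, where `soup` has already been built.
mostReadSection = soup.find_all('div', class_="view-content")[2]  # get the Most Read section (third div)

storyHeadings = mostReadSection.find_all('span', class_="field-content")

for story_heading in storyHeadings:
    if story_heading.a:  # skip spans that do not wrap a link
        print(story_heading.a.text.replace("\n", " ").strip())
        print()  # blank line between titles, as the question asks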
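
For anyone who wants to try the idea without hitting the live site, here is a minimal, self-contained sketch of the same approach. The HTML in SAMPLE_HTML is a made-up stand-in for the page structure described above, and the function name most_read_titles and its section_index parameter are my own, not anything from the site or from BeautifulSoup. It uses the built-in "html.parser" instead of "html5lib" only so that it runs without an extra install.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure described in the answer:
# three "view-content" divs, the third one being the Most Read box.
SAMPLE_HTML = """
<div class="view-content">
  <span class="field-content"><a href="/a">Top story</a></span>
</div>
<div class="view-content">
  <span class="field-content"><a href="/b">Column one</a></span>
</div>
<div class="view-content">
  <span class="field-content"><a href="/c">Most read
  one</a></span>
  <span class="field-content">no link here</span>
  <span class="field-content"><a href="/d">Most read two</a></span>
</div>
"""


def most_read_titles(html, section_index=2):
    """Return link titles from the view-content div at section_index.

    section_index=2 assumes the Most Read box is the third such div,
    which is what the answer above observed; adjust it if the layout differs.
    """
    soup = BeautifulSoup(html, "html.parser")
    sections = soup.find_all("div", class_="view-content")
    if len(sections) <= section_index:
        return []  # layout changed or the section is missing
    titles = []
    for span in sections[section_index].find_all("span", class_="field-content"):
        if span.a:  # only spans that actually wrap a link
            # Collapse newlines and runs of spaces into single spaces.
            titles.append(" ".join(span.a.get_text().split()))
    return titles


titles = most_read_titles(SAMPLE_HTML)
print("\n\n".join(titles))  # one blank line between titles
```

To run it against the real page, pass r.text from the question's requests.get call instead of SAMPLE_HTML. Picking the section by position is fragile: if the site adds or reorders sections, the index will point at the wrong box, which is why the function returns an empty list rather than raising when the div is missing.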

How to print a list of all article titles using BeautifulSoup
