Question

I'm new to Python. Below are a few lines of Python that print all the article titles on http://www.nytimes.com/.

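import requests
from bs4 import BeautifulSoup

base_url = 'http://www.nytimes.com'
r = requests.get(base_url)
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(r.text, 'html.parser')

for story_heading in soup.find_all(class_="story-heading"):
    if story_heading.a:
        # The heading wraps a link: print the link's text
        print(story_heading.a.text.replace("\n", " ").strip())
    else:
        # No link inside: print the heading's first child instead
        print(story_heading.contents[0].strip())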

What do .a and .text mean?

Thanks a lot.

Answer 1

First, let's look at what story_heading prints on its own:

>>> story_heading
<h2 class="story-heading"><a href="https://www.nytimes.com/real-estate/mortgage-calculator">Mortgage Calculator</a></h2>

To extract the a tag, we access it with story_heading.a:

>>> story_heading.a
<a href="https://www.nytimes.com/real-estate/mortgage-calculator">Mortgage Calculator</a>

To get only the text inside the tag, and not its attributes, we use .text:

>>> story_heading.a.text
'Mortgage Calculator'
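
If you want to try this without hitting the live site, here is a minimal offline sketch. The html string is made-up markup that imitates the heading shown above plus a second heading with no link; it is an assumption for illustration, not the real page source. The loop is the same one from the question, except that it collects the titles in a list (titles is a name introduced here) before printing.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the live page (assumption, not real page source)
html = """
<h2 class="story-heading"><a href="https://www.nytimes.com/real-estate/mortgage-calculator">Mortgage Calculator</a></h2>
<h2 class="story-heading">Plain heading without a link</h2>
"""

soup = BeautifulSoup(html, "html.parser")

titles = []
for story_heading in soup.find_all(class_="story-heading"):
    if story_heading.a:
        # .a is the first <a> tag inside the heading; .text is its text content
        titles.append(story_heading.a.text.replace("\n", " ").strip())
    else:
        # No <a> inside, so .a is None; fall back to the heading's first child
        titles.append(story_heading.contents[0].strip())

print(titles)
```

The second heading takes the else branch because story_heading.a evaluates to None when the tag contains no link.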

Answer 2

Here, .a gives you the first anchor tag, and .text gives you the text inside the tag.
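
To make "first" concrete, here is a small sketch on a made-up snippet (the html string and the variable names are assumptions for illustration). It shows that .a returns only the first anchor, that find_all("a") is what you would use to get every anchor, and that .a is None when there is no anchor at all.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet with two links in one tag and none in the other
html = '<div><a href="/one">First</a> and <a href="/two">Second</a></div><p>No links here</p>'
soup = BeautifulSoup(html, "html.parser")
div = soup.div

first = div.a                  # only the first <a> inside the div
all_links = div.find_all("a")  # every <a> inside the div

print(first.text)                          # text of the first link
print(first["href"])                       # attributes are read with [] rather than .text
print([link.text for link in all_links])   # text of each link
print(div.text)                            # all text inside the div, tags stripped
print(soup.p.a)                            # None, because <p> contains no <a>
```

That None is why the code in the question checks "if story_heading.a:" before calling .text on it.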

