Question

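I'm using Beautiful Soup 4 to parse a news site and find the links contained in the body text. I can find all the paragraphs that contain links, but paragraph.get('href') returns None for every one of them. I'm using Python 3.5.1. Any help is much appreciated.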

from bs4 import BeautifulSoup
import urllib.request
import re

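# Download the page first: BeautifulSoup parses markup, it does not fetch URLs.
url = "http://www.cnn.com/2016/11/18/opinions/how-do-you-deal-with-donald-trump-dantonio/index.html"
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")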

for paragraph in soup.find_all("div", class_="zn-body__paragraph"):
    print(paragraph.get('href'))

Answer 1

Is this what you actually want?

for paragraph in soup.find_all("div", class_="zn-body__paragraph"):
    # paragraph("a") is shorthand for paragraph.find_all("a")
    for a in paragraph("a"):
        print(a.get('href'))

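Note that paragraph.get('href') looks for an href attribute on the <div> tag that was found. Since the <div> has no such attribute, it returns None. Most likely you need to find all the <a> tags that are descendants of each <div> (which paragraph("a") does, as a shortcut for paragraph.find_all("a")) and then read the href attribute of each <a> element.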
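
The behaviour described above can be checked without fetching the live page. The following is a minimal sketch, assuming markup shaped like the page in the question; SAMPLE_HTML and extract_links are hypothetical names introduced here for illustration and are not part of the original post.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure described in the question.
SAMPLE_HTML = """
<div class="zn-body__paragraph">Intro with <a href="http://example.com/a">one link</a>.</div>
<div class="zn-body__paragraph">No links in this paragraph.</div>
<div class="zn-body__paragraph"><a href="/relative/b">two</a> and <a name="anchor-only">no href</a></div>
<div class="other"><a href="http://example.com/ignored">outside the body</a></div>
"""


def extract_links(html):
    """Return the href values of <a> tags nested inside zn-body__paragraph divs."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for paragraph in soup.find_all("div", class_="zn-body__paragraph"):
        # href=True keeps only <a> tags that actually carry an href attribute,
        # so anchors without one do not show up as None.
        for a in paragraph.find_all("a", href=True):
            links.append(a["href"])
    return links


links = extract_links(SAMPLE_HTML)
print(links)
```

Filtering with href=True is a design choice of this sketch; with a.get('href') as in the answer, an <a> tag that lacks the attribute would simply yield None.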

How to find links within a specified class using Beautiful Soup
