Question

I am trying to use Python and BeautifulSoup to scrape this page from the sfglobe website: http://sfglobe.com/2015/04/28/stirring-pictures-from-the-riots-in-baltimore. Here is the code:

import urllib2
from bs4 import BeautifulSoup 

url = 'http://sfglobe.com/2015/04/28/stirring-pictures-from-the-riots-in-baltimore' 
req = urllib2.urlopen(url) 
html = req.read() 
soup = BeautifulSoup(html) 
desc = soup.find('span', class_='articletext intro')

Can anyone help me solve this problem?

Answer 1

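From the question title, I assumed that the only thing you want is the description of the article, which can be found in a <meta> tag within the HTML <head>.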

You were on the right track, but I'm not sure why you did this:

desc = soup.find('span', class_='articletext intro')

Regardless, I came up with something using requests (see http://stackoverflow.com/questions/2018026/should-i-use-urllib-or-urllib2-or-requests) rather than urllib2:

import requests
from bs4 import BeautifulSoup

url = 'http://sfglobe.com/2015/04/28/stirring-pictures-from-the-riots-in-baltimore'
req = requests.get(url)
html = req.text
soup = BeautifulSoup(html)
tag = soup.find(attrs={'name':'description'})  # find meta tag w/ description
desc = tag['value']  # get value of attribute 'value'
print desc

If that is not what you are looking for, please clarify so I can try to help you more.

EDIT: After some clarification, I pieced together why you were originally using soup.find('span', class_='articletext intro').

Maybe this is what you are looking for:

desc = soup.find('span', class_='articletext intro')
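
The two lookups discussed in this answer (the <meta> description and the intro <span>) can be tried offline with a minimal sketch like the one below. It is written for Python 3 and parses a made-up HTML snippet instead of fetching the live page, so SAMPLE_HTML and the helper names get_description and get_intro_text are assumptions for illustration, not the real sfglobe markup. It also assumes the description sits in the standard content attribute and only falls back to the value attribute that the snippet above reads.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may differ.
SAMPLE_HTML = """
<html>
  <head>
    <meta name="description" content="Sample description.">
  </head>
  <body>
    <span class="articletext intro">Intro paragraph text.</span>
  </body>
</html>
"""


def get_description(html):
    """Return the text of the <meta name="description"> tag, or None."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("meta", attrs={"name": "description"})
    if tag is None:
        return None
    # Standard HTML uses 'content'; fall back to 'value' if it is absent.
    return tag.get("content") or tag.get("value")


def get_intro_text(html):
    """Return the text of the span whose class is exactly 'articletext intro'."""
    soup = BeautifulSoup(html, "html.parser")
    span = soup.find("span", class_="articletext intro")
    if span is None:
        return None
    return span.get_text(strip=True)


if __name__ == "__main__":
    print(get_description(SAMPLE_HTML))
    print(get_intro_text(SAMPLE_HTML))
```

Both helpers return None when the tag is missing, which avoids the TypeError that indexing a failed find() result would raise. To run this against the live page, the HTML string could come from requests.get(url).text as in the snippet above.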

How to scrape the description from sfglobe using Python
