Getting "None" and "'NoneType' object..." errors when extracting text from a web page with BeautifulSoup4

Time: 2016-01-12 21:11:54

Tags: python python-2.7 webpage python-2.x bs4

I'm trying to pull the main headline from the BBC Sport page (currently: "Wenger predicts 'active' January"). The ID is 'lead-caption' and it sits inside an <h2><a> tag. I'm using Python.

from bs4 import BeautifulSoup
import urllib2
url = urllib2.urlopen("http://www.bbc.co.uk/sport/football/teams/arsenal")
soup=BeautifulSoup(url.read())
#Things I've tried
headline=soup.find('a', attrs={'id': 'lead-caption'})
print headline
#The above prints 'None'
headline1=soup.find('lead-caption').getText()
print headline1
#The above raises AttributeError: 'NoneType' object has no attribute 'getText'
tag = soup.a
tag['id'] = 'lead-caption'
type(tag)
print tag.string
#The above raises TypeError: 'NoneType' object does not support item assignment

Any help is much appreciated. Thanks :)
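The three attempts above fail for different reasons. `soup.find('a', attrs={'id': 'lead-caption'})` only matches an `<a>` that itself carries the id, and `soup.find('lead-caption')` treats its first argument as a tag name, so it looks for a `<lead-caption>` element; both simply return `None` when nothing matches, and calling `getText()` on that `None` raises the AttributeError. The third attempt does not search at all: `soup.a` is shorthand for the first `<a>` in the document, `tag['id'] = ...` assigns an attribute, and the TypeError means `soup.a` itself was `None` at that point. The Python 3 sketch below checks the first two lookups against a small hard-coded snippet; that markup is an assumption (an id-bearing `<div>` wrapping an `<a>`), not the real BBC source.

```python
from bs4 import BeautifulSoup

# Assumed stand-in for the page: the id sits on a <div>, the text on a nested <a>
sample_html = '<div id="lead-caption"><a href="#">Wenger predicts \'active\' January</a></div>'
soup = BeautifulSoup(sample_html, "html.parser")

# Attempt 1: asks for an <a> carrying the id, but the id is on the <div>
by_a_and_id = soup.find("a", attrs={"id": "lead-caption"})

# Attempt 2: the first argument is a tag name, so this looks for a <lead-caption> element
by_tag_name = soup.find("lead-caption")

# Matching on the id alone finds the element whatever its tag name is
by_id = soup.find(id="lead-caption")

# A CSS selector can go straight to the <a> inside the element with that id
by_css = soup.select_one("#lead-caption a")

print(by_a_and_id)        # None
print(by_tag_name)        # None
print(by_id.name)         # div
print(by_css.get_text())  # Wenger predicts 'active' January
```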

1 answer:

Answer 0 (score: 2)

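Your code is almost correct. You are looking for the wrong element, which is why you get None: the element with id 'lead-caption' is a div, not an a.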

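# id="lead-caption" is on the <div>, not on the <a>
headline=soup.find('div', attrs={'id': 'lead-caption'})
# The headline text sits in the <a> nested inside that <div>
headline_text=headline.find('a').getText()
print headline_text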

Output:

Wenger predicts 'active' January
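For readers on Python 3, here is a minimal sketch of the same div-then-a lookup with a `None` check at each step, so a missing element yields `None` instead of an AttributeError. It parses a hard-coded HTML string rather than the live page; that markup is an assumption modelled on the answer's description, not the actual BBC source, and the helper name `lead_headline` is hypothetical. `get_text` is the same bs4 method as `getText`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the answer: the id is on the <div>,
# and the headline text lives in the <a> nested inside it
SAMPLE_HTML = """
<html><body>
  <div id="lead-caption"><a href="#">Wenger predicts 'active' January</a></div>
</body></html>
"""


def lead_headline(html):
    """Return the lead headline text, or None if the expected markup is missing."""
    # Name the parser explicitly so results do not depend on which parsers are installed
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", attrs={"id": "lead-caption"})
    if container is None:
        return None
    link = container.find("a")
    if link is None:
        return None
    # strip=True trims surrounding whitespace from the extracted text
    return link.get_text(strip=True)


print(lead_headline(SAMPLE_HTML))  # Wenger predicts 'active' January
```

To use it on a real page, pass the downloaded HTML (for example `urllib.request.urlopen(url).read()`) to `lead_headline`; the BBC markup may have changed since 2016, so the tag and id may need updating.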