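Python BeautifulSoup parsing does not work

Date: 2017-09-10 06:58:15

Tags: python beautifulsoup

I mainly use Python for data analysis and am new to scraping. I am trying to learn the BeautifulSoup package, and I cannot get the following code to work.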

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('http://pythonscraping.com/pages/warandpeace.html')
bsobj = BeautifulSoup(html)
name_list = bsobj.findAll('span',{'class':'green'})

I get an empty list.

The problem clearly comes from line 4, but I don't know why. Everything is standard, and I can't tell what went wrong.

bsobj.prettify() 

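returns ''

But when I call html.read(), I can see that all of the HTML is there and looks fine. The answers below do not solve the problem. The problem clearly comes from line 4. It does not matter whether I use bsobj.findAll() or bsobj.find_all(): they are equivalent and, as I mentioned, bsobj.prettify() returns ''.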
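
One possible explanation, which nothing in this thread confirms, is that the response had already been consumed: the object returned by urlopen() is a file-like stream, and once html.read() has been called there is nothing left for BeautifulSoup(html) to parse. The minimal sketch below uses io.BytesIO as a hypothetical stand-in for that response (no network access involved) to show the read-once behavior.

```python
import io

# Hypothetical stand-in for the object returned by urlopen(): like an HTTP
# response, a BytesIO stream keeps a read position and is exhausted after
# one full read.
response = io.BytesIO(b'<span class="green">Anna Pavlovna</span>')

first = response.read()   # consumes the whole stream
second = response.read()  # the stream is now at its end

print(first)   # the full markup
print(second)  # b''
```

If this is what happened, calling urlopen() again, or keeping the bytes from the first read() in a variable and parsing those, should give BeautifulSoup something to work with.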

2 answers:

Answer 0 (score: 0)

I think that line should be bsobj = BeautifulSoup(html.read())
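
A minimal sketch of that suggestion, under stated assumptions: the stream is read exactly once into a variable, and the bytes are handed to BeautifulSoup with the parser named explicitly. The io.BytesIO object and its one-span fragment are hypothetical stand-ins for urlopen('http://pythonscraping.com/pages/warandpeace.html') and the real page, and 'html.parser' is an assumed parser choice that the thread does not specify.

```python
import io
from bs4 import BeautifulSoup

# Hypothetical stand-in for urlopen(...): a byte stream holding a small
# fragment shaped like the target page.
html = io.BytesIO(b'<div><span class="green">Anna Pavlovna</span></div>')

markup = html.read()  # read the stream once and keep the bytes
bsobj = BeautifulSoup(markup, 'html.parser')  # assumed parser choice

name_list = bsobj.find_all('span', {'class': 'green'})
print(name_list)
```

Holding the bytes in markup means they can be inspected and parsed as often as needed without touching the stream again.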

Answer 1 (score: 0)

findall is wrong...

bsobj.find_all('span',{'class':'green'})

returns

[<span class="green">Anna
 Pavlovna Scherer</span>, <span class="green">Empress Marya
 Fedorovna</span>, <span class="green">Prince Vasili Kuragin</span>, <span class="green">Anna Pavlovna</span>, <span class="green">St. Petersburg</span>, <span class="green">the prince</span>, <span class="green">Anna Pavlovna</span>, <span class="green">Anna Pavlovna</span>, <span class="green">the prince</span>, <span class="green">the prince</span>, <span class="green">the prince</span>, <span class="green">Prince Vasili</span>, <span class="green">Anna Pavlovna</span>, <span class="green">Anna Pavlovna</span>, <span class="green">the prince</span>, <span class="green">Wintzingerode</span>, <span class="green">King of Prussia</span>, <span class="green">le Vicomte de Mortemart</span>, <span class="green">Montmorencys</span>, <span class="green">Rohans</span>, <span class="green">Abbe Morio</span>, <span class="green">the Emperor</span>, <span class="green">the prince</span>, <span class="green">Prince Vasili</span>, <span class="green">Dowager Empress Marya Fedorovna</span>, <span class="green">the baron</span>, <span class="green">Anna Pavlovna</span>, <span class="green">the Empress</span>, <span class="green">the Empress</span>, <span class="green">Anna Pavlovna's</span>, <span class="green">Her Majesty</span>, <span class="green">Baron
 Funke</span>, <span class="green">The prince</span>, <span class="green">Anna
 Pavlovna</span>, <span class="green">the Empress</span>, <span class="green">The prince</span>, <span class="green">Anatole</span>, <span class="green">the prince</span>, <span class="green">The prince</span>, <span class="green">Anna
 Pavlovna</span>, <span class="green">Anna Pavlovna</span>]
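
As the asker points out, the method name is unlikely to be the cause: in bs4, findAll is a legacy alias of find_all. The sketch below checks this on a small inline fragment (a hypothetical sample, not the real page) and also collapses the line breaks that appear inside some of the names in the output above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample shaped like the page: one name is split across lines.
sample = ('<p><span class="green">Anna\nPavlovna</span> met '
          '<span class="green">the prince</span>.</p>')
soup = BeautifulSoup(sample, 'html.parser')

new_style = soup.find_all('span', {'class': 'green'})
old_style = soup.findAll('span', {'class': 'green'})  # legacy alias

# Normalize internal whitespace so each name sits on one line.
names = [' '.join(tag.get_text().split()) for tag in new_style]
print(names)
```

Recent bs4 releases may emit a deprecation warning for findAll, so find_all is the safer spelling in new code.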