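I mainly use Python for data analysis and am new to scraping. I am trying to learn the BeautifulSoup package, and I am having trouble getting the following code to work.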
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('http://pythonscraping.com/pages/warandpeace.html')
bsobj = BeautifulSoup(html)
name_list = bsobj.findAll('span',{'class':'green'})
I get an empty list.
The problem clearly comes from line 4, but I don't know why. Everything is standard, and I can't tell what went wrong.
bsobj.prettify()
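returns ''
But when I call html.read(), I can see that all the HTML is there and looks fine. The answers below do not solve the problem. The problem clearly comes from line 4. It doesn't matter whether I use bsobj.findAll() or bsobj.find_all(); they are equivalent, and, as I mentioned, bsobj.prettify() returns ''.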
Answer 0 (score: 0)
I think that line should be bsobj = BeautifulSoup(html.read())
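One possible explanation, which the post does not confirm: the object returned by urlopen() is a byte stream that can only be read once. If html.read() was already called to inspect the page, a later BeautifulSoup(html) or BeautifulSoup(html.read()) would receive nothing, which would match the empty prettify() result. The sketch below only illustrates that read-once behaviour; it uses io.BytesIO as a hypothetical stand-in for the real response, so it needs no network access.

```python
import io

# Hypothetical stand-in for the object returned by urlopen();
# this is not the real page, just a small byte stream.
fake_response = io.BytesIO(b'<span class="green">Anna Pavlovna</span>')

first_read = fake_response.read()   # consumes the whole stream
second_read = fake_response.read()  # nothing is left to read

print(first_read)   # the markup
print(second_read)  # empty bytes
```

If this is the cause, reading the response once into a variable and passing that variable to BeautifulSoup (or simply calling urlopen() again) should avoid the problem.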
Answer 1 (score: 0)
findAll is wrong...
bsobj.find_all('span',{'class':'green'})
returns
[<span class="green">Anna
Pavlovna Scherer</span>, <span class="green">Empress Marya
Fedorovna</span>, <span class="green">Prince Vasili Kuragin</span>, <span class="green">Anna Pavlovna</span>, <span class="green">St. Petersburg</span>, <span class="green">the prince</span>, <span class="green">Anna Pavlovna</span>, <span class="green">Anna Pavlovna</span>, <span class="green">the prince</span>, <span class="green">the prince</span>, <span class="green">the prince</span>, <span class="green">Prince Vasili</span>, <span class="green">Anna Pavlovna</span>, <span class="green">Anna Pavlovna</span>, <span class="green">the prince</span>, <span class="green">Wintzingerode</span>, <span class="green">King of Prussia</span>, <span class="green">le Vicomte de Mortemart</span>, <span class="green">Montmorencys</span>, <span class="green">Rohans</span>, <span class="green">Abbe Morio</span>, <span class="green">the Emperor</span>, <span class="green">the prince</span>, <span class="green">Prince Vasili</span>, <span class="green">Dowager Empress Marya Fedorovna</span>, <span class="green">the baron</span>, <span class="green">Anna Pavlovna</span>, <span class="green">the Empress</span>, <span class="green">the Empress</span>, <span class="green">Anna Pavlovna's</span>, <span class="green">Her Majesty</span>, <span class="green">Baron
Funke</span>, <span class="green">The prince</span>, <span class="green">Anna
Pavlovna</span>, <span class="green">the Empress</span>, <span class="green">The prince</span>, <span class="green">Anatole</span>, <span class="green">the prince</span>, <span class="green">The prince</span>, <span class="green">Anna
Pavlovna</span>, <span class="green">Anna Pavlovna</span>]
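As the asker points out, bs4 keeps findAll as an older alias of find_all, so the two calls should return the same result. A minimal sketch to check this, assuming bs4 is installed; sample_html is a made-up fragment rather than the real page, and 'html.parser' is passed explicitly so the result does not depend on which parsers happen to be installed.

```python
from bs4 import BeautifulSoup

# Made-up fragment for illustration; not the content of the real page.
sample_html = (
    '<p><span class="green">Anna Pavlovna</span> spoke to '
    '<span class="red">no one</span> except '
    '<span class="green">the prince</span>.</p>'
)

soup = BeautifulSoup(sample_html, 'html.parser')

old_style = soup.findAll('span', {'class': 'green'})   # older camelCase alias
new_style = soup.find_all('span', {'class': 'green'})  # current spelling

# Pull out just the text of each matching tag.
names = [tag.get_text() for tag in new_style]
print(old_style == new_style)
print(names)
```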