HTML parsing w/ BS4: Couldn't find a tree builder ... 'html parser'

Time: 2018-04-13 02:40:05

Tags: python-3.x

After getting an error in my PyDev console, I can't figure out how to proceed.

The console returns the following:

  b'<!DOCTYPE html>\n<html>\n    <head>\n        <title>A simple example page</title>\n    </head>\n    <body>\n        <p>Here is some simple content for this page.</p>\n    </body>\n</html>'
Traceback (most recent call last):
  File "C:\Users\RainShadow\eclipse-workspace\test0\test2.py", line 7, in <module>
    soup = BeautifulSoup(page.content, 'html parser')
  File "C:\Users\RainShadow\Desktop\PythonLibs\BeautifulSoup4\bs4\__init__.py", line 165, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: html parser. Do you need to install a parser library?

The code I ran to produce the console output above is:

import requests 

page = requests.get("http://dataquestio.github.io/web-scraping-pages/simple.html")
print(page.content)

from bs4 import BeautifulSoup
soup = BeautifulSoup(page.content, 'html parser')

print(soup.prettify())

My question: where is the best place to download a tree builder with the feature 'html parser'?

1 answer:

Answer 0: (score: 2)

Try this when initializing BS:

soup = BeautifulSoup(page.content, 'html.parser')

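Note the period (.) instead of a space. html.parser ships with Python, so it works out of the box, and it should parse the page to the level you need. For details, see this documentation.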
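To see why no download is needed: html.parser is a module in Python's standard library, and the 'html.parser' feature string tells BeautifulSoup to build on it. Below is a minimal sketch that uses that module directly, without BeautifulSoup, on the same page content the question printed. The class name TitleExtractor is hypothetical, and the bytes are inlined here (an assumption made to avoid the network request).

```python
from html.parser import HTMLParser  # standard library; nothing to install


class TitleExtractor(HTMLParser):
    """Hypothetical helper that collects the text inside <title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Only keep text that appears between <title> and </title>.
        if self.in_title:
            self.title += data


# The bytes shown by print(page.content) in the question, inlined for this sketch.
content = (
    b'<!DOCTYPE html>\n<html>\n    <head>\n'
    b'        <title>A simple example page</title>\n'
    b'    </head>\n    <body>\n'
    b'        <p>Here is some simple content for this page.</p>\n'
    b'    </body>\n</html>'
)

extractor = TitleExtractor()
extractor.feed(content.decode("utf-8"))
extractor.close()
print(extractor.title)
```

With BeautifulSoup, the corrected line from the answer does this work for you. Other parsers such as lxml or html5lib are separate packages and would need to be installed before their names can be passed as the second argument.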