Question

...
soup = BeautifulSoup(html, "lxml")
File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 152, in __init__
% ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

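The above is the output on my terminal. I am on Mac OS 10.7.x. I have Python 2.7.1 and followed this tutorial to get Beautiful Soup and lxml; both installed successfully and work with a separate test file located here. In the Python script that causes this error, I have included this line: from pageCrawler import comparePages. In the pageCrawler file I have included the following two lines: from bs4 import BeautifulSoup and from urllib2 import urlopen.

Any help in figuring out what the problem is and how to solve it would be much appreciated.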

Answer 1

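I suspect this is related to the parser that BS uses to read the HTML. They document it here, but if you're like me (on OSX) you might be stuck with something that requires a bit of work:

You'll notice that in the BS4 documentation page above, they point out that by default BS4 uses Python's built-in HTML parser. Assuming you are on OSX, the Apple-bundled version of Python is 2.7.2, which is not lenient about character formatting. I hit this same problem, so I upgraded my version of Python to work around it. Doing this in a virtualenv minimizes disruption to other projects.

If that sounds like a pain, you can switch over to the LXML parser: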

pip install lxml

And then try:

soup = BeautifulSoup(html, "lxml")

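Depending on your scenario, that might be good enough. I found this annoying enough to warrant upgrading my version of Python. With virtualenv, you can migrate your packages fairly easily.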
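
As a rough sketch of the "switch parsers" idea above, the snippet below picks a parser name based on which parser libraries can be imported, falling back to the built-in html.parser. The helper name pick_parser is hypothetical (it is not part of bs4), and the sketch assumes Python 3.4 or later for importlib.util.find_spec.

```python
import importlib.util


def pick_parser(preferred=("lxml", "html5lib")):
    """Return the first importable parser name, else the built-in one."""
    for name in preferred:
        # find_spec returns None when the top-level module is not installed
        if importlib.util.find_spec(name) is not None:
            return name
    return "html.parser"  # ships with Python, so it is always available


if __name__ == "__main__":
    print(pick_parser())
```

The result can then be passed as the second argument, for example BeautifulSoup(html, pick_parser()). Being importable is only a proxy for bs4 being able to use the parser, so treat this as a starting point rather than a guarantee.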

Answer 2

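For a basic out-of-the-box Python with bs4 installed, you can process your markup with html5lib (note that html5lib is itself a separate package, installed with pip install html5lib):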

soup = BeautifulSoup(html, "html5lib")

If, however, you want to use the XML parser (features="xml"), then you need to install lxml:

pip3 install lxml

soup = BeautifulSoup(html, features="xml")

Answer 3

Run these three commands to make sure you have all the relevant packages installed:

pip install bs4
pip install html5lib
pip install lxml

Then restart your Python IDE, if needed.

That should take care of everything related to this issue.
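
To confirm that the three installs above landed in the interpreter you are actually running, a small check like the following may help. It is only a sketch: the function name installed_versions is made up for illustration, and it assumes Python 3.8 or later, where importlib.metadata is in the standard library. The distribution name beautifulsoup4 is the package that pip install bs4 pulls in.

```python
from importlib import metadata


def installed_versions(names=("beautifulsoup4", "lxml", "html5lib")):
    """Map each distribution name to its installed version, or None."""
    report = {}
    for name in names:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # not installed for this interpreter
    return report


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(name, version or "NOT INSTALLED")
```

If a package shows as missing here even though pip reported success, pip and the IDE are probably using different Python installations.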

Answer 4

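Three of the options mentioned in the other answers actually work.

1. Use Python's built-in HTML parser (nothing to install):

soup_object = BeautifulSoup(markup, "html.parser")  # Python's built-in HTML parser

2. Install and use lxml:

pip install lxml

soup_object = BeautifulSoup(markup, "lxml")  # C-dependent parser

3. Install and use html5lib:

pip install html5lib

soup_object = BeautifulSoup(markup, "html5lib")  # pure-Python parser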

Answer 5

I prefer the built-in Python HTML parser: nothing to install, no dependencies.

soup = BeautifulSoup(s, "html.parser")

Answer 6

I am using Python 3.6 and I hit the same error as in the original post. After I ran this command:

python3 -m pip install lxml

it solved my problem.

Answer 7

Install the LXML parser in your Python environment:

pip install lxml

That will resolve your problem. Alternatively, you can use the parser built into Python in the same way:

soup = BeautifulSoup(s,  "html.parser")

Note: in Python 3, the "HTMLParser" module was renamed to "html.parser".

Answer 8

I encountered the same issue. I found the reason was that I had a slightly outdated six package.

>>> import html5lib
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/html5lib/__init__.py", line 16, in <module>
    from .html5parser import HTMLParser, parse, parseFragment
  File "/usr/local/lib/python2.7/site-packages/html5lib/html5parser.py", line 2, in <module>
    from six import with_metaclass, viewkeys, PY3
ImportError: cannot import name viewkeys

Upgrading your six package will solve the issue:

sudo pip install six==1.10.0
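
To check whether an installed six is new enough for html5lib's import in the traceback above, you could test for the viewkeys name directly. This is a minimal sketch; six_has_viewkeys is a hypothetical helper, not part of six or html5lib.

```python
import importlib


def six_has_viewkeys():
    """Return True/False if six is installed, or None if it is missing."""
    try:
        six = importlib.import_module("six")
    except ImportError:
        return None  # six is not installed at all
    # html5lib's html5parser imports viewkeys from six
    return hasattr(six, "viewkeys")


if __name__ == "__main__":
    print(six_has_viewkeys())
```

A result of False points to the outdated-package situation described in this answer.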

Answer 9

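BeautifulSoup supports Python's built-in HTML parser by default, but if you want to use any other third-party parser, you need to install that external parser (for example, lxml).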

soup_object= BeautifulSoup(markup,"html.parser") #Python HTML parser

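If you don't pass any parser as an argument, you will get a warning that no parser was specified.

soup_object = BeautifulSoup(markup)  # Warning

To use any other external parser, you need to install it and then specify it. For example: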

pip install lxml

soup_object= BeautifulSoup(markup,'lxml') # C dependent parser

External parsers have C and Python dependencies, which can bring both advantages and disadvantages.
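
One way to cope with a parser that may or may not be installed is to catch bs4.FeatureNotFound (the exception shown in the question) and try the next candidate. The following is a sketch under that assumption; make_soup and its parsers parameter are illustrative names, not part of bs4.

```python
def make_soup(markup, parsers=("lxml", "html5lib", "html.parser")):
    """Build a soup with the first parser in `parsers` that bs4 can find."""
    # Imported inside the function so this sketch can be loaded even on a
    # machine where bs4 itself is missing.
    from bs4 import BeautifulSoup, FeatureNotFound

    for name in parsers:
        try:
            return BeautifulSoup(markup, name)
        except FeatureNotFound:
            continue  # this parser library is not installed; try the next one
    raise RuntimeError("no usable parser among: " + ", ".join(parsers))
```

Keep in mind that different parsers can build different trees from the same invalid markup, so a silent fallback trades predictability for convenience.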

Answer 10

Use html.parser instead of lxml. You can use this piece of code:

soup = BeautifulSoup(html, 'html.parser')

Answer 11

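Leaving the parser argument blank results in a warning that names the best parser available.

soup = BeautifulSoup(html)

UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html5lib"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.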

python --version Python 3.7.7

PyCharm 19.3.4 CE

Answer 12

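The error comes from the parser you are using. In general, if you have an HTML file or HTML code, use html5lib (documentation can be found here), and if you have an XML file or XML data, use lxml (documentation can be found here). You can also use lxml for HTML, but it sometimes gives the error above. So it is better to choose the package based on the type of data or file. You can also use the built-in module html.parser, but that does not always work either.

For more details on when to use which package, see the details here.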

Answer 13

Some references show the first form; use the second one instead:

soup_object= BeautifulSoup(markup,'html-parser')
soup_object= BeautifulSoup(markup,'html.parser')
