My code:
foo = fromstring(my_html)
It raises this warning:
UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
To get rid of this warning, change this:
BeautifulSoup([your markup])
to this:
BeautifulSoup([your markup], "html.parser")
markup_type=markup_type))
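I tried passing the string 'html.parser' to it, but that did not work: it gave me an error saying the string is not callable. So I tried html.parser itself. Then I looked through the lxml module to see whether I could find another parser, but could not find one. I looked at the Python stdlib and saw that 2.7 has a module called HTMLParser, so I imported it and passed beautifulsoup=HTMLParser, but that did not work either.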
Where is the callable that I should pass to fromstring?
EDIT: added the solutions I tried:
from lxml.html.soupparser import fromstring
wiktionary_page = fromstring(wiktionary_page.read(), features="html.parser" )
and this:
from lxml.html.soupparser import BeautifulSoup
wiktionary_page = fromstring(wiktionary_page.read(), beautifulsoup=lambda s: BeautifulSoup(s, "html.parser"))
Answer 0 (score: 4)
You can pass the features keyword, which sets the parser:
tree = lxml.html.soupparser.fromstring("<p>foo</p>", features="html.parser" )
What happens inside fromstring is that _parse gets called, but I think there is a bug on the line bsargs['features'] = ['html.parser']; it should be bsargs['features'] = 'html.parser':
def _parse(source, beautifulsoup, makeelement, **bsargs):
    if beautifulsoup is None:
        beautifulsoup = BeautifulSoup
    if hasattr(beautifulsoup, "HTML_ENTITIES"):  # bs3
        if 'convertEntities' not in bsargs:
            bsargs['convertEntities'] = 'html'
    if hasattr(beautifulsoup, "DEFAULT_BUILDER_FEATURES"):  # bs4
        if 'features' not in bsargs:
            bsargs['features'] = ['html.parser']  # use Python html parser
    tree = beautifulsoup(source, **bsargs)
    root = _convert_tree(tree, makeelement)
    # from ET: wrap the document in a html root element, if necessary
    if len(root) == 1 and root[0].tag == "html":
        return root[0]
    root.tag = "html"
    return root
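To make the defaulting rule in _parse easier to see, here is a minimal sketch that keeps only the bs4 branch. The names fake_soup and parse_sketch are hypothetical stand-ins, not part of lxml or Beautiful Soup; fake_soup just returns the keyword arguments it was called with, so no third-party package is needed to run it.

```python
def fake_soup(markup, **kwargs):
    """Hypothetical stand-in for the BeautifulSoup constructor.

    It only reports which keyword arguments it received.
    """
    return kwargs


# Give the stand-in the attribute that _parse uses to detect bs4.
fake_soup.DEFAULT_BUILDER_FEATURES = ["html", "fast"]


def parse_sketch(source, beautifulsoup, **bsargs):
    # Same defaulting rule as the bs4 branch of _parse:
    # fill in 'features' only when the caller did not supply it.
    if hasattr(beautifulsoup, "DEFAULT_BUILDER_FEATURES"):
        if "features" not in bsargs:
            bsargs["features"] = ["html.parser"]
    return beautifulsoup(source, **bsargs)


# Caller supplies features: the plain string is forwarded unchanged.
explicit = parse_sketch("<p>foo</p>", fake_soup, features="html.parser")

# Caller supplies nothing: the default is a one-element list, not a string.
defaulted = parse_sketch("<p>foo</p>", fake_soup)

print(explicit)
print(defaulted)
```

The point is only that an explicit features="html.parser" reaches the constructor as a string, while the built-in default reaches it as a list. Whether that list is what triggers the warning is the guess made above, not something this sketch proves.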
You can also use a lambda:
from lxml.html.soupparser import BeautifulSoup
import lxml.html.soupparser
tree = lxml.html.soupparser.fromstring("<p>foo</p>", beautifulsoup=lambda s: BeautifulSoup(s, "html.parser"))
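The reason the plain string failed in the question is that the beautifulsoup argument gets called with the source, as the line tree = beautifulsoup(source, **bsargs) shows. The sketch below illustrates that with a hypothetical stand-in, make_soup, in place of the real BeautifulSoup class, and also shows functools.partial as another way to spell the lambda. I have not checked the partial variant against lxml itself, so treat it as an assumption.

```python
from functools import partial


def make_soup(markup, features=None):
    """Hypothetical stand-in for BeautifulSoup(markup, features)."""
    return (markup, features)


def call_parser_factory(beautifulsoup, source):
    # Mirrors the call beautifulsoup(source, **bsargs) with empty bsargs.
    return beautifulsoup(source)


# A lambda that fixes the parser name, as in the answer.
with_lambda = call_parser_factory(
    lambda s: make_soup(s, "html.parser"), "<p>foo</p>"
)

# The same idea written with functools.partial.
with_partial = call_parser_factory(
    partial(make_soup, features="html.parser"), "<p>foo</p>"
)

# Passing the parser name itself fails, because a str cannot be called.
try:
    call_parser_factory("html.parser", "<p>foo</p>")
    error = None
except TypeError as exc:
    error = str(exc)

print(with_lambda, with_partial, error)
```

So whatever is passed as beautifulsoup has to accept the markup as its first argument and return the parsed tree; the parser name goes inside that callable, or in the features keyword.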
Both the features keyword and the lambda suppress the warning:
In [13]: from lxml.html import soupparser
In [14]: tree = soupparser.fromstring("<p>foo</p>", features="html.parser" )
In [15]: from lxml.html.soupparser import BeautifulSoup
In [16]: import lxml.html.soupparser
In [17]: tree = lxml.html.soupparser.fromstring("<p>foo</p>", beautifulsoup=lambda s: BeautifulSoup(s, "html.parser"))
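If you would rather check this in a script than by eye in IPython, something like the following sketch can record the warnings a call emits, using the standard warnings module. collect_warnings and noisy are hypothetical helper names; noisy only stands in for a call that warns, so the example runs without lxml or Beautiful Soup installed.

```python
import warnings


def collect_warnings(func, *args, **kwargs):
    """Call func and return (result, messages of warnings raised meanwhile)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let filters hide anything
        result = func(*args, **kwargs)
    return result, [str(w.message) for w in caught]


def noisy():
    # Hypothetical stand-in for a call that emits a UserWarning.
    warnings.warn("No parser was explicitly specified", UserWarning)
    return "done"


result, messages = collect_warnings(noisy)
quiet_result, quiet_messages = collect_warnings(len, "abc")

print(result, messages)
print(quiet_result, quiet_messages)
```

With lxml and Beautiful Soup installed, calling collect_warnings(soupparser.fromstring, "<p>foo</p>", features="html.parser") should, going by the session above, come back with an empty message list.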