I don't understand the BeautifulSoup syntax, especially what the HTML parser argument inside the parentheses is for.
BeautifulSoup(source_code, 'html.parser')
Answer 0 (score: 0)
It appears to specify which library is used to parse source_code. Check out the options in the docs and how they compare.
As I understand it, "html.parser" uses the Python 3 html module found here.
More on parsers:
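To see what that stdlib module does on its own, here is a minimal sketch using html.parser.HTMLParser directly. As far as I know, bs4's "html.parser" builder subclasses this same class and turns its start-tag, end-tag and text callbacks into a tree; the TagLogger class below is only an illustration that records those callbacks.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical example: records the events the stdlib parser emits."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) tuples
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


parser = TagLogger()
parser.feed('<p class="intro">Hello <b>world</b></p>')
parser.close()  # flush any buffered text
for event in parser.events:
    print(event)
```

The stdlib parser only reports events; building a navigable tree out of them is the part Beautiful Soup adds.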
Answer 1 (score: 0)
You can check out the BeautifulSoup source code to see the constructor arguments and how they are used. Here is the BeautifulSoup class's __init__ method from __init__.py:
def __init__(self, markup="", features=None, builder=None,
             parse_only=None, from_encoding=None, exclude_encodings=None,
             **kwargs):
    ...
    if builder is None:
        original_features = features
        if isinstance(features, basestring):
            features = [features]
        if features is None or len(features) == 0:
            features = self.DEFAULT_BUILDER_FEATURES
        builder_class = builder_registry.lookup(*features)
        if builder_class is None:
            raise FeatureNotFound(
                "Couldn't find a tree builder with the features you "
                "requested: %s. Do you need to install a parser library?"
                % ",".join(features))
        builder = builder_class()
        if not (original_features == builder.NAME or
                original_features in builder.ALTERNATE_NAMES):
            if builder.is_xml:
                markup_type = "XML"
            else:
                markup_type = "HTML"
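To make the lookup in that excerpt concrete, here is a toy re-implementation of "turn a features string into a builder". ToyRegistry, choose_builder and the feature lists passed to register() are hypothetical illustrations, not bs4's real classes; the ["html", "fast"] default is my understanding of DEFAULT_BUILDER_FEATURES, and skipping features that no builder offers is how I read the real lookup.

```python
DEFAULT_BUILDER_FEATURES = ["html", "fast"]  # assumed to match bs4's default


class FeatureNotFound(ValueError):
    pass


class ToyRegistry:
    """Hypothetical stand-in for bs4's builder_registry."""

    def __init__(self):
        self.builders_for_feature = {}

    def register(self, name, features):
        # A builder is also findable by its own name. Newer registrations
        # are inserted at the front, so they win when several builders match.
        for feature in [name, *features]:
            self.builders_for_feature.setdefault(feature, []).insert(0, name)

    def lookup(self, *features):
        candidates = None
        allowed = None
        for feature in features:
            having = self.builders_for_feature.get(feature, [])
            if not having:
                continue  # no builder offers this feature, so ignore it
            if candidates is None:
                candidates = having
                allowed = set(having)
            else:
                allowed &= set(having)  # keep builders that have every feature
        if candidates is None:
            return None
        for name in candidates:
            if name in allowed:
                return name
        return None


def choose_builder(registry, features=None):
    # Mirrors the constructor excerpt: str -> list, empty -> defaults.
    if isinstance(features, str):
        features = [features]
    if not features:
        features = DEFAULT_BUILDER_FEATURES
    name = registry.lookup(*features)
    if name is None:
        raise FeatureNotFound(
            "Couldn't find a tree builder with the features you requested: %s"
            % ",".join(features))
    return name


stdlib_only = ToyRegistry()
stdlib_only.register("html.parser", ["html", "strict"])

with_lxml = ToyRegistry()
with_lxml.register("html.parser", ["html", "strict"])
with_lxml.register("lxml", ["html", "fast", "permissive"])

print(choose_builder(stdlib_only))               # html.parser
print(choose_builder(with_lxml))                 # lxml
print(choose_builder(with_lxml, "html.parser"))  # html.parser
```

This is why the same call with no second argument can pick different parsers on different machines, depending on what is installed.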
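The first argument is the markup (e.g. HTML code) and the second argument specifies how to parse that markup. If you leave the second argument out, the constructor falls back to DEFAULT_BUILDER_FEATURES and picks the best HTML parser installed (lxml, then html5lib, then Python's built-in html.parser), so the result can differ between machines. You can override this by specifying one of the following: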
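- The type of markup you want to parse. Currently supported types are "html", "xml", and "html5".
- The name of the parser library to use. Currently supported options are "lxml", "html5lib", and "html.parser" (Python's built-in HTML parser).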
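A short usage sketch, assuming bs4 is installed. Only "html.parser" is guaranteed to work because it ships with Python; "lxml", "html5lib" and the "xml" type need the corresponding library, so the loop catches FeatureNotFound for those. The "<a></p>" input is the kind of invalid markup on which the parsers disagree.

```python
from bs4 import BeautifulSoup, FeatureNotFound

markup = "<a></p>"

# Name a specific parser library; html.parser is always available.
soup = BeautifulSoup(markup, "html.parser")
print(soup)               # <a></a>
print(soup.builder.NAME)  # html.parser

# Other parser names, and the "xml" markup type, need lxml or html5lib
# installed; "html" can always fall back to html.parser.
for features in ("lxml", "html5lib", "html", "xml"):
    try:
        result = BeautifulSoup(markup, features)
        print(f"{features!r} -> builder {result.builder.NAME!r}: {result}")
    except FeatureNotFound:
        print(f"{features!r} -> no installed parser supports this")
```

Passing a markup type such as "html" instead of a parser name may make Beautiful Soup warn that it guessed the parser; naming the parser explicitly keeps results consistent across environments.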