Question

I don't understand the BeautifulSoup syntax, especially the purpose of the HTML parser argument inside the parentheses.

BeautifulSoup(source_code, 'html.parser')

Answer 1

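The second argument appears to specify which library is used to parse source_code. Check the Beautiful Soup docs for the available options and how they compare.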

As far as I understand, 'html.parser' uses the html module from Python 3's standard library.
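
To make "built-in parser" concrete, here is a minimal sketch that drives the standard library's html.parser.HTMLParser directly, without Beautiful Soup. The TagCollector class and the sample markup are made up for illustration; the 'html.parser' option in Beautiful Soup builds its tree on top of this same module.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records start tags and text as they are parsed."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Called once for every opening tag the parser encounters.
        self.tags.append(tag)

    def handle_data(self, data):
        # Called for text between tags; skip whitespace-only chunks.
        if data.strip():
            self.text.append(data.strip())


collector = TagCollector()
collector.feed("<html><body><p>Hello, <b>world</b></p></body></html>")
collector.close()
print(collector.tags)
print(collector.text)
```

The standard-library parser only reports events (tags, text); Beautiful Soup is what turns those events into a navigable tree.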

For more on parsers, see the "Installing a parser" section of the Beautiful Soup documentation.

Answer 2

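You can check the BeautifulSoup source code to see the constructor's parameters and how they are used. Here is the __init__ method of the BeautifulSoup class (in __init__.py):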

def __init__(self, markup="", features=None, builder=None,
             parse_only=None, from_encoding=None, exclude_encodings=None,
             **kwargs):
    ...
    if builder is None:
        original_features = features
        if isinstance(features, basestring):
            features = [features]
        if features is None or len(features) == 0:
            features = self.DEFAULT_BUILDER_FEATURES
        builder_class = builder_registry.lookup(*features)
        if builder_class is None:
            raise FeatureNotFound(
                "Couldn't find a tree builder with the features you "
                "requested: %s. Do you need to install a parser library?"
                % ",".join(features))
        builder = builder_class()
        if not (original_features == builder.NAME or
                original_features in builder.ALTERNATE_NAMES):
            if builder.is_xml:
                markup_type = "XML"
            else:
                markup_type = "HTML"
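
The lookup step in the excerpt can be pictured with a toy registry. This is only a sketch of the idea, not bs4's actual implementation: ToyRegistry, pick_builder, the builder names, and the default feature list are all hypothetical, and the real matching rules are more lenient than the strict "all features must match" rule used here.

```python
# Toy model of a feature-based registry. This is NOT bs4's implementation;
# every class, function, and builder name below is hypothetical.

class ToyRegistry:
    def __init__(self):
        self._builders = []  # kept in priority order, best first

    def register(self, name, features):
        self._builders.append((name, set(features)))

    def lookup(self, *features):
        """Return the first registered builder that has every requested feature."""
        wanted = set(features)
        for name, supported in self._builders:
            if wanted <= supported:
                return name
        return None


registry = ToyRegistry()
registry.register("lxml-builder", ["lxml", "html", "fast"])
registry.register("html5lib-builder", ["html5lib", "html", "html5"])
registry.register("stdlib-builder", ["html.parser", "html"])


def pick_builder(features=None):
    # Mirror the excerpt: a bare string becomes a one-element list.
    if isinstance(features, str):
        features = [features]
    if not features:
        features = ["html", "fast"]  # assumed default for this sketch
    builder = registry.lookup(*features)
    if builder is None:
        raise LookupError("no builder with features: %s" % ",".join(features))
    return builder


print(pick_builder("html.parser"))
print(pick_builder("html5"))
print(pick_builder())
```

In this model a parser name such as "html.parser" and a markup type such as "html5" are both just features, which is why either kind of string can be passed as the second argument.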

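The first argument is the markup (for example, HTML), and the second argument, features, specifies how to parse that markup. If you omit it, Beautiful Soup picks the best HTML parser that is installed (lxml, then html5lib, then Python's built-in html.parser). You can override that choice by specifying one of the following:

The type of markup you want to parse. Currently supported are "html", "xml", and "html5".

The name of the parser library you want to use. Currently supported options are "lxml", "html5lib", and "html.parser" (Python's built-in HTML parser).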
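
As a rough illustration of choosing a parser by name, the sketch below tries each of the three parser libraries on the same markup. The parse_with helper and the sample markup are hypothetical; lxml and html5lib are optional third-party packages, so the helper returns None when bs4 or the requested parser is not installed rather than assuming they are present.

```python
try:
    # Third-party package: pip install beautifulsoup4
    from bs4 import BeautifulSoup, FeatureNotFound
except ImportError:
    BeautifulSoup = None


def parse_with(markup, parser_name):
    """Hypothetical helper: return a soup, or None if bs4 or the parser is missing."""
    if BeautifulSoup is None:
        return None
    try:
        return BeautifulSoup(markup, parser_name)
    except FeatureNotFound:
        # Raised when no installed tree builder matches the requested name.
        return None


source_code = "<p>Unclosed paragraph<b>bold"
for name in ("html.parser", "lxml", "html5lib"):
    soup = parse_with(source_code, name)
    print(name, "->", soup if soup is not None else "not available")
```

Different parsers may repair invalid markup like this in different ways, which is one reason to name the parser explicitly instead of relying on whatever happens to be installed.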

What does 'html.parser' mean in BeautifulSoup(source_code, 'html.parser')?
