Question

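I have an HTML file containing some markup, and I want to find a tag (a table) whose class attribute is 'targets'. Using BeautifulSoup 4.5.1, this works fine on Python 3.5.2 (macOS Sierra) but does not work on Python 3.4.2 (Raspberry Pi), and I want to find out why.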

Here is the sample HTML file (test.html):

<!DOCTYPE html>
<html>
<head>
    <title>test</title>
</head>
<body>
<table class="maincontainer">
    <tbody>
        <tr>中文</tr>
        <tr>
            <td>
                <table class="main">
                    <tbody>
                        <tr>
                            <td class="embedded">
                                <td></td>
                                <table class="targets"></table>
                            </td>
                        </tr>
                    </tbody>
                </table>
            </td>
        </tr>
    </tbody>
</table>
</body>
</html>

Here is what I wrote in my Python file:

str=''
with open('test.html','rt',encoding='utf-8') as f:
    str=f.read()
from bs4 import BeautifulSoup
soup=BeautifulSoup(str)
table=soup.select('table[class="targets"]')

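So could someone answer the following questions:

How does the select function work?
Why does this fail on 3.4.2 but work on 3.5.2?
Is there a way to solve this problem?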

Answer 1

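This is because different modules are installed in your Python 3.5 and Python 3.4 environments. When you do not explicitly pass the name of the parser you want: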

soup = BeautifulSoup(str)

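BeautifulSoup picks the parser automatically from the modules that are installed. If lxml is installed, it picks lxml; if not, it picks html5lib; and if that is not installed either, it falls back to the built-in html.parser: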

If you don't specify anything, you'll get the best HTML parser that's installed. Beautiful Soup ranks lxml's parser as being the best, then html5lib's, then Python's built-in parser.
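To see which parser each environment is likely to end up with, one option is to check which of the backing modules can be imported. The following is only a minimal sketch that mirrors the ranking quoted above; the helper name installed_parsers is hypothetical, and Beautiful Soup itself makes the choice through its own internal lookup rather than through this code.

```python
import importlib.util

# Preference order as described in the Beautiful Soup documentation:
# lxml first, then html5lib, then Python's built-in html.parser.
# Each entry is (parser name passed to BeautifulSoup, module that backs it).
PARSER_MODULES = [
    ("lxml", "lxml"),
    ("html5lib", "html5lib"),
    ("html.parser", "html.parser"),
]


def installed_parsers():
    """Return the parser names whose backing module is importable, best first."""
    found = []
    for parser_name, module_name in PARSER_MODULES:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module_name) is not None:
            found.append(parser_name)
    return found


# The first entry is the parser a bare BeautifulSoup(markup) call would
# most likely use in this environment (assumption based on the ranking above).
print(installed_parsers())
```

Running this on both machines and comparing the first entry of each list should show whether the two environments default to different parsers.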

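In other words, you should specify the parser explicitly to avoid this kind of problem in the future. Determine which one works for your particular case and set it: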

soup = BeautifulSoup(str, "html5lib")
# or soup = BeautifulSoup(str, "lxml")
# or soup = BeautifulSoup(str, "html.parser")
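To decide which parser suits this markup, it may help to run the same selector under each parser and compare the results. The sketch below assumes a trimmed-down copy of the nested-table fragment from test.html; the helper name count_targets is hypothetical, and parsers that are not installed are reported as None rather than raising an error.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Trimmed-down version of the nested-table markup from test.html.
HTML = """
<table class="main"><tbody><tr>
<td class="embedded">
<td></td>
<table class="targets"></table>
</td>
</tr></tbody></table>
"""


def count_targets(html, parser):
    """Parse html with the named parser and count the matching tables.

    Returns None when the requested parser is not installed.
    """
    try:
        soup = BeautifulSoup(html, parser)
    except FeatureNotFound:
        return None
    return len(soup.select('table[class="targets"]'))


# Compare how many matches each parser yields for the same selector.
for name in ("lxml", "html5lib", "html.parser"):
    print(name, count_targets(HTML, name))
```

The parsers repair invalid markup differently (here, a td nested directly inside another td), so the counts may not agree; whichever parser gives the tree you expect is the one to pass explicitly.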
