Question

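The following code returns only empty brackets. I have read the post "Why does bs4 return tags and then an empty list to this find_all() method?", but my case is different because I am using .select() rather than find_all(). Note that I changed "nth-child" to "nth-of-type" to avoid an error.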

    import bs4
    import requests
    res = requests.get('http://www.sharkresearchcommittee.com/pacific_coast_shark_news.htm')
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    soup.select('body > div > div:nth-of-type(2) > center > table > tbody > tr:nth-of-type(1) >td:nth-of-type(2) > p:nth-of-type(8) > strong:nth-of-type(1) > font')

The output is []

Answer 1

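It might make more sense if you told us what you are trying to match. The reason you get no matches is simply that your selector does not match anything.

Judging by the rest of your selector, my guess is that you are currently in the wrong div:

body > div > div:nth-of-type(2)

That div contains the following text:

The material contained on this Web site is shared as a public service and to further the scientific goals of the Shark Research Committee. All text and images on this Web site are the exclusive property of the Shark Research Committee. ...

I'm guessing you meant to descend into a div from there, in which case this may be the selector you want: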

soup.select('body > div > div > center > table > tr > td:nth-of-type(2) > p:nth-of-type(8) > strong > font')

The above will give you:

[<font size="4">Ventura </font>, <font size="4">  </font>]

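I haven't looked into it in depth, but I'm sure there are better selectors than the one you are using to get the same thing. Still, the above may help you.

Full code: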

import bs4
import requests
res = requests.get('http://www.sharkresearchcommittee.com/pacific_coast_shark_news.htm')
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, 'html.parser')
print(repr(soup.select('body > div > div > center > table > tr > td:nth-of-type(2) > p:nth-of-type(8) > strong > font')))

Running it:

markh@mob:~/stackoverflow/51256960$ python bs1.py 
[<font size="4">Ventura </font>, <font size="4">  </font>]
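
One visible difference between the two selectors is that the working one has no tbody step. Browsers insert tbody when they build the DOM, so a selector copied from developer tools can name an element that is absent from the raw HTML that html.parser sees. A minimal sketch on a made-up stand-in document (the markup below is an assumption, not the real page):

```python
import bs4

# Hypothetical stand-in markup; like much hand-written HTML, it has no <tbody>.
html = """
<html><body>
<div><table>
<tr><td>first</td><td><p><strong><font size="4">Ventura </font></strong></p></td></tr>
</table></div>
</body></html>
"""

soup = bs4.BeautifulSoup(html, 'html.parser')

# Selector as a browser might report it, including the tbody it inserted itself.
with_tbody = soup.select('table > tbody > tr > td:nth-of-type(2) font')
# Same path without the tbody step, matching the markup as written.
without_tbody = soup.select('table > tr > td:nth-of-type(2) font')

print(with_tbody)
print(without_tbody)
```

On this stand-in, only the selector without tbody finds the font element.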

Answer 2

It means no match was found.

There may be no such tag, but if you are sure there is, try the html5lib or lxml parsers.
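
Different parsers can build different trees from the same markup, which is why switching parsers may change what a selector matches. The sketch below counts matches per parser on a hypothetical one-row table; lxml and html5lib are optional installs, so a missing parser is recorded as None rather than treated as an error.

```python
import bs4

# Hypothetical stand-in markup with no <tbody> in the source.
html = "<html><body><table><tr><td>Ventura</td></tr></table></body></html>"
selector = 'table > tbody > tr > td'

results = {}
for parser in ('html.parser', 'lxml', 'html5lib'):
    try:
        soup = bs4.BeautifulSoup(html, parser)
    except bs4.FeatureNotFound:
        # The parser library is not installed in this environment.
        results[parser] = None
        continue
    # Number of elements the selector matches in the tree this parser built.
    results[parser] = len(soup.select(selector))

print(results)
```

html5lib follows the HTML5 parsing algorithm, which adds the implied tbody the way a browser does, so a browser-copied selector is more likely to match there than with html.parser.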

I hope this helps.

Answer 3

To avoid mismatches when using .select, you can do the following:

Open Inspect Element or the developer tools:

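Chrome: Ctrl + Shift + I, or F12, or right-click and choose Inspect Element
Opera: Ctrl + Shift + C, or right-click and choose Inspect Element
Safari: Ctrl + Shift + I
Firefox: Ctrl + Shift + I, or F12

Note that on a Mac the shortcut uses the Command and Option keys instead (Cmd + Option + I).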

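Once the developer tools are open, inspect the element you want to target.

Usually the element will have a class or id attribute (hopefully).

Get the id or class, as shown below.

To select by id, your code should look like this: soup.select('#CompanyInfo')
To select by class, your code should look like this: soup.select('.CompanyInfo')

Note: you can then print just the text with soup.select('.CompanyInfo')[0].getText(). Don't forget the index, because select returns a list.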
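
The id and class lookups above can be sketched on a small made-up document. The markup and its text are assumptions for illustration; only the CompanyInfo name comes from the answer. The sketch also guards against the empty list that an unmatched selector returns, since indexing [0] on it would raise IndexError.

```python
import bs4

# Hypothetical markup; the id and class names mirror the ones used above.
html = """
<html><body>
<div id="CompanyInfo">Acme by id</div>
<p class="CompanyInfo">Acme by class</p>
</body></html>
"""
soup = bs4.BeautifulSoup(html, 'html.parser')

by_id = soup.select('#CompanyInfo')
by_class = soup.select('.CompanyInfo')

# select() always returns a list, so check it before indexing.
id_text = by_id[0].get_text() if by_id else None
class_text = by_class[0].get_text() if by_class else None

# select_one() returns the first match or None, which avoids the index.
missing = soup.select_one('.NoSuchClass')

print(id_text, class_text, missing)
```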

Happy coding!

Answer 4

tbody was my problem.

I found it by adding selector levels one at a time and noticing which one produced an empty list.
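
That level-by-level approach can be automated. The helper below is a hypothetical sketch (first_empty_level is not part of bs4): it grows a child-combinator selector one step at a time and reports the first step at which select() returns nothing. The markup is a made-up stand-in without a tbody.

```python
import bs4

def first_empty_level(soup, steps):
    """Return the first step at which the growing child selector matches nothing.

    Returns None if every prefix of the selector matches at least one element.
    """
    for i in range(1, len(steps) + 1):
        selector = ' > '.join(steps[:i])
        if not soup.select(selector):
            return steps[i - 1]
    return None

# Hypothetical stand-in markup with no <tbody> in the source.
html = "<html><body><div><table><tr><td>Ventura</td></tr></table></div></body></html>"
soup = bs4.BeautifulSoup(html, 'html.parser')

steps = ['body', 'div', 'table', 'tbody', 'tr', 'td']
culprit = first_empty_level(soup, steps)
print(culprit)
```

Here the helper points at the tbody step; dropping that step from the list makes every level match.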

Why does BS4 (from BeautifulSoup) using .select() only return []?

4 answers: