Question

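I'd like to know what the difference is between bs.find('div') and bs.select_one('div'). The same goes for find_all and select.

Is there any difference in performance, or is one better suited than the other in particular cases?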

Answer 1

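select() and select_one() give you a different way of navigating the HTML tree, using CSS selectors, which have a rich and convenient syntax. CSS selector support in BeautifulSoup is limited, though it covers the most common cases.

Performance-wise, it really depends on the HTML tree being parsed, on which element you are after and how deep it is, and on which selector is used to locate it. It also matters which find() + find_all() alternative you are comparing select() against. In a simple case like bs.find('div') vs bs.select_one('div'), I'd say find() should generally be faster, simply because there is a lot going on under the hood to support CSS selector syntax.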
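To make the comparison concrete, here is a minimal sketch showing that the two APIs locate the same elements, and where a CSS selector is more compact. The HTML snippet and its class names are made up for illustration; the built-in "html.parser" is assumed so no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical document used only for this illustration.
html = """
<div id="main">
  <p class="intro">first</p>
  <div class="box"><p class="intro">second</p></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Simple tag lookup: both calls return the same first <div> object.
assert soup.find("div") is soup.select_one("div")

# find_all and select return the same elements in document order.
assert list(soup.find_all("p", class_="intro")) == list(soup.select("p.intro"))

# A nested condition is a single selector string with select_one,
# but needs chained calls with find.
nested_css = soup.select_one("div.box > p.intro")
nested_find = soup.find("div", class_="box").find("p", class_="intro", recursive=False)
assert nested_css is nested_find
print(nested_css.get_text())
```

For a plain tag or attribute lookup the two styles are interchangeable; the selector form mostly pays off once the condition spans several levels of the tree.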

Answer 2

select_one is generally much faster than find:

In [13]: req = requests.get("https://httpbin.org/")

In [14]: soup = BeautifulSoup(req.content, "html.parser")

In [15]:  soup.select_one("#DESCRIPTION")
Out[15]: <h2 id="DESCRIPTION">DESCRIPTION</h2>

In [16]:  soup.find("h2", id="DESCRIPTION")
Out[16]: <h2 id="DESCRIPTION">DESCRIPTION</h2>

In [17]: timeit  soup.find("h2", id="DESCRIPTION")
100 loops, best of 3: 5.27 ms per loop

In [18]: timeit  soup.select_one("#DESCRIPTION")
1000 loops, best of 3: 649 µs per loop

In [19]: timeit  soup.select_one("div")
10000 loops, best of 3: 61 µs per loop

In [20]: timeit  soup.find("div")
1000 loops, best of 3: 446 µs per loop
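The timings above come from one machine, one BeautifulSoup version, and a live page, so they may not carry over to other setups. A rough way to repeat the measurement offline, using the standard timeit module, is sketched here. The generated HTML (200 filler paragraphs followed by the target heading) and the loop counts are arbitrary assumptions, not part of the original session.

```python
import timeit
from bs4 import BeautifulSoup

# Hypothetical test document: filler paragraphs, then the target heading.
html = (
    "<html><body>"
    + "<p>filler</p>" * 200
    + '<h2 id="DESCRIPTION">DESCRIPTION</h2></body></html>'
)
soup = BeautifulSoup(html, "html.parser")

candidates = {
    'find("h2", id="DESCRIPTION")': lambda: soup.find("h2", id="DESCRIPTION"),
    'select_one("#DESCRIPTION")': lambda: soup.select_one("#DESCRIPTION"),
}

timings = {}
for label, call in candidates.items():
    # Best of 3 runs of 200 calls each, converted to seconds per call.
    best = min(timeit.repeat(call, number=200, repeat=3)) / 200
    timings[label] = best
    print(f"{label}: {best * 1e6:.1f} microseconds per call")
```

Which call wins can differ between BeautifulSoup releases, parsers, and documents, so it is worth running this against your own data before drawing a conclusion.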

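find is basically the same as calling find_all with limit set to 1 and then checking whether the returned list is empty: it indexes the list if it is not empty and returns None if it is.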

def find(self, name=None, attrs={}, recursive=True, text=None,
         **kwargs):
    """Return only the first child of this Tag matching the given
    criteria."""
    r = None
    l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
    if l:
        r = l[0]
    return r
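The relationship described above can be checked directly. This is a small sketch on a made-up two-element document, assuming the built-in "html.parser":

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div>a</div><div>b</div>", "html.parser")

# find_all with limit=1 stops after the first match and returns a list-like result.
first_list = soup.find_all("div", limit=1)
assert len(first_list) == 1

# find returns that same first element.
assert soup.find("div") is first_list[0]

# With no match, find_all gives an empty result and find gives None.
assert soup.find_all("span", limit=1) == []
assert soup.find("span") is None
```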

select_one does something similar using select:

def select_one(self, selector):
    """Perform a CSS selection operation on the current element."""
    value = self.select(selector, limit=1)
    if value:
        return value[0]
    return None

Since select_one does not have all those keyword arguments to process, its cost is much lower.
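The same check works for the selector side. A minimal sketch, again on a made-up snippet, relying on the limit keyword of select that the select_one source above uses:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p class="x">a</p><p class="x">b</p>', "html.parser")

# select with limit=1 returns at most one match.
limited = soup.select("p.x", limit=1)
assert len(limited) == 1

# select_one returns that same element, or None when nothing matches.
assert soup.select_one("p.x") is limited[0]
assert soup.select_one("p.missing") is None
```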

The question "Beautifulsoup : Is there a difference between .find() and .select() - python 3.xx" goes into more detail on the differences.
