Question

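I'm having some trouble with BeautifulSoup and was wondering whether there is a way around it, since I'm not sure how to search for this problem.

I'm using the BeautifulSoup module with Python to parse data out of emails. It lets you do things like this: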

>>> soup.title.string
'The string found within the Title Tags'

However, my problem is that I want to extract the information between <from> tags.

So when I type the following:

>>> soup.from.string

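Python recognizes from as one of its own built-ins, so I can't get this to work. Is there a way to make Python treat it as the module's attribute rather than as its own built-in?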

Answer 1

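In this case you should use soup.find(tagName). Note that from is a reserved keyword in Python rather than a built-in function, so soup.from is rejected as a syntax error before BeautifulSoup is ever involved. For the from tag, for example: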

soup.find('from').string
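
To see which tag names run into this restriction, the standard library's keyword module can be used. The following is only a minimal sketch; the helper name needs_find is hypothetical and not part of BeautifulSoup:

```python
import keyword

def needs_find(tag_name):
    """Return True if tag_name cannot be written as soup.<tag_name>."""
    # Reserved words (from, class, import, ...) and names that are not valid
    # identifiers (for example "my-tag") cannot follow a dot in Python source.
    return keyword.iskeyword(tag_name) or not tag_name.isidentifier()

for name in ("title", "from", "class", "my-tag"):
    print(name, needs_find(name))
```

For any name where this returns True, soup.find(name) is the way to go.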

If your HTML contains more than one from tag, soup.find_all() is the better choice. When you search for from, it returns a list of all from tags:

soup.find_all('from')[2].string    # get the string in the third `from` tag
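
Indexing the result of find_all() raises an IndexError when there are fewer tags than expected. If that can happen in your emails, a small guard helps. This is a rough sketch under that assumption; nth_tag_string and the sample addresses are hypothetical:

```python
from bs4 import BeautifulSoup

def nth_tag_string(soup, name, n=0):
    """Return the string of the n-th `name` tag (0-based), or None if it is missing."""
    tags = soup.find_all(name)  # empty list when nothing matches
    if 0 <= n < len(tags):
        return tags[n].string
    return None

doc = BeautifulSoup("<from>a@example.com</from><from>b@example.com</from>", "html.parser")
print(nth_tag_string(doc, "from", 1))  # b@example.com
print(nth_tag_string(doc, "from", 5))  # None
```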

There are also soup.find_next() and soup.find_parents(). To learn how they are used, see the Beautiful Soup documentation.

Here is a simple demo of find() and find_all():

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup("""
... <html>
...     <head>
...     </head>
...     <body>
...         <from>The first `from` tag</from>
...         <from>The second `from` tag</from>
...         <from>The third `from` tag</from>
...     </body>
... </html>""", "html.parser")

>>> soup.find('from').string
'The first `from` tag'

>>> soup.find_all('from')
[<from>The first `from` tag</from>,
 <from>The second `from` tag</from>,
 <from>The third `from` tag</from>]

>>> soup.find_all('from')[2].string
'The third `from` tag'
>>>
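
The demo above does not cover find_next() and find_parents(). Here is a minimal sketch of those two, assuming similar markup (the sample strings are made up for illustration):

```python
from bs4 import BeautifulSoup

html = "<html><body><from>first</from><from>second</from></body></html>"
soup = BeautifulSoup(html, "html.parser")

first = soup.find("from")

# find_next() returns the first matching element that appears
# after `first` in document order.
nxt = first.find_next("from")
print(nxt.string)  # second

# find_parents() returns the ancestors of `first`, nearest first.
parent_names = [p.name for p in first.find_parents()]
print(parent_names)
```

Like find(), find_next() returns None when nothing matches, so check the result before reading .string.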
