find_all() only returns the first item of the list

Time: 2019-08-01 13:35:00

Tags: python beautifulsoup

I'm having trouble with BeautifulSoup's find_all() method. I'm trying to get the text inside all the p tags, but it only returns the first element of the list. In fact, the list has only one item. Why does find_all() return only one item?

Here is part of the HTML I'm trying to extract from:

<div class="post-content">
 <p>If you’re not familiar with Deep Image, it’s an amazing tool which allows you to increase the size of an image and upgrade its quality at the same time.</p>

 <p>You can find it, and use for free <a href="https://deep-image.ai/">HERE</a></p>

 <p><em>The goal of this blog post is to focus on the main changes and showcase the results of DI 2.0 algorithms.</em></p>

 <p>As we all know a picture is worth a thousand words. So we will let the enhanced pictures speak for themselves. All pictures you can see below were processed using Deep Image algorithms.</p>

 <h2 id="what-has-changed">What has changed</h2>

 <p>Here are all the main improvements added to Deep Image 2.0:</p>
</div>

Here is my code:

from bs4 import BeautifulSoup
import requests

source = requests.get('https://teonite.com/blog/deep-image-2-showcasing-results/').text
soup = BeautifulSoup(source, 'html.parser')

for article in soup.find_all(class_='post-content'):
    print(article.p.text)

Thanks for your help!

2 answers:

Answer 0 (score: 0)

You are searching for all tags with the class post-content. Since there is only one such element, the list returned by find_all has only one entry. The for loop therefore runs a single iteration, and in that iteration article.p prints only the text of the first p tag.
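To see why only the first paragraph is printed: tag.p is shorthand for tag.find('p'), which returns only the first match, while tag.find_all('p') returns all of them. A minimal sketch with made-up HTML:

```python
from bs4 import BeautifulSoup

html = '<div><p>first</p><p>second</p></div>'
soup = BeautifulSoup(html, 'html.parser')
div = soup.div

# div.p is equivalent to div.find('p'): only the first <p> is returned
print(div.p.text)              # first

# div.find_all('p') returns every <p> in the div
print(len(div.find_all('p')))  # 2
```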

Try this:

from bs4 import BeautifulSoup
import requests

html = '''
<div class="post-content">
 <p>If you’re not familiar with Deep Image, it’s an amazing tool which allows you to increase the size of an image and upgrade its quality at the same time.</p>

 <p>You can find it, and use for free <a href="https://deep-image.ai/">HERE</a></p>

 <p><em>The goal of this blog post is to focus on the main changes and showcase the results of DI 2.0 algorithms.</em></p>

 <p>As we all know a picture is worth a thousand words. So we will let the enhanced pictures speak for themselves. All pictures you can see below were processed using Deep Image algorithms.</p>

 <h2 id="what-has-changed">What has changed</h2>

 <p>Here are all the main improvements added to Deep Image 2.0:</p>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
div = soup.find(class_='post-content')
for p in div.find_all('p'):
    print(p.text)

您将在p标记中获得所有文本的期望输出,因为我们现在搜索类为post-content的元素,然后在该元素中搜索所有p标记。
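As a side note, the same two-step search can be expressed in one call with a CSS selector via select(); a minimal sketch with made-up HTML, not part of the original answer:

```python
from bs4 import BeautifulSoup

html = '<div class="post-content"><p>one</p><p>two</p></div>'
soup = BeautifulSoup(html, 'html.parser')

# '.post-content p' matches every <p> nested inside an element
# whose class attribute contains post-content
for p in soup.select('.post-content p'):
    print(p.text)
```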

Answer 1 (score: 0)

The statement print(article.p.text) finds and prints only the first <p> tag. To get the text of the whole article, you can use get_text(), for example:

from bs4 import BeautifulSoup
import requests

source = requests.get('https://teonite.com/blog/deep-image-2-showcasing-results/')
soup = BeautifulSoup(source.content, 'html.parser')

for article in soup.find_all(class_='post-content'):
    print(article.get_text(strip=True, separator='\n'))

Prints:

If you’re not familiar with Deep Image, it’s an amazing tool which allows you to increase the size of an image and upgrade its quality at the same time.
You can find it, and use for free
HERE
The goal of this blog post is to focus on the main changes and showcase the results of DI 2.0 algorithms.
As we all know a picture is worth a thousand words. So we will let the enhanced pictures speak for themselves. All pictures you can see below were processed using Deep Image algorithms.
What has changed
Here are all the main improvements added to Deep Image 2.0:
You are now able to use a new algorithm to magnify the image two-fold and four-fold. It is based on Generative Adversarial Networks.
The quality of the algorithm has been improved - there are less artefacts and even smoother edges in the enhanced images.
We have delivered a new, more reliable asynchronous queue architecture and task processing, based on microservices.
You can now enjoy a fully redesigned web application
A few examples
Please keep in mind that Deep Image was trained to do a very particular job - it will increase the size of the picture as well as improve its quality once it’s enlarged. It will not improve the quality of the image you have resized before.
Check out those awesome results!
The improvements are clearly visible.
Think about all those old photos you will be able to enhance!
You are the main reason we’re working on those cool projects, so we’d love to get your feedback.
Contact us
and let us know what you think!

Note:

For correct character decoding, pass source.content (the raw bytes) to BeautifulSoup instead of source.text.